Data Provider or Properties?

In a previous article, I introduced property-based testing using a simple example: reversing arrays with array_reverse(). We formulated several properties for this operation:

The reversed array has the same size as the original array
The reversed array has the same elements as the original array, just in a different order
The first element of the original array becomes the last element of the reversed array
If we apply array_reverse() twice, we get the original array
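
As a reminder of what these four statements mean in code, here is a minimal sketch in plain PHP, without any testing library. The function name satisfiesReverseProperties() is a hypothetical helper introduced here for illustration; it is not part of PHPUnit or Eris.

```php
<?php declare(strict_types=1);

/**
 * Checks the four properties of array_reverse() for a single list.
 * Hypothetical helper for illustration only.
 *
 * @param list<int> $input
 */
function satisfiesReverseProperties(array $input): bool
{
    $reversed = array_reverse($input);

    // Property 1: same size
    if (count($reversed) !== count($input)) {
        return false;
    }

    // Property 2: same elements, ignoring order
    $sortedInput    = $input;
    $sortedReversed = $reversed;
    sort($sortedInput);
    sort($sortedReversed);

    if ($sortedInput !== $sortedReversed) {
        return false;
    }

    // Property 3: first element becomes last element (non-empty lists only)
    if ($input !== [] && $input[0] !== $reversed[count($reversed) - 1]) {
        return false;
    }

    // Property 4: reversing twice yields the original list
    return array_reverse($reversed) === $input;
}
```

A property-based testing library would call such a check with many generated lists instead of a handful of hand-picked ones.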

Some readers then asked me: Is this a replacement for PHPUnit's data providers? Should we only use properties in future? Or do data providers still make sense?

These questions are justified because, at first glance, both methods appear similar. In both cases, a test method is executed with several different inputs. In fact, however, they solve different problems and offer different guarantees. There is also a technical difference that is often overlooked: how PHPUnit executes these tests and how this affects test isolation.

One test method, multiple inputs

First, the obvious: with both the data provider approach and property-based testing, we can test the same logic with many different inputs.

A typical PHPUnit test with data providers might look like this:

<?php declare(strict_types=1);
use PHPUnit\Framework\Attributes\DataProvider;
use PHPUnit\Framework\TestCase;

final class ArrayReverseTest extends TestCase
{
    public static function provider(): array
    {
        return [
            'empty array' => [
                [],
            ],

            'single element' => [
                [1],
            ],

            'multiple unique elements' => [
                [1, 2, 3],
            ],

            'multiple non-unique elements' => [
                [1, 2, 2, 3],
            ],
        ];
    }

    #[DataProvider('provider')]
    public function testReversingTwiceIsIdentity(array $input): void
    {
        $this->assertSame($input, array_reverse(array_reverse($input)));
    }
}

We explicitly define some typical and interesting cases. PHPUnit executes each data set returned by the data provider method provider() as a separate test.

We can formulate the same behaviour as a property:

<?php declare(strict_types=1);
use function Eris\Generator\int;
use function Eris\Generator\seq;
use Eris\TestTrait;
use PHPUnit\Framework\TestCase;

final class ArrayReverseTest extends TestCase
{
    use TestTrait;

    public function testReversingTwiceIsIdentity(): void
    {
        $this
            ->forAll(seq(int()))
            ->then(
                function (array $input): void
                {
                    $reversed       = array_reverse($input);
                    $doubleReversed = array_reverse($reversed);

                    $this->assertSame($input, $doubleReversed);
                },
            );
    }
}

Here we are not describing specific inputs, but rather a general property: "Reversing twice yields the original array". Eris automatically generates many arrays ([], [1], [2, 5, 8], long arrays, arrays with negative numbers, etc.) and tests the property for each of these inputs.

At first glance, in both cases array_reverse() is tested with different inputs. But conceptually and technically, something very different is happening.

Examples versus invariants

The first fundamental difference is the type of specification.

In the data provider approach, we use concrete examples:

Each data set provided by the data provider is an explicitly chosen test case
For each input, we specify the exact expected output
The set of inputs tested is finite: exactly the cases we list

For our simple example with array_reverse(), this means:

We test the empty array
We test an array with one element
We test an array with a few elements
We may test a few special cases (duplicates, etc.)

With property-based testing, we shift the focus and describe general truths (invariants) about array_reverse(). We do not say "this specific input must lead to this specific result", but rather "the following condition must apply to all inputs".

Data providers therefore answer: "What should happen with this specific input?" Properties answer: "What general statements must apply to all inputs?"
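
To make the "exact expected output" part concrete, an example-based test could pair each input with its expected result. This is only a sketch: the class name ArrayReverseExamplesTest and the provider name examples() are chosen here for illustration.

```php
<?php declare(strict_types=1);
use PHPUnit\Framework\Attributes\DataProvider;
use PHPUnit\Framework\TestCase;

final class ArrayReverseExamplesTest extends TestCase
{
    /**
     * @return array<string, array{list<int>, list<int>}>
     */
    public static function examples(): array
    {
        return [
            // name => [input, expected output]
            'empty array'        => [[], []],
            'single element'     => [[1], [1]],
            'multiple elements'  => [[1, 2, 3], [3, 2, 1]],
            'duplicate elements' => [[1, 2, 2, 3], [3, 2, 2, 1]],
        ];
    }

    #[DataProvider('examples')]
    public function testReversesArray(array $input, array $expected): void
    {
        $this->assertSame($expected, array_reverse($input));
    }
}
```

The property-based test shown earlier needs no expected values at all: it only states a relation that must hold for whatever input the generator produces.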

Selection versus generation

The second difference concerns the question: Who selects the test data?

With the data provider approach, we decide which arrays we test array_reverse() with:

We consider: Which examples are meaningful?
We deliberately add edge cases: empty arrays, arrays with one element, possibly very long arrays (if we think of them)
We use these examples to document how array_reverse() should behave in typical situations

The disadvantage is that we only test what we can think of. For example, if we never think of a very large array or arrays with unusual keys, then these cases remain untested.

In property-based testing, we "only" describe the input space: forAll(seq(int())) automatically generates many different arrays:

Very short and very long arrays
Arrays with negative numbers, zero, large numbers
Arrays that we would probably never have thought of ourselves

If a library for property-based testing such as Eris finds a counterexample to our property, it uses shrinking to reduce the input to a smaller, simpler example that still triggers the error.
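
The idea behind shrinking can be illustrated with a toy implementation in plain PHP. This is a deliberately simplified sketch and not how Eris implements shrinking; buggyReverse() and shrink() are hypothetical functions introduced only for this illustration.

```php
<?php declare(strict_types=1);

/**
 * Deliberately broken: silently drops duplicate values before reversing.
 */
function buggyReverse(array $input): array
{
    return array_reverse(array_unique($input));
}

/**
 * Greedy toy shrinker: removes single elements for as long as the
 * property still fails for the smaller list.
 *
 * @param list<int>                 $failing  input for which $property returns false
 * @param callable(list<int>): bool $property
 *
 * @return list<int>
 */
function shrink(array $failing, callable $property): array
{
    do {
        $shrunk = false;

        foreach (array_keys($failing) as $index) {
            $candidate = $failing;
            array_splice($candidate, $index, 1); // remove one element

            if (!$property($candidate)) {
                // Still failing: continue with the smaller list
                $failing = $candidate;
                $shrunk  = true;

                break;
            }
        }
    } while ($shrunk);

    return $failing;
}

// The size property: reversing must not change the number of elements
$sizeIsPreserved = static fn (array $input): bool => count(buggyReverse($input)) === count($input);

$counterexample = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3];
$minimal        = shrink($counterexample, $sizeIsPreserved);
// $minimal is [5, 5]: two equal elements are enough to expose the bug
```

A ten-element counterexample says little about the cause of a failure. A two-element list with a repeated value points directly at the handling of duplicates.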

In short: data providers document your current state of knowledge. Properties plus generators help to uncover the gaps in that knowledge.

Deterministic versus exploratory

The third difference concerns repeatability and goal setting.

Tests that use data providers are deterministic:

Each test run executes the tested code with exactly the same inputs
They are perfect for regression testing: a bug that is found is recorded as an example, and each execution of the test suite checks that this bug has not been reintroduced (in exactly the same way)
They serve as living documentation: "We expect exactly this result for this real business case"

Property-based tests are exploratory:

They search the input space with a random component
Their purpose is not documentation, but discovery: finding errors that we would never have found with handwritten examples
We can make them deterministic with a fixed random seed if we want to reproduce a found error, but that is the exception
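
Why a fixed seed makes a random search repeatable can be shown in a few lines of plain PHP. The sketch below uses PHP's built-in mt_srand() and mt_rand(); randomIntList() is a hypothetical helper, and Eris has its own generators and its own seed handling.

```php
<?php declare(strict_types=1);

/**
 * Generates a pseudo-random list of integers from a seed.
 * Hypothetical helper for illustration only.
 *
 * @return list<int>
 */
function randomIntList(int $seed, int $maxLength = 10): array
{
    mt_srand($seed); // same seed, same sequence of "random" numbers

    $length = mt_rand(0, $maxLength);
    $list   = [];

    for ($i = 0; $i < $length; $i++) {
        $list[] = mt_rand(-1000, 1000);
    }

    return $list;
}

// Both calls use the same seed and therefore produce the same input
$first  = randomIntList(42);
$second = randomIntList(42);
```

Without the explicit seed, each run would draw different inputs, which is exactly what we want while exploring and exactly what we do not want while reproducing a failure.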

Data providers freeze known scenarios. Property-based tests explore unknown scenarios.

The critical difference

Up to this point, the comparison has been rather conceptual. Now it gets technical, and we come to an important difference that is often overlooked: how PHPUnit executes these tests and what impact this has on test isolation.

Let us assume that our data provider method returns four data sets, as in the example above. PHPUnit counts this as four separate tests. The following happens for each data set:

PHPUnit creates a new instance of the test class ArrayReverseTest
Before-test methods such as setUp() are executed
The test method testReversingTwiceIsIdentity() is executed with one data set
After-test methods such as tearDown() are executed
The instance is discarded

This has two consequences:

Each data set is used for a test in a fresh environment
Any state in your test class is rebuilt for each data set and then cleared again

So if you set up a large fixture in a before-test method, for example, you can be sure that the test for each data set will see this state as "fresh".
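
This behaviour can be observed with a small test that records its inputs in an instance property. The following is a sketch; the class name FreshStateTest and the property $observedInputs are chosen here for illustration.

```php
<?php declare(strict_types=1);
use PHPUnit\Framework\Attributes\DataProvider;
use PHPUnit\Framework\TestCase;

final class FreshStateTest extends TestCase
{
    /** @var list<array> inputs seen by this instance */
    private array $observedInputs = [];

    public static function provider(): array
    {
        return [
            'empty array'      => [[]],
            'single element'   => [[1]],
            'unique elements'  => [[1, 2, 3]],
            'repeated element' => [[1, 2, 2, 3]],
        ];
    }

    #[DataProvider('provider')]
    public function testEachDataSetStartsFresh(array $input): void
    {
        $this->observedInputs[] = $input;

        // Passes for all four data sets: each one runs on a new instance,
        // so the list never contains inputs from earlier data sets
        $this->assertCount(1, $this->observedInputs);
    }
}
```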

Now consider the property-based test. Let us assume that Eris is configured to generate 100 different arrays. PHPUnit counts this as 1 test, not 100. The assertions are still counted individually, though: one per iteration, so 100 in our example. The execution looks like this:

PHPUnit creates a new instance of the test class ArrayReverseTest
Before-test methods such as setUp() are executed exactly once
The test method testReversingTwiceIsIdentity() is executed exactly once
After-test methods such as tearDown() are executed exactly once
The instance is discarded

This means that PHPUnit does not call before-test methods such as setUp() or after-test methods such as tearDown() between the 100 iterations. For PHPUnit, this is a single test execution.

Isolation between the individual iterations in the then() callable is not automatically guaranteed.
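
A small experiment makes this visible. The sketch below assumes that Eris runs more than one iteration, which matches the configuration described above; the class name SharedStateTest and the property $observedInputs are chosen here for illustration.

```php
<?php declare(strict_types=1);
use function Eris\Generator\int;
use function Eris\Generator\seq;
use Eris\TestTrait;
use PHPUnit\Framework\TestCase;

final class SharedStateTest extends TestCase
{
    use TestTrait;

    /** @var list<array> inputs seen by this instance */
    private array $observedInputs = [];

    public function testStateSurvivesBetweenIterations(): void
    {
        $this
            ->forAll(seq(int()))
            ->then(
                function (array $input): void
                {
                    // Not reset between iterations: no new instance, no setUp()
                    $this->observedInputs[] = $input;

                    $this->assertSame($input, array_reverse(array_reverse($input)));
                },
            );

        // All iterations wrote to the same instance property
        $this->assertGreaterThan(1, count($this->observedInputs));
    }
}
```

Here the accumulated state is harmless. With a cache, a counter, or a test double that records calls, the same effect can make a later iteration pass or fail because of what an earlier iteration left behind.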

Best Practices

To work cleanly with a property-based testing library such as Eris, it helps to follow a few basic rules:

Prefer stateless code
Fortunately, this is easy for functions such as array_reverse(): We have a function that takes an array and returns a new array. No global state, no side effects.
Create state locally in the then() callable
If we need additional data structures in individual iterations, we create them in the callable itself. This gives each iteration its own local state.
If unavoidable: clean up explicitly
If we need global state, such as a cache or a temporary file, we take care of cleaning up at the end of the callable so that the next iteration can start clean again.
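
The second and third rule can be combined in one test. The following sketch uses a hypothetical property (a list of integers survives a round trip through a JSON file) purely to have a reason for a temporary file; the class and method names are chosen here for illustration.

```php
<?php declare(strict_types=1);
use function Eris\Generator\int;
use function Eris\Generator\seq;
use Eris\TestTrait;
use PHPUnit\Framework\TestCase;

final class JsonFileRoundTripTest extends TestCase
{
    use TestTrait;

    public function testListSurvivesRoundTripThroughFile(): void
    {
        $this
            ->forAll(seq(int()))
            ->then(
                function (array $input): void
                {
                    // State is created locally: every iteration gets its own file
                    $path = tempnam(sys_get_temp_dir(), 'pbt');
                    $this->assertIsString($path);

                    try {
                        file_put_contents($path, json_encode($input, JSON_THROW_ON_ERROR));

                        $decoded = json_decode(file_get_contents($path), true, 512, JSON_THROW_ON_ERROR);

                        $this->assertSame($input, $decoded);
                    } finally {
                        // Explicit clean-up, even if the assertion above fails
                        unlink($path);
                    }
                },
            );
    }
}
```

The try/finally block plays the role that tearDown() plays for data sets: it runs after every iteration, whether the assertion succeeded or not.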

Conclusion

Even in a simple example such as testing the array_reverse() function, the roles of data providers and property-based testing can be clearly distinguished.

Data providers document specific, important cases (including bugs found for regression testing), benefit from complete test isolation through PHPUnit, and are ideal for explaining behaviour to beginners and non-specialists.

In property-based testing, properties capture general invariants, and generators automatically produce many inputs, including those we would not have thought of. We have to write the tests with isolation in mind because PHPUnit executes all iterations in a single test context.

A good testing strategy uses both: We formulate general properties to systematically explore the input space using property-based testing. If our property-based tests find an interesting counterexample, we transform it into a "normal" PHPUnit test as an explicit example and documented regression test. We use data providers specifically where PHPUnit's automatic test isolation helps us or where concrete examples explain the behaviour better than abstract properties.

I have over 35 years' experience developing software, including almost 30 years working with PHP. I have also been developing PHPUnit for over 25 years. The knowledge I have gained during this time is reflected in my articles, but this is just the tip of the iceberg.

