In modern PHP development, the wide range of development and analysis tools available can sometimes be both a blessing and a curse. For many tasks, there are several established tools, each with its own strengths, weaknesses, and philosophies.
This article is part of a series in which I examine two of these tools in direct comparison. I show how their concepts differ, for which purposes they are particularly well suited, and what teams should pay attention to if they want to build robust, maintainable, and efficient development environments.
I have already written about PCOV and Xdebug, PHP_CodeSniffer and PHP-CS-Fixer, as well as Psalm and PHPStan. In this article, I would like to explore the question of whether path coverage or mutation testing is the better solution for assessing the quality of a test suite.
Code Coverage
Code Coverage is a fundamental metric in software testing. It measures the proportion of source code that is actually executed when the tests run, i.e. covered by tests. It is expressed as a percentage and, in its simplest form, known as Line Coverage, relates the number of lines of code executed by tests to the number of executable lines of code. A line is considered executed if at least one statement in that line has been executed at least once during the test run.
Branch Coverage goes one step further and measures whether all possible outcomes of decision points (if statements, loops, etc.) have been exercised. Each decision must have evaluated to both true and false at least once. 100% branch coverage guarantees 100% line coverage, but not vice versa.
Path Coverage is the most comprehensive form of code coverage and considers all possible execution paths through the code. Such a path is a unique sequence of statements from the starting point to the end point of a code unit, for example a function or method. 100% path coverage guarantees 100% branch coverage, but not vice versa.
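A small sketch may make the difference between the three levels tangible. It assumes a hypothetical function shippingCost() with two independent decisions, and a hand-written helper decisionOutcomes() that stands in for what a coverage driver would record. Neither name comes from PHPUnit or Xdebug, and Xdebug counts branches and paths at a lower level than this simplified model.

```php
<?php
declare(strict_types=1);

// Hypothetical example: two independent decisions, so four execution paths.
function shippingCost(float $total, bool $express): float
{
    $cost = 5.0;

    if ($total >= 100.0) {
        $cost = 0.0;
    }

    if ($express) {
        $cost += 10.0;
    }

    return $cost;
}

// Hand-written stand-in for a coverage driver: the outcome (T or F)
// of each of the two decisions for one call.
function decisionOutcomes(float $total, bool $express): string
{
    return ($total >= 100.0 ? 'T' : 'F') . ($express ? 'T' : 'F');
}

// Two calls that together take both outcomes of both decisions.
$inputs = [[150.0, true], [50.0, false]];

$branches = [];
$paths = [];

foreach ($inputs as [$total, $express]) {
    shippingCost($total, $express);

    $outcomes = decisionOutcomes($total, $express);
    $branches['first:' . $outcomes[0]] = true;
    $branches['second:' . $outcomes[1]] = true;
    $paths[$outcomes] = true;
}

printf("decision outcomes covered: %d of 4\n", count($branches));
printf("paths covered: %d of 4\n", count($paths));
```

The first call alone, shippingCost(150.0, true), already executes every line. Both calls together take every decision outcome, yet the combinations "free shipping without express" and "express without free shipping" are never run, so only two of the four paths are covered.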
Code Coverage is a quantitative measure, not a qualitative one. It answers the question "How much?", not "How well?". That is why it is important to understand code coverage for what it is: an indicator of executed code, not of the quality of the tests.
In combination with Xdebug, PHPUnit can collect path coverage data and output it in HTML and XML reports. You can see this in action in this presentation:
Mutation Testing
Mutation Testing is a technique that can be used to evaluate the quality of existing tests. It can also help in the development of new tests. To do this, small, controlled changes are made to the code, either to simulate typical programming errors or to force the writing of valuable tests. Each modified version of the code is called a "mutant". The tests must "kill" these mutants by revealing the difference between the original code and the mutated code: a mutant counts as killed if at least one test that passes against the original code fails against the mutant.
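To illustrate the idea of a mutant and of killing it, here is a minimal sketch in plain PHP (7.4 or later) without any testing framework. All names are hypothetical, and the "test suites" are modelled as simple callables rather than PHPUnit test cases. The mutation shown, replacing ">=" with ">", is the kind of change a mutation testing tool might apply.

```php
<?php
declare(strict_types=1);

// Hypothetical code under test.
$original = fn (int $age): bool => $age >= 18;

// A mutant: ">=" has been replaced by ">".
$mutant = fn (int $age): bool => $age > 18;

// A "test suite" is modelled as a callable that returns true
// when all of its checks pass for the given implementation.
$weakSuite = fn (callable $isAdult): bool =>
    $isAdult(30) === true && $isAdult(5) === false;

// Same checks plus the boundary value 18.
$strongSuite = fn (callable $isAdult): bool =>
    $weakSuite($isAdult) && $isAdult(18) === true;

// A mutant is killed if the suite passes against the original
// but fails against the mutant.
$isKilled = fn (callable $suite): bool =>
    $suite($original) && !$suite($mutant);

var_dump($isKilled($weakSuite));   // the mutant survives
var_dump($isKilled($strongSuite)); // the mutant is killed
```

Both suites execute the single line of the function, so line coverage is identical. Only the suite that checks the boundary value notices the changed comparison.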
Mutation Testing helps us uncover code that is not tested well enough, even though it may already be covered by a test. The Mutation Score software metric helps us measure the effectiveness of the tests. This metric is also expressed as a percentage: it relates the number of mutants killed to the total number of mutants. A mutation score of 100% means that all mutants were detected by the tests.
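The arithmetic behind the score can be sketched as follows, assuming the simple definition given here (killed mutants relative to all mutants). The function name mutationScore() is hypothetical, and real tools may additionally count other categories, such as mutants that cause a timeout, as detected.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: percentage of mutants that were killed.
function mutationScore(int $killed, int $total): float
{
    if ($total <= 0 || $killed < 0 || $killed > $total) {
        throw new InvalidArgumentException('Expected 0 <= killed <= total and total > 0.');
    }

    return 100.0 * $killed / $total;
}

// 42 of 50 mutants killed.
printf("%.1f%%\n", mutationScore(42, 50));
```

With these numbers, eight mutants survived, and each of them points to behaviour that no test currently pins down.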
How good are my tests at detecting errors? Are they effective enough to distinguish correct from incorrect code? Do my tests test the correct behaviour? These are the questions we can answer with mutation testing. A high mutation score means that your tests are robust and would probably find real bugs. A low score, on the other hand, reveals weaknesses in your test suite that need to be improved.
Infection can be used to perform mutation testing on code that has tests implemented using PHPUnit.
So what now?
Path Coverage focuses on structural completeness and shows whether all possible execution paths through the code are executed by tests. Mutation Testing, on the other hand, focuses on the quality of the test logic and checks whether the tests are actually capable of detecting errors.
Code Coverage, including Path Coverage, can only measure the extent to which code is executed by tests, but not whether the tests actually validate the code's behaviour. Mutation Testing overcomes this limitation: a mutant can only be killed if the mutated code is executed and an assertion notices that its behaviour has changed.
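The gap between executing code and validating it can be shown with a short sketch. It assumes a hypothetical function applyDiscount() and a hand-written mutant of it, and again models tests as plain callables instead of PHPUnit test cases.

```php
<?php
declare(strict_types=1);

// Hypothetical code under test.
function applyDiscount(float $price, float $percent): float
{
    return $price - $price * $percent / 100.0;
}

// Hand-written mutant: "-" has been replaced by "+".
function applyDiscountMutant(float $price, float $percent): float
{
    return $price + $price * $percent / 100.0;
}

// Executes every line of the function but checks nothing,
// so it passes for any implementation that does not throw.
$executesOnly = function (callable $discount): bool {
    $discount(200.0, 10.0);

    return true;
};

// Executes the same line and also checks the result.
$asserts = fn (callable $discount): bool =>
    $discount(200.0, 10.0) === 180.0;

var_dump($executesOnly('applyDiscountMutant')); // passes: mutant survives
var_dump($asserts('applyDiscountMutant'));      // fails: mutant is killed
```

Both tests would yield the same coverage figure for applyDiscount(), because both execute its only line. Only the second one would contribute to the mutation score.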
Collecting path coverage data with Xdebug takes many times more time and memory than collecting line coverage data. And since I am not yet satisfied with the reporting of path coverage in PHPUnit, I personally rely primarily on mutation testing with Infection in my daily work. The reports generated with Infection are easy to read and offer concrete recommendations for action, while the resource requirements are considerably more moderate. I therefore use path coverage only in individual cases where a particular analysis requires it.