Test-Driven Security

This article is based on a presentation I gave for the first time today at the International PHP Conference in Berlin.

What if every security vulnerability in your PHP application could be traced back to a missing test? It is a provocative question, and it is the one I put to the audience in my "Test-Driven Security" presentation. The answer, I have come to believe, is closer to "yes" than most developers like to admit.

For decades, we have treated security as a separate discipline. Application code is written by developers, and then security is "added" later by specialists running scanners, conducting audits, or performing penetration tests. This separation has shaped tooling, organisational structures, and even how we think about the software development lifecycle. It has also produced a steady stream of vulnerabilities that could have been prevented at the moment the code was written, by the very people who wrote it, using tools they already had.

The argument I want to make in this article is straightforward: security is not a separate phase; it is a testing problem. And if it is a testing problem, then the discipline we already apply to functional correctness, Test-Driven Development, applies to it directly.

Failing tests as evidence

Most discussions of application security begin with a vulnerability and end with a fix. The vulnerability is described, the exploit demonstrated, the patch applied, and everyone moves on. What is missing from this familiar story is what should outlast the incident: a test that fails when the vulnerability is present and passes when it is fixed.

A failing test is not just a debugging aid. It is evidence: a precise, executable, version-controlled statement that says "this specific weakness exists in this specific place, and here is how to detect it." Once such a test is in place, the vulnerability cannot quietly return. A regression will turn the test red, and the build will stop. The test becomes part of the security contract that the code must satisfy.

This shift in perspective, from "security as an audit" to "security as an assertion", is the heart of test-driven security. Instead of asking "is this code secure?", which is an open-ended and ultimately unanswerable question, we ask: "what specific weaknesses am I claiming this code does not have, and where is the test that proves it?"

CWE as a checklist

The objection I hear most often at this point is reasonable: if I do not know what to test for, how can I write the test? Functional tests are easy in principle because the developer knows what the feature is supposed to do. Security tests require knowing what the feature is not supposed to allow, and that knowledge is not evenly distributed across development teams.

Fortunately, the security community has been compiling exactly this knowledge for a very long time. The Common Weakness Enumeration (CWE), maintained by MITRE, is a catalogue of software weakness types. Each entry describes a class of flaw, the conditions under which it occurs, and the consequences when it is exploited. The CWE list is not a marketing artefact. It is the structured, peer-reviewed vocabulary that researchers and tool vendors use to talk about software weaknesses.

For a developer practising test-driven security, the CWE list serves a very practical purpose: it is a checklist. For each piece of code that touches a database, a template, the shell, the filesystem, or an authorisation decision, there is a small set of CWE entries that describe what can go wrong. Each of those entries can be turned into one or more tests. The list is long, but it is finite, and most applications only interact with a small subset of it.

Four weaknesses, one pattern

In the presentation, I work through four weaknesses that together cover most of what goes wrong in a typical PHP web application: SQL injection (CWE-89), cross-site scripting (CWE-79), OS command injection (CWE-78), and improper authorization (CWE-285). The example application, a small Symfony note-taking app called NoteHub, contains a deliberate instance of each. For every one of them, I show the flaw, write a test that catches it, watch the test fail, apply the fix, and watch the test go green.

I will not repeat the examples here. The presentation material and the accompanying example application are the right place to see them play out, because the Red/Fix/Green rhythm is what carries the argument, and a static article cannot reproduce it. What I do want to highlight is the pattern that all four weaknesses share, because the pattern is more important than any individual example.

In each case, the vulnerability is not a mysterious side effect of an obscure language feature. It is the result of a well-understood mistake: user input flows into a context (a SQL query, an HTML document, a shell command line, an authorisation decision) without first being treated as untrusted. And the fix is just as well understood: parameterised queries, automatic output escaping, shell argument escaping, an explicit ownership check. None of this is news to anyone who has read a secure coding guide.

What is consistently underappreciated is how easy each of these vulnerabilities is to express as a failing test. The test for SQL injection is a single call to the repository with a payload that should not return all rows. The test for stored XSS is a single template render whose output should not contain a literal <script> tag. The test for command injection is a single service call whose output should not contain the marker that the injected command would have produced. The test for improper authorization is a single controller call where one user attempts an action on another user's data. Each of these tests fits comfortably on a screen. Each of them, once written, prevents an entire class of regressions forever.
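To make the first of these concrete, here is a minimal sketch in plain PHP. It is not the NoteHub code from the presentation. It assumes PHP 8.0 or later with the pdo_sqlite extension, and the NoteRepository class and its two finder methods are hypothetical. The same payload check runs against a vulnerable query and a parameterised one, which reproduces the Red/Fix/Green step in miniature.

```php
<?php
declare(strict_types=1);

// Hypothetical repository for this sketch; not NoteHub's actual code.
final class NoteRepository
{
    public function __construct(private PDO $pdo) {}

    // Vulnerable: the title is concatenated into the SQL string.
    public function findByTitleUnsafe(string $title): array
    {
        return $this->pdo
            ->query("SELECT id, title FROM notes WHERE title = '" . $title . "'")
            ->fetchAll(PDO::FETCH_ASSOC);
    }

    // Fixed: the title is bound as a parameter and never parsed as SQL.
    public function findByTitle(string $title): array
    {
        $statement = $this->pdo->prepare('SELECT id, title FROM notes WHERE title = :title');
        $statement->execute(['title' => $title]);

        return $statement->fetchAll(PDO::FETCH_ASSOC);
    }
}

// Builds an in-memory database containing exactly three notes.
function createRepository(): NoteRepository
{
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT NOT NULL)');
    $pdo->exec("INSERT INTO notes (title) VALUES ('Shopping list'), ('Meeting notes'), ('Ideas')");

    return new NoteRepository($pdo);
}

// The security check: a classic tautology payload must not return all three rows.
function payloadReturnsAllRows(callable $find): bool
{
    $payload = "' OR '1'='1";

    return count($find($payload)) === 3;
}

$repository = createRepository();

// Red: the vulnerable method leaks every note.
var_dump(payloadReturnsAllRows([$repository, 'findByTitleUnsafe'])); // bool(true)

// Green: the parameterised query treats the payload as an ordinary title.
var_dump(payloadReturnsAllRows([$repository, 'findByTitle']));       // bool(false)
```

In a PHPUnit test, the last line would become a single call to findByTitle() with the payload, followed by an assertion such as assertCount(0, ...) on the result.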

Input sanitisation is not the whole story

The first three weaknesses in my list, injection into SQL, HTML, and the shell, are all variations on a single theme: untrusted input is allowed to escape the data context it belongs in and influence the structure of a command. The fix is always the same in shape, even if the mechanism differs: keep data and code separated. A developer who internalises this principle will instinctively reach for parameterised queries, escaping helpers, and template engines that escape by default. Static analysis tools can also catch many instances of this category, because the patterns are syntactic.
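The following is a minimal sketch of the same principle for the HTML and shell contexts. It assumes PHP 8.0 or later and a POSIX shell behind shell_exec(). The functions renderNoteTitle(), greetUnsafe(), and greet() are hypothetical. In a Symfony application, Twig's automatic output escaping would take the place of the explicit htmlspecialchars() call. Each check has the shape described earlier: one call, and one condition on its output.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers for this sketch, not part of Symfony or Twig.

// HTML context: the title is escaped before it is placed into markup.
function renderNoteTitle(string $title): string
{
    return '<h1>' . htmlspecialchars($title, ENT_QUOTES | ENT_HTML5, 'UTF-8') . '</h1>';
}

// Shell context, vulnerable: the name becomes part of the command line itself.
function greetUnsafe(string $name): string
{
    return (string) shell_exec('echo ' . $name);
}

// Shell context, fixed: escapeshellarg() turns the name into one quoted argument.
function greet(string $name): string
{
    return (string) shell_exec('echo ' . escapeshellarg($name));
}

// Stored XSS check: the rendered output must not contain a literal <script> tag.
var_dump(str_contains(renderNoteTitle('<script>alert(1)</script>'), '<script>')); // bool(false)

// Command injection check: the marker INJECTED only appears if printf runs as a
// separate command. Echoed back as plain text, the payload reads "INJ%s' ECTED".
$payload = "hello; printf 'INJ%s' ECTED";
var_dump(str_contains(greetUnsafe($payload), 'INJECTED')); // bool(true)
var_dump(str_contains(greet($payload), 'INJECTED'));       // bool(false)
```

The payload is built so that the marker cannot appear in the output by accident. A test that only looked for the word "printf" would pass or fail for the wrong reason.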

The fourth weakness, improper authorization, is different, and that difference matters. It is not an input sanitisation problem. It is a logic flaw. The code does exactly what it is written to do; it is just that what it is written to do is wrong. No amount of escaping will fix a controller that deletes a record without first checking whether the requesting user is allowed to delete it. No static analyser can infer, from the syntax of a delete operation, that the business rule "only the author may delete a note" applies.

This is exactly where tests provide the most value. Tests can encode business rules that no general-purpose tool can guess. They can express, in executable form, the authorisation decisions that distinguish a useful application from a data leak. For authorisation, the test is not just a safety net. It is the specification. There is no other place in the system where the rule "Bob may not delete Alice's note" is written down in a form a machine can verify.
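Here is a minimal sketch of such a specification, assuming PHP 8.1 or later. The Note, NoteService, and AccessDenied classes are hypothetical stand-ins for an application's domain code. The presentation uses a controller call. Here the rule sits in a plain service method so that the example stays self-contained.

```php
<?php
declare(strict_types=1);

// Hypothetical domain classes for this sketch; they are not NoteHub's actual code.
final class Note
{
    public function __construct(
        public readonly int $id,
        public readonly string $author,
    ) {}
}

final class AccessDenied extends RuntimeException {}

final class NoteService
{
    /** @var array<int, Note> */
    private array $notes = [];

    public function add(Note $note): void
    {
        $this->notes[$note->id] = $note;
    }

    public function exists(int $id): bool
    {
        return isset($this->notes[$id]);
    }

    public function delete(int $id, string $requestingUser): void
    {
        $note = $this->notes[$id] ?? throw new OutOfBoundsException("No note with id $id");

        // The business rule no general-purpose tool can infer: only the author may delete a note.
        if ($note->author !== $requestingUser) {
            throw new AccessDenied("$requestingUser may not delete note $id");
        }

        unset($this->notes[$id]);
    }
}

// "Bob may not delete Alice's note", written down in a form a machine can verify.
function bobCannotDeleteAlicesNote(): bool
{
    $service = new NoteService();
    $service->add(new Note(1, 'alice'));

    try {
        $service->delete(1, 'bob');
    } catch (AccessDenied) {
        // Access was denied, and Alice's note must still be there.
        return $service->exists(1);
    }

    // Reaching this point means the deletion was allowed: the rule is violated.
    return false;
}

var_dump(bobCannotDeleteAlicesNote()); // bool(true)
```

In PHPUnit, the same rule would typically be a test that calls expectException(AccessDenied::class) before attempting the deletion. If the ownership check is ever removed, that test turns red.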

TDD applied to security

Once the tests exist, the rest of the practice is just Test-Driven Development with a different focus. The discipline does not change. You write a test that expresses the behaviour you want, watch it fail for the right reason, write the smallest amount of code needed to make it pass, and refactor. The only thing that is new is the source of the requirement. Instead of coming from a feature request or a user story, the requirement comes from a CWE entry, a threat model, or an incident report.

Framed this way, it becomes clear why test-driven security is not a new methodology. It does not require a separate tool, a dedicated team, or a parallel lifecycle. It requires only that developers treat security requirements as first-class requirements, and that they express those requirements in the same language they already use to express functional requirements: the language of automated tests.

In an organisation that already practises TDD, adopting test-driven security is essentially free. The infrastructure, the skills, and the cultural acceptance of "no code without a test" are already in place. What is missing is the habit of writing security-oriented tests alongside functional ones, and the small amount of domain knowledge needed to know what to write them about. The CWE list closes that knowledge gap.

Security is a property, not a phase

The deeper point I want to leave readers with is that security is a property of software, not a phase of development. A property is something the system has at all times, maintained by the same mechanisms that maintain every other property we care about: types, tests, code review, static analysis, documentation, and the discipline of the people who wrote the code. A phase is something that happens at a particular point in time, by particular people, and ends. Phases create gaps between themselves; properties, by definition, do not.

Treating security as a phase is what produces the familiar pattern in which a vulnerability is found, fixed, and then reintroduced six months later by a refactoring that nobody connected to the original incident. Treating security as a property, encoded in tests that run on every commit, breaks that cycle. A test does not forget, does not get reorganised, and does not leave when the people who wrote it do. It stays in the repository, and on every build it asks the same question: is this weakness present? If it ever is, the build fails, and the developer who introduced it finds out within seconds.

This is what I mean when I say that every security vulnerability could be traced back to a missing test. I do not mean that writing tests is sufficient to make software secure. I mean that for every vulnerability we find in production, there is a test that, had it existed, would have prevented it. The test is not always easy to write, and for some classes of weakness it requires real expertise to write well. But it is almost always possible, and once written it is almost always cheap to maintain.

The tools we need are the tools we already have. PHPUnit can express every test described in the presentation, without extensions, plugins, or specialised security frameworks. Symfony, Twig, Doctrine, and the rest of the modern PHP ecosystem provide secure defaults that make the tests easy to satisfy once they exist. The question is not whether we have the means. The question is whether we have the discipline to use them, and whether we are willing to treat security weaknesses with the same seriousness as functional bugs.

If you want to see what this looks like in practice, come to the presentation. Watch the tests fail, watch them go green, and ask yourself how many of the vulnerabilities in your own code base would be caught by a handful of tests you could write this afternoon. My experience is that the answer is uncomfortable, and that the discomfort is exactly what motivates the change.

Test-Driven Security brings together two topics that have shaped my work for decades: testing PHP applications and an awareness of how quickly a harmless line of code becomes a weakness. I have been developing PHPUnit for over 25 years, and in many teams I see the same gap: the tool is already there, but the step from "tests for features" to "tests for security" has not yet been taken.

Want to establish Test-Driven Development in your team and sharpen your developers' security awareness at the same time? In coaching sessions and workshops I show how CWE entries turn into concrete tests, and how PHPUnit becomes a tool that checks more than just functional correctness.