In my previous articles, “Effective Code Reviews” and “Modern PHP Development”, I explored the foundations of contemporary software engineering, including the trade-offs between synchronous and asynchronous code review approaches, the critical role of comprehensive testing and static analysis, and the necessity of well-maintained documentation.
These may have seemed like explorations of established practices. In fact, they are more than best practices for human teams: they are essential prerequisites for reliable collaboration with AI agents.
From assistance to agency
The emergence of agentic AI in software development represents a shift that goes well beyond mere automation. Unlike code completion tools that suggest the next few lines or conversational assistants that answer isolated questions, AI agents operate proactively at a significantly higher level of abstraction. Given a specification or bug report, an agent can analyse the codebase, generate an implementation and write the corresponding tests. They can then execute these tests, debug any failures and prepare a pull request. All of this can be done without continuous human guidance. This shift fundamentally changes how we think about code review, testing, and documentation, not by replacing them, but by elevating their importance as the control framework that enables and ensures the trustworthiness of autonomous AI participation.
This autonomy introduces opportunity as well as risk. Without proper safeguards, AI agents can exhibit concerning behaviours, such as disrupting functioning code, altering logic in areas not intended for modification, and introducing subtle architectural violations that accumulate as technical debt.
The distinction between AI-assisted and agentic development is therefore crucial:
- In assisted development, the human is in control, with AI providing suggestions and the developer making decisions.
- In agentic development, the relationship is inverted: the AI drives towards a goal while humans define objectives and validate outcomes.
This inversion requires a fundamentally different approach to quality assurance, where the practices outlined in my previous articles are not just helpful, but essential.
Abstraction, not replacement
Contrary to a common claim, LLM-based coding agents are not replacing compilers. Instead, they are part of a tradition of tools that enable us to think about the problems we are trying to solve at a higher level of abstraction. We progressed from writing machine code instructions, to assembly language, and then to higher-level languages, frameworks, and other reusable components. This progression reflects a fundamental principle of software engineering: raising the level at which we reason about problems reduces cognitive load, allowing us to tackle increasingly complex challenges.
The history of programming languages clearly demonstrates this pattern. Machine code required us to think in terms of individual processor instructions and memory addresses. Assembly language abstracted some of this detail away, introducing mnemonics and labels. Languages such as Fortran and Cobol then abstracted further, enabling us to express logic in terms that were closer to mathematical or business concepts. Object-oriented languages introduced further abstractions for modelling real-world concepts. Frameworks and libraries continue this progression by encapsulating common patterns and general-purpose logic.
Agentic AI systems extend this tradition by one more level. Rather than writing code directly, we are increasingly specifying behaviour through tests, requirements and architectural constraints. The AI agent then handles the intermediate step of code generation. This does not replace compilers and other foundational tools. Compilers still translate code into machine instructions, regardless of whether the code was written by a human or generated by an AI agent. Instead, it elevates the level of abstraction: we focus on what the software should do and why, leaving the implementation details to automation.
We must not forget that software development is fundamentally about solving problems. The abstractions we create, ranging from assembly language to frameworks to agentic systems, serve as means to that end rather than being ends in themselves. Every generation of tools has freed developers from routine work so that they can focus on the actual problem. But with agentic systems, the fundamental question sharpens: it shifts from "how do we organise the work?" to "who does the work?".
Human collaboration, judgement, and creative problem-solving remain the irreplaceable core of development. AI agents are just tools that enhance human capability by performing mechanical tasks, but they operate within the context of human intent and oversight. The highest-level abstraction in software development is the human understanding of the problem being solved, as well as the human relationships that enable teams to understand, refine, and validate solutions together.
The clearer we are in communicating our intent through documentation, testing, and specifications, the more powerful these tools become. Because it is still humans who define the problems, specify the solutions through rigorous requirements and tests, and bear responsibility for the systems we create.
Code review discipline
In my article on effective code reviews, I examined the distinct trade-offs in human collaboration created by timing (synchronous versus asynchronous) and impact (blocking versus non-blocking). The same dimensions apply when AI agents become contributors, but with new implications.
Consider pair programming, for example, which is the most synchronous and blocking human code review approach. With AI, this pattern evolves into a process in which a human writes tests that specify behaviour, while an AI agent generates the implementation. The synchronous nature of this process provides immediate feedback: if the AI-generated code fails a test, the human can clarify the specification immediately. The blocking aspect ensures that no untested code progresses, thus maintaining the quality gates that were already valuable for human teams.
Asynchronous, blocking reviews, typically implemented through pull requests, translate naturally to agentic workflows. An AI agent creates a branch, implements changes, and opens a pull request. Human reviewers then examine the changes asynchronously, but the critical difference lies in what they review. Rather than scrutinising every line of implementation code, they focus on higher-order concerns:
- Does this change align with architectural decisions?
- Are the test specifications comprehensive?
- Does it introduce technical debt?
While the blocking mechanism remains essential, the review target shifts from code correctness to specification appropriateness and architectural integrity.
The most advanced pattern, trunk-based development with its asynchronous, non-blocking code reviews, becomes viable for AI agents only when robust infrastructure exists.
As I noted in my code review article, this approach requires feature flags, branch-by-abstraction patterns, and easy reversibility. With AI agents, these mechanisms serve an additional purpose: they enable progressive autonomy. An agent's code can be deployed behind a feature flag to allow for a gradual rollout while behaviour is monitored in production. If issues emerge, the immediate rollback capability can prevent incidents from becoming prolonged.
This approach requires the comprehensive automation and testing discipline described in my article on modern PHP development. Without these, non-blocking AI contributions become reckless rather than rapid.
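The flag-gated rollout described above can be sketched in a few lines of PHP. The `FeatureFlags` class, the flag name, and the `calculateTotal` function are hypothetical; a real project would typically use a feature-flag library or service, but the principle is the same: the agent-generated path runs only for the enabled cohort, and disabling the flag is the rollback path.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory flag store; real systems would read flags
// from configuration or a remote service.
final class FeatureFlags
{
    /** @param array<string, bool> $flags */
    public function __construct(private readonly array $flags)
    {
    }

    public function isEnabled(string $name): bool
    {
        return $this->flags[$name] ?? false;
    }
}

/** @param list<float> $items */
function calculateTotal(FeatureFlags $flags, array $items): float
{
    if ($flags->isEnabled('agent-generated-totals')) {
        // New, agent-generated implementation runs behind the flag.
        return round(array_sum($items), 2);
    }

    // The existing, trusted implementation remains the default.
    return array_sum($items);
}

assert(calculateTotal(new FeatureFlags([]), [1.0, 2.0]) === 3.0);
assert(calculateTotal(new FeatureFlags(['agent-generated-totals' => true]), [1.0, 2.0]) === 3.0);
```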
AI safety net
The modern PHP development practices that I outlined, static analysis with PHPStan, comprehensive testing with PHPUnit, mutation testing with Infection, and well-maintained documentation, were presented as pillars of sustainable development. In the context of agentic development, these same practices act as a safety net, containing AI autonomy within acceptable boundaries.
Static analysis tools such as PHPStan act as the first line of defence. They analyse code without executing it, identifying type errors, undefined variables, and potential runtime failures. When an AI agent generates code, static analysis provides immediate, objective feedback on whether the code satisfies basic correctness criteria. This automated verification prevents entire categories of defects before human reviewers even examine the code.
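As a starting point, a minimal PHPStan configuration might look like the following; the paths are assumptions about the project layout.

```neon
# phpstan.neon: a minimal configuration sketch
parameters:
    level: max
    paths:
        - src
        - tests
```

Running at the strictest level from the outset is far easier on a codebase where AI-generated code is checked from day one than retrofitting it later.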
Forgoing automated testing is already considered unprofessional, but with AI agents tests become absolutely critical. Tests serve a dual purpose: they verify that AI-generated code behaves correctly, and they specify what “correct behaviour” means. The latter role, tests as specification and documentation, represents a fundamental shift in perspective. Humans define the requirements through the automated tests they write. A comprehensive test suite encodes the business logic, architectural patterns, and edge cases that an AI agent must satisfy. The more complete and rigorous the tests, the less ambiguity the AI has to interpret.
Mutation testing adds another layer of verification by testing the tests themselves. Tools such as Infection inject small mutations into code, changing operators, modifying constants, or removing conditions, and then verify that the tests fail. This becomes crucial for AI-generated code: an AI might generate code that passes all tests while still being subtly incorrect. Mutation testing reveals these gaps, ensuring that the test suite effectively constrains the AI's implementation choices.
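A small example illustrates why this matters. The `qualifies` function below is hypothetical; a typical mutation flips `>=` into `>`, and only a test that exercises the boundary value kills that mutant. A suite that only checked clearly-qualifying and clearly-unqualifying customers would pass against the mutant and leave the boundary unconstrained for an AI agent.

```php
<?php
declare(strict_types=1);

// Hypothetical rule: a customer qualifies with at least three orders.
function qualifies(int $orderCount): bool
{
    return $orderCount >= 3;
}

// A mutation tool would rewrite ">=" to ">" and re-run the suite.
// The boundary assertion below is the one that kills that mutant:
// against the mutant, qualifies(3) would return false.
assert(qualifies(3) === true);   // boundary case: kills the mutant
assert(qualifies(2) === false);
assert(qualifies(5) === true);   // passes against both versions
```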
Tests eliminate ambiguity
In Test-Driven Development (TDD) with AI agents, humans write tests that specify the desired behaviour. Then, AI agents are used to generate implementations that satisfy these tests. The Red-Green-Refactor cycle remains intact, but the work is distributed between human and machine:
- The Red phase, where a failing test is written, is entirely human-driven: the focus is on translating requirements into executable specifications.
- The Green phase, where minimal code is written to pass the test, becomes AI-driven.
- The Refactor phase becomes collaborative: the AI can suggest improvements while humans ensure that the refactoring maintains architectural integrity.
This approach addresses a fundamental problem in agentic development: how can we communicate intent to an AI agent precisely enough to ensure correct implementation? Natural language descriptions, no matter how detailed, introduce ambiguity. The requirement "calculate the discount for existing customers" leaves open when someone qualifies as an existing customer, how the discount is rounded, and what happens with negative amounts. A test that expects a 10% discount for a customer with at least three orders, rounded to two decimal places, leaves no room for interpretation.
Tests eliminate that ambiguity.
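The discount requirement above can be pinned down as executable specification. The function and class names are hypothetical, and plain `assert()` calls stand in for what would be a PHPUnit test in practice; each assertion answers one of the questions the natural-language requirement left open.

```php
<?php
declare(strict_types=1);

// Hypothetical rule derived from the specification: customers with at
// least three orders receive a 10% discount, rounded to two decimal
// places; negative amounts are rejected.
function calculateDiscount(float $amount, int $orderCount): float
{
    if ($amount < 0) {
        throw new InvalidArgumentException('Amount must not be negative');
    }

    if ($orderCount < 3) {
        return 0.0; // not yet an "existing customer"
    }

    return round($amount * 0.10, 2);
}

// Each assertion resolves one ambiguity of the prose requirement.
assert(calculateDiscount(100.00, 3) === 10.0); // three orders qualify
assert(calculateDiscount(100.00, 2) === 0.0);  // two orders do not
assert(calculateDiscount(33.335, 5) === 3.33); // rounded to two decimals
```

An AI agent asked to implement this function has no room for interpretation: the qualifying threshold, the rounding rule, and the error case are all fixed by the tests.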
Documentation: knowledge and constraints
In my article on modern PHP development, I argued that, although documentation has always been useful for humans, it is now more important than ever because “AI-based coding agents can also use it as a knowledge base”. This observation merits closer examination.
Architecture Decision Records (ADRs) document important architectural decisions, including their justification and expected consequences. For human teams, these records provide historical context, explaining why a particular approach was chosen, which alternatives were considered, and which trade-offs were accepted. For AI agents, they constrain the solution space. For example, an ADR that records the decision to use event sourcing for certain domains implicitly forbids CRUD-style database operations in those areas. An AI agent trained to respect ADRs will generate code that is consistent with the documented decisions, maintaining architectural coherence without constant human intervention.
Similarly, Technical Debt Records (TDRs) constrain AI behaviour in the opposite direction, documenting deliberately made compromises and their planned resolution. An AI agent that is aware of technical debt can avoid making the same compromises elsewhere or can prioritise tasks that reduce documented debt. Quality goals define measurable quality standards for the entire project, such as response time requirements, availability targets and security compliance levels. These goals can be translated directly into fitness functions, which are automated checks that verify whether the system maintains the desired quality attributes.
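A fitness function can be as simple as an automated check that a measured quality attribute stays within the documented goal. The sketch below assumes a hypothetical p95 response-time goal; the function name, samples, and threshold are illustrative, and in a real pipeline such a check would run as a CI step against production metrics.

```php
<?php
declare(strict_types=1);

/**
 * Verify that the 95th-percentile response time stays within the
 * documented quality goal (nearest-rank percentile for simplicity).
 *
 * @param list<float> $samplesMs observed response times in milliseconds
 */
function checkResponseTimeGoal(array $samplesMs, float $p95LimitMs): bool
{
    sort($samplesMs);

    // Nearest-rank p95: ceil(0.95 * n) gives the 1-based rank.
    $index = (int) ceil(0.95 * count($samplesMs)) - 1;

    return $samplesMs[$index] <= $p95LimitMs;
}

$measured = [120.0, 95.0, 180.0, 210.0, 130.0, 110.0, 160.0, 140.0, 100.0, 150.0];
assert(checkResponseTimeGoal($measured, 250.0) === true);
```

A failing fitness function blocks the pipeline in exactly the same way a failing unit test does, regardless of whether the offending change was written by a human or an agent.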
When properly understood as documenting principles rather than formatting rules, coding guidelines become the bridge between high-level architectural decisions and low-level implementation choices. For example, a guideline stating that value objects should be immutable would constrain how an AI agent implements new value objects. A guideline requiring explicit exception handling for I/O operations shapes how the agent structures error management. These are not style preferences. They are architectural patterns expressed at the code level that ensure consistency when both humans and AI agents are contributing.
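The immutability guideline, for instance, translates directly into code an agent can pattern-match against. The `Money` class below is a hypothetical example (requires PHP 8.1+ for `readonly` properties): state is fixed at construction, and any change produces a new instance.

```php
<?php
declare(strict_types=1);

// A minimal immutable value object: readonly properties fix the state
// at construction; with*() methods return new instances instead of
// mutating the existing one.
final class Money
{
    public function __construct(
        public readonly int $amountInCents,
        public readonly string $currency,
    ) {
    }

    public function withAmount(int $amountInCents): self
    {
        return new self($amountInCents, $this->currency);
    }
}

$price = new Money(1999, 'EUR');
$discounted = $price->withAmount(1799);

assert($price->amountInCents === 1999);      // original is unchanged
assert($discounted->amountInCents === 1799); // new instance carries the change
```

An agent that has such an example alongside the written guideline will reproduce the pattern for new value objects instead of inventing mutable setters.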
When AI is faster than learning
The ambivalent nature of AI agents is also evident in the differing views on their actual impact on development teams. Prof. Dr. Dirk Riehle summarises this tension succinctly:
Code AIs are beneficial in that they increase speed of developers who were brought up without code AIs and they also reduce traditional bugs that junior or distracted developers might introduce. At the same time they introduce a new type of bug that requires new skills. I suspect developers need to learn to be much more flexible with speed, in particular to learn when to slow down and review thoroughly. What I don't know is how to compensate for the deskilling of developers—throwing more AI at the problem seems to be an unlikely solution.
This observation highlights a key point: while AI agents do offer a speed advantage, this is only applicable to developers who already have a solid understanding of the underlying principles.
Recent large-scale empirical work on AI-assisted coding supports this concern. An analysis of millions of GitHub commits shows that junior developers actually rely more heavily on AI-generated code than their senior peers, yet the measurable productivity gains accrue almost entirely to experienced developers. This inversion, more AI use but less benefit for juniors, suggests that generative tools amplify existing expertise rather than substituting for missing fundamentals, reinforcing the risk of deskilling that Riehle describes.
In my article "Faster than Understanding", I explored this asymmetry through a concrete experiment: an AI agent implemented a non-trivial algorithm in 15 minutes, but verifying the result took longer than the time saved, and correctness remained unresolved. Looking right and being right are not the same thing.
At the same time, the nature of errors is changing. While syntax and simple logic errors used to dominate, more subtle problems arising from flawed architectural decisions, incomplete context capture or inconsistent implementations across different areas of code are now becoming more prevalent.
Riehle's suggestion that we must be able to switch flexibly between different speeds, and recognise when a thorough manual review is necessary, highlights the importance of the practices described in this article. The impact of AI agents on skills development is a particularly worrying question: if junior developers work with AI agents from the outset, which fundamental skills will they no longer acquire?
In this context, tests, code reviews, and comprehensive documentation become more than just quality assurance tools. They provide indispensable learning environments in which even AI-supported developers must understand and question the deeper connections of their work.
If we ignore these learning environments, the current pattern in which junior developers use AI more but benefit less is likely to become entrenched rather than corrected.
Preserving human agency
The framework outlined in this article provides a way forward for safe and reliable development with agents. This path is grounded in rigorous code review, comprehensive testing, static analysis, and clear documentation. These practices are no longer just professional niceties. They are the control mechanisms that enable us to trust AI agents as collaborative partners in software development.
However, this technical solution highlights a deeper structural issue that requires acknowledgement: who owns the development process itself?
The technical practices discussed here, such as code review, static analysis, automated tests, and CI/CD pipelines, have formed the foundation of professional software engineering for decades. Their elevation to essential status is something to celebrate: finally, the industry recognises at scale that sustainable software requires discipline, collaboration, and human oversight. As somebody who has advocated these principles in Open Source communities and technical education for over 25 years, I find this validation gratifying.
However, the concentration of agentic development infrastructure raises legitimate concerns about creative control and economic power. As development becomes increasingly mediated through cloud platforms, AI systems, and automated workflows, the practical ability to create software independently diminishes. The promised efficiency gains in development velocity may come with subtle yet significant structural dependencies on proprietary toolchains, infrastructure providers, and systems whose internal decision-making processes remain opaque.
The practices I discuss in this article assume that AI agents will work within existing codebases, refactoring, extending, and improving them through the systematic application of well-defined specifications. This is a compelling vision. However, there is a legitimate concern: if agentic systems can generate code end-to-end more efficiently than we can navigate, understand, and reuse existing libraries, then the incentive to build modular, reusable software diminishes. This dynamic poses a particular risk to Open Source. Why maintain a library when an AI can generate specialised code for each new context? Why contribute to Open Source when vendor-provided infrastructure can handle code generation, review, and deployment?
Like all development, agentic development depends on human specification. The more rigorous, detailed and thoughtful the specifications, whether encoded as tests, documentation, or architectural constraints, the better the AI agents perform. This is not a weakness, but a fundamental feature. It means that defining what software should do, understanding user needs, and designing maintainable, extensible systems becomes more valuable, not less. It also means that Open Source projects that excel in clear specification, comprehensive testing, and well-documented architecture are not made obsolete, but rather are made essential. Developers who have a deep understanding of systems, who collaborate effectively on complex problems, and who maintain libraries that embody hard-won expertise remain irreplaceable.
The real challenge lies in ensuring that this expertise and collaborative capability can exist beyond corporate platforms. This requires Open Source communities to actively maintain the discipline outlined in this article, which involves documenting not just code, but also the reasoning behind architectural decisions, and creating spaces for genuine collaboration where people can learn from each other and from the systems they build together. Vigilance is also required regarding the conditions under which Open Source exists to ensure that we are not pressured to surrender control of our work, and that we can continue to fork and maintain independent projects. The creative process itself must also be safeguarded from being wholly subsumed by proprietary infrastructure.
The practices outlined here are not just safeguards for AI autonomy; they are prerequisites for human agency. By insisting on clear specifications, rigorous testing, transparent review processes, and comprehensible documentation, we safeguard not only the quality of the software we develop, but also our capacity to direct its creation. This principle applies whether we are working with AI agents or with one another.
The question is not whether agentic development will reshape how we work. It already is. The question is whether we will collectively insist on conditions in which this technology serves human creativity and collaboration rather than replacing them.
This article offers a technical framework for that future. Maintaining this requires more than just code review protocols and testing discipline; it requires a deliberate commitment to ensuring that software development remains an open, understandable, and collaborative craft.
This is personal
The question posed at the end of the previous section has implications that extend far beyond software developers. Whether technology serves human creativity and collaboration or replaces them is fundamentally a question about how computing shapes everyone's daily life, and whether we collectively insist on the former.
As JĂĽrgen Geuter argues:
Personal computing must be based on individual human or group needs but also on the technology side based on open standards that allow different tools and infrastructures to connect and share and collaborate. And it's a social project of all of us building things, trying things, learning from one another. So we can build upon each others successes and failures. It's “human needs, community sharing and standards” instead of “platforms” or “everything machines”.
This vision directly opposes the current platform-dominated model, which concentrates control and economic value within corporate structures while treating users as sources of data rather than creative agents. The alternative centres on human needs rather than lock-in, community sharing rather than proprietary control, and open standards rather than closed ecosystems.
The practical implications for anyone using digital tools — teachers designing curriculum, artists sharing work, community organisers coordinating action — are identical to those facing developers. Can you move your data without losing it? Can you understand what systems do with your information? Can you collaborate beyond platform boundaries? Can you build on the work of others?
The practices outlined for agentic development — clear specification, rigorous testing, transparent review and comprehensive documentation — represent more than just technical discipline. They also exemplify the principles of technological citizenship, insisting that systems serve human purposes rather than optimising metrics defined by those who control them.
Just as AI agents risk rendering developers obsolete by removing opportunities to develop a deep understanding, platform-mediated computing risks rendering users obsolete by removing any meaningful control. This trajectory is not inevitable. The choice between “human needs, community sharing, and standards” and “platforms” remains open, but only through collective insistence on intelligibility, interoperability, and governance that preserves human agency.
This requires sustained investment in open standards, community-governed alternatives, and institutional arrangements that prevent corporate capture of projects. As Geuter rightly names it, it is “a social project of all of us building things, trying things, learning from one another”.