About five years ago, I came across an academic paper titled The ACPATH Metric: Precise Estimation of the Number of Acyclic Paths in C-like Languages by Roberto Bagnara, Abramo Bagnara, Alessandro Benedetti, and Patricia Hill. I found it interesting enough to create a TODO item for implementing this software metric in my sebastian/complexity library. Then, as it goes with TODO items, it sat there for years.
Recently, I remembered this open issue and thought it might make for an interesting experiment: instead of implementing the metric myself, I would use an AI coding agent to do it for me. Not because I wanted to save time, but because I wanted to observe what happens when I delegate a non-trivial task to an AI agent in a domain where I am not an expert.
What ACPATH measures
Most developers are familiar with Cyclomatic Complexity, which counts the number of linearly independent paths through a function. ACPATH takes a different approach and counts the number of unique, non-looping (acyclic) execution paths.
To understand what this means, we look at a function's control flow graph: a directed graph where nodes represent statements or decisions and edges represent the flow of control between them. Conditions such as if statements appear as diamond-shaped decision nodes with two outgoing edges, one for the true branch and one for the false branch.
Sequential decisions multiply the path count. Let us look at the following example:
function f(bool $a, bool $b, bool $c): int
{
    $x = 0;

    if ($a) {
        $x += 2;
    }

    if ($b) {
        $x -= 1;
    }

    if ($c) {
        $x *= 2;
    }

    return $x;
}
Each of the three if statements independently splits control flow into two branches. Because the decisions are sequential, the path counts multiply: 2 × 2 × 2 = 8 acyclic execution paths. The ACPATH value of this function is therefore 8.
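To make the counting concrete, here is a small illustrative sketch in Python. It is not the actual implementation (that is PHP code in sebastian/complexity); the node names are invented for this illustration. The function's control flow graph is modelled as an adjacency list, and a depth-first traversal counts the paths from entry to exit. Because this particular graph contains no loops, every path is acyclic.

```python
# Illustrative sketch, not the sebastian/complexity implementation:
# the control flow graph of f() above, as an adjacency list. Node
# names are invented; each if statement contributes a decision node
# with two outgoing edges that rejoin at the next statement.
cfg = {
    "entry":  ["if_a"],
    "if_a":   ["then_a", "if_b"],   # true branch / false branch
    "then_a": ["if_b"],
    "if_b":   ["then_b", "if_c"],
    "then_b": ["if_c"],
    "if_c":   ["then_c", "return"],
    "then_c": ["return"],
    "return": [],
}

def count_paths(graph: dict[str, list[str]], node: str, target: str) -> int:
    """Count all paths from node to target. Because this graph is
    loop-free, every path is acyclic."""
    if node == target:
        return 1
    return sum(count_paths(graph, succ, target) for succ in graph[node])

print(count_paths(cfg, "entry", "return"))  # 2 * 2 * 2 = 8
```

For comparison, the cyclomatic complexity of the same function is 4 (three decision points plus one), while its ACPATH value is 8; the two metrics diverge quickly as sequential decisions accumulate.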
The experiment
My first prompt asked the AI agent to implement the ACPATH metric as described in the paper. The agent reported that it had implemented exactly what I asked for. Tests were also generated. The tests looked good to me, and for simple functions where I could easily count the number of execution paths manually, the implementation yielded the same result. The same seemed to be true for the code used as fixtures in the tests.
With my second prompt, I requested a tool to visualise the control flow graph, path enumeration, and complexity decomposition. The purpose of these visualisations was to support my reasoning about whether the implementation was correct. At first glance, this also seemed to work.
The implementation of the algorithm that calculates the ACPATH metric looks clean and readable. All of this took the AI agent about 15 minutes.
The problem
And here is where it gets interesting. Unlike every previous software development task where I have experimented with LLM-assisted coding, in this case I am not a domain expert.
Yes, I am a computer scientist. I can read the paper, and if I invest enough time, I am confident I would understand it and would be able to implement the algorithm myself. But that would take hours, perhaps days. The AI agent took 15 minutes.
And now I have spent hours wondering whether the implementation it produced is correct.
The tests pass. The code is readable. The visualisations look plausible. For the cases I can verify manually, the results are correct. But for the cases I cannot verify manually, which are precisely the cases that matter, I have no way of knowing whether the implementation faithfully follows the algorithm described in the paper. I would have to understand the paper deeply enough to verify that, and if I understood the paper that deeply, I could have implemented it myself.
Is this a productivity boost?
This is the question I keep coming back to. 15 minutes of AI agent time versus hours or days of my own time sounds like a clear win. But I have now spent more time trying to verify the result than I saved by not implementing it myself. And I am still not certain that the implementation is correct.
Is this really a productivity boost? Probably only for people who do not care whether they produce software that works correctly. Or for people who do not realise, or do not admit to themselves, that they cannot judge the code generated by an AI agent.
When I have used AI agents for tasks where I am the domain expert, verification is straightforward. I can read the generated code and immediately see whether it does what it should. The AI agent saves me the mechanical effort of typing, but the intellectual effort of understanding and verification remains with me, where it belongs.
But when the domain expertise is missing, a fundamental asymmetry emerges: the AI agent can generate code in a domain faster than I can understand that domain. The code looks right. The tests pass. But looking right and being right are not the same thing.
Food for thought
This experiment has given me a lot to think about. The promise of AI-assisted coding is that it makes us more productive. And it does, in a narrow sense: code gets written faster. But writing code was never the bottleneck. Understanding the problem, designing the solution, and verifying the result are the hard parts. The AI agent helps with none of these.
The ACPATH experiment has made something concrete that I had previously only suspected: the real danger of AI-assisted coding is not that it produces bad code. It is that it produces code that looks good enough to make us stop questioning it.
As Sir Tony Hoare once said:
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies.
AI-generated code gravitates towards the second kind. Our job is to insist on the first.