I'm increasingly hearing the sentiment that, now that AI models can write code for us, we no longer need to concern ourselves with concepts like "clean code", eliminating code smells, following SOLID principles, etc. All of these concerns, it's argued, are purely an attempt to make the codebase more comprehensible for humans. But if humans are no longer reading the code, what does it matter? The only thing we should care about is whether the code works correctly.

I can partially understand this perspective. One great strength of AI agents is that they never tire. You can ask them to work on a "big ball of mud" and they won't complain. They don't mind if it's a giant convoluted monolith or an over-engineered set of microservices spread across multiple repos. They will just keep searching around in the code until they eventually find the bit they need to change.

However, I think this is a mistake - even if we grant that we no longer need code to be "human readable" (which I'm not convinced of anyway - I still find it very useful to check in on how an agent is going about tackling a particular problem). Let me give just a few quick reasons why following these "traditional" coding guidelines still matters.

Finding the right place

The first thing a coding agent needs to do when fixing a bug or adding a new feature is to determine where in the codebase that change should be made. This involves searching, and if you look at the model's reasoning steps and tool calls you can see what it searches for (spoiler alert: it's mostly just grepping for words it thinks might be relevant).

This has several implications. First, if our naming is weird or inconsistent, the agent will need more attempts to find the right place, slowing its progress considerably.

Second, it means the agent may well miss some relevant portions of the codebase. The "shotgun surgery" antipattern is where you need to modify many different files to implement a single change. It's often the result of copy-and-pasted code, or of poor architectural decisions that fail to organize key responsibilities or cross-cutting concerns into a single place. When you have code like this, the chances of your agent successfully finding all the places that need to be modified are greatly diminished.
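To make the antipattern concrete, here's a hypothetical sketch (the function and rule names are invented for illustration): a discount rule copy-pasted into several functions, versus the same rule consolidated into one well-named place that a single search will find.

```python
# "Shotgun surgery": the 10%-off rule is duplicated, so changing the
# rate means finding and editing every copy - and an agent grepping
# the codebase can easily miss one.
def checkout_total(price):
    return price * 0.9  # duplicated discount logic

def invoice_total(price):
    return price * 0.9  # duplicated discount logic

# Consolidated: the rule lives in one well-named function, so a
# search for "discount" finds the single place that needs to change.
def apply_discount(price, rate=0.10):
    """Apply the standard discount to a price."""
    return price * (1 - rate)

def checkout_total_v2(price):
    return apply_discount(price)

def invoice_total_v2(price):
    return apply_discount(price)
```

Both versions behave identically today; the difference only shows up the day the discount rate has to change.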

Then there's the context window size problem. In an ideal world, the agent would read the entire codebase in one go and reason about it as a unified whole. But that's simply not how agents work at the moment, partly because context windows aren't large enough (despite some recent models offering a 1M token context window), and partly because the quality of model output tends to degrade as a session grows longer.

This means, for example, that following the "Single Responsibility Principle" will greatly help the model. Once it's found the single class that is relevant to the task at hand, it can read it in full, without polluting the context window with lots of additional, irrelevant code.
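As a minimal sketch (with invented class names), here's what that separation might look like: each class owns one responsibility, so an agent working on a formatting task never needs to load persistence code into its context.

```python
# Hypothetical example of the Single Responsibility Principle:
# instead of one "god class" that formats AND saves reports, each
# class does exactly one job.

class ReportFormatter:
    """Only knows how to turn data rows into display text."""
    def format(self, rows):
        return "\n".join(f"{name}: {value}" for name, value in rows)

class ReportSaver:
    """Only knows how to persist already-formatted text."""
    def save(self, text, path):
        with open(path, "w") as f:
            f.write(text)

# An agent asked to "change the report layout" only needs to read
# ReportFormatter; ReportSaver never enters its context window.
```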

So a well-organized, modular codebase with well-named functions and classes will greatly enhance the effectiveness of an AI agent working on that project, increasing its chances of quickly finding the right place to edit.

The cost aspect of this should not be underestimated. These agents can quickly burn through very large amounts of tokens, and it does seem that many of the subscription models are unsustainably subsidised at the moment.

This means that in the (perhaps very near) future, we'll all be thinking a lot harder about how to make our agents read less code and perform fewer tool calls. The fact that each agent session starts out fresh means it often has to spend time re-learning things it previously discovered. Already we are seeing many projects designed to address this problem (e.g. I just stumbled across this one today).

It's not just the how but the what and why

Code is instructions to the computer about what it should do. It expresses the "how", but not the "what" or the "why". That's why good class and method names, and code comments, are important. They provide valuable additional context so that a human reading the code can understand its intent. This contextual information is just as relevant to agents, which need to make connections between the natural language instructions you provide and the concepts found in the codebase.
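A tiny before/after sketch (the function names and the business rule are hypothetical) shows the difference: both functions do the same thing, but only the second conveys the "what" and the "why" that an agent can match against a natural language request.

```python
# Before: correct, but the name and body express only the mechanics.
def process(items):
    return [i for i in items if i.get("age", 0) >= 18]

# After: the name states the intent, and the comment records the
# business reason - context a human or an agent can connect to a
# request like "fix the marketing email audience filter".
def filter_adults(users):
    # Hypothetical business rule: only adults may receive marketing
    # emails, so exclude anyone under 18.
    return [u for u in users if u.get("age", 0) >= 18]
```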

The best way vs the quickest way

AI agents are very goal-oriented. Ask them to fix a bug or add a feature and they will find a way to do it. Unless you explicitly instruct them to, they won't push back on the request or propose better alternative strategies.

When a human developer is fixing a bug, they will often take a step back and ask whether the bug is actually an instance of a wider category of problems. They might then deliberately increase the scope of the task at hand in order to prevent many similar issues in the future.

I'm increasingly seeing the idea that we could set up an automated process whereby, every time an issue is raised on your GitHub repo, an agent triages it, attempts to fix it, and creates and merges the PR. This is of course incredibly appealing - imagine if 90% of bugs were just automatically fixed within hours of being reported.

But unless this "bigger picture" thinking can also be baked into the fixing process, this approach could result in the classic "technical debt" problem where every issue is resolved in the "quickest way" without regard to the longer-term maintainability implications.

Summary

Code quality still matters for any codebase that you plan to improve and maintain long-term. Even if humans no longer have to suffer the pain of reading poorly architected codebases, the effectiveness of AI agents can be significantly hindered by allowing structure to degrade. Investing in code quality (even if it's just instructing the agents to do some rounds of cleanup and improvement after each task) will provide a stronger foundation for future development.

Want to learn more about the problem of technical debt and how you can reduce it? Be sure to check out my Pluralsight course Understanding and Eliminating Technical Debt.