Sunday, December 07, 2025

Who Fixes the AI Code in 2030?

AI writes the code. Humans still maintain it. We’re not training enough of them.

There’s a specific feeling you get reviewing AI-assisted code after you’ve done it long enough.

Not wrongness, exactly. The code works. But a kind of… flatness. Each function optimized for its immediate purpose, unaware of anything beyond its own scope. Correct in all the ways tooling can verify. Subtly broken in all the ways that only surface six months later when someone tries to extend it.

I’ve been chasing that feeling for two years now, trying to articulate what bothers me about code that passes every automated check yet still makes experienced engineers uneasy.

The productivity gains are real. Boilerplate that used to take an afternoon, done in minutes. Edge cases handled. Documentation generated. My team ships faster than we ever have. Velocity metrics that would make any engineering manager smile in a quarterly review.

But the code accumulates in the codebase like sediment. Each AI-generated module is a small island, connected to the rest of the system by the thinnest possible bridges. No conversation between components. No shared architectural language. Just a growing collection of features that happen to coexist in the same repository.

I’m not the only one noticing. And the data is starting to confirm what many of us have felt.


What 300 Repositories Revealed About AI’s Blind Spot

Ox Security recently analyzed over 300 open-source repositories, including 50 that were wholly or partially AI-generated. The findings put language to something I’d been sensing in code reviews for months.

They call it the “Army of Juniors” effect: AI produces code that is highly functional but systematically lacking in architectural judgment. The code works. It just doesn’t think.

Ten critical anti-patterns appeared with alarming frequency. Excessive inline commenting cluttered 90–100% of AI-generated projects, adding noise without improving clarity. More telling: 80–90% showed “avoidance of refactors.” AI rarely improves existing code architecturally; it only adds. And in 70–80% of projects, identical bugs kept recurring because AI violates basic code reuse principles. The same mistake, copied into new contexts, over and over.

GitClear’s analysis of 211 million lines of code from 2020–2024 provides the quantitative backbone. Code churn (the percentage of lines reverted or updated within two weeks) has doubled since AI adoption surged. Refactoring dropped from 25% to less than 10% of all code changes. Copy-pasted code blocks increased eightfold.

That last number hit me. I’ve watched it happen on my own team. When AI generates a solution, developers accept it. When a similar problem appears elsewhere, AI generates a similar solution. No one consolidates. No one abstracts. The codebase grows sideways instead of upward.
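
To make this concrete, here is a hypothetical sketch of what “growing sideways” looks like. The names and fields are invented for illustration, not taken from any codebase or from the studies above: two near-identical helpers generated in separate sessions, followed by the small abstraction nobody stops to write.

```python
# Hypothetical illustration; names and fields are invented.

def format_order_summary(order: dict) -> str:
    # Generated for the orders module.
    total = sum(item["price"] * item["qty"] for item in order["items"])
    return f"Order {order['id']}: {len(order['items'])} items, ${total:.2f}"

def format_invoice_summary(invoice: dict) -> str:
    # Generated weeks later for the billing module; same logic, new copy.
    total = sum(item["price"] * item["qty"] for item in invoice["items"])
    return f"Invoice {invoice['id']}: {len(invoice['items'])} items, ${total:.2f}"

def format_summary(label: str, record: dict) -> str:
    # The consolidation step that "avoidance of refactors" skips.
    total = sum(item["price"] * item["qty"] for item in record["items"])
    return f"{label} {record['id']}: {len(record['items'])} items, ${total:.2f}"
```

Each copy is fine in isolation. The duplication is only visible to someone looking at the repository as a whole, which is exactly the perspective a prompt-by-prompt workflow never takes.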

We’re told this is a tooling problem. Better prompts. Better models. Better review processes. And sure, those help.

But I’m increasingly convinced the real crisis isn’t the code we’re shipping today. It’s the developers we’re not developing for tomorrow.


The Productivity Paradox Nobody Wants to Acknowledge

Here’s where it gets uncomfortable.

METR, a nonprofit focused on AI evaluation, recently ran what might be the most rigorous productivity study to date. They recruited 16 experienced open-source developers from mature projects with 22,000+ stars and over a million lines of code, then ran a randomized controlled trial across 246 tasks.

Developers using AI were 19% slower than those working without it.

But here’s the twist: before starting, those same developers predicted AI would make them 24% faster. After completing tasks, they estimated they’d been 20% faster. They couldn’t tell they were slower. The tools felt productive even when they weren’t.

I recognize this feeling. There’s something satisfying about watching code appear on screen, about the rapid-fire suggestions, about the sense of momentum. It feels like progress. Whether it is progress depends on what happens when that code needs to be maintained, extended, debugged by someone who didn’t write it.

Stack Overflow’s 2025 survey of 49,000 developers reinforces the pattern. The top frustration with AI tools? Sixty-six percent cite solutions that are “almost right, but not quite.” Trust has eroded significantly: only 33% now trust AI output, down from 43% last year. Active distrust rose from 31% to 46%.

The gap between perception and reality may be the most dangerous finding of all. If developers feel productive while accumulating debt, they’ll keep accumulating it. If leaders see velocity metrics climbing, they’ll celebrate. The problems remain invisible until they’re not.


Half of It Works. The Other Half Might Be Worse.

Multiple independent studies converge on a troubling number: roughly 40–50% of AI-generated code contains security vulnerabilities.

Georgetown’s Center for Security and Emerging Technology tested five leading LLMs and found almost half the code snippets produced contained bugs that could lead to malicious exploitation. Veracode tested over 100 LLMs across 80 completion tasks and found 45% produced OWASP Top 10 vulnerabilities. Java code fared worst, with a 70% security failure rate.
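
To ground what “OWASP Top 10 vulnerabilities” means in practice, here is a minimal, hypothetical example of the most common category, an injection flaw. It is illustrative only, not a snippet from any of these studies, and uses Python’s standard sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")

def find_user_unsafe(email: str) -> list:
    # The "works on the happy path" shape: user input interpolated straight
    # into SQL. If `email` is attacker-controlled, this is injectable.
    return conn.execute(
        f"SELECT id FROM users WHERE email = '{email}'"
    ).fetchall()

def find_user_safe(email: str) -> list:
    # Parameterized query: the driver binds the value, closing the hole.
    return conn.execute(
        "SELECT id FROM users WHERE email = ?", (email,)
    ).fetchall()
```

Both versions return the same result for well-formed input and pass the same unit test, which is exactly why this class of bug sails through automated checks and lands on a reviewer’s desk.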

Stanford’s research adds a psychological dimension that keeps me up at night. In a controlled study, participants with AI access wrote significantly less secure code on four of five tasks. More concerning: AI users were more likely to believe they’d written secure code. False confidence compounding technical risk.

I think about this when I’m reviewing PRs. The developer submitting the code believes it’s solid. The AI that generated it has no concept of belief at all. And I’m the last line of defense, scanning for problems in code written by a process optimized for plausibility rather than correctness.

The speed makes it worse. As Ox Security’s VP of Research put it: functional applications can now be built faster than humans can properly evaluate them. We’ve created a velocity that outpaces our ability to verify.


The Question Nobody in Leadership Wants to Answer

Here’s what I keep coming back to, the question that makes this more than a technical problem:

If we’re automating the tasks that used to train junior engineers (the boilerplate, the bug fixes, the small features that taught them how systems actually work), who develops the architectural intuition to maintain these systems in five years? In ten?

The numbers are stark. According to Revelio Labs, entry-level tech job postings have declined 35% since January 2023. Stanford tracked ADP payroll data and found something uncomfortable: 16% fewer developers aged 22–25 are employed than in late 2022. When seven out of ten hiring managers believe AI can handle intern-level work, the decline starts to make sense.

The traditional apprenticeship model is fracturing. AI now handles the tasks that historically trained juniors: boilerplate generation, simple bug fixes, documentation, test case creation. The boring work. The work that teaches you how codebases actually function, how decisions ripple through systems, why that weird pattern in the legacy code exists and what breaks if you change it.

Charity Majors, CTO of Honeycomb, captured it precisely:

“By not hiring and training up junior engineers, we are cannibalizing our own future.”

We’re eating the seed corn. Optimizing for this quarter’s harvest while ensuring there’s nothing to plant next year.

McKinsey projects a significant senior developer shortage by 2030 if current patterns continue. This isn’t hyperbole. It’s arithmetic. If companies collectively stop training entry-level developers because AI handles their traditional tasks, the pipeline producing mid-level and senior engineers simply won’t exist.


Why Organizations Can’t See What’s Coming

I’ve sat in enough planning meetings to understand why this keeps happening.

Velocity is visible. Architectural decay is invisible — until it isn’t. AI adoption metrics are what leadership tracks. Technical debt doesn’t appear on balance sheets. The people who see the problem most clearly, the senior individual contributors reviewing code every day, don’t control budgets.

And there’s a temporal mismatch that makes everything worse. The benefits of AI coding tools are immediate: faster PRs, more features shipped, impressive quarterly numbers. The costs are deferred: codebases that resist modification, security incidents that haven’t happened yet, a talent pipeline quietly drying up.

Short tenure culture amplifies the problem. The average tech executive stays in role for three to four years. The technical debt being created now will mature in five to seven. The people making today’s decisions won’t be around when the bill comes due.

MIT Professor Armando Solar-Lezama crystallized it:

“AI is like a brand new credit card that is going to allow us to accumulate technical debt in ways we were never able to do before.”

We all know how credit card debt works. The minimum payments feel manageable until suddenly they don’t.


What I Would Do Differently

I don’t have a framework to sell you. But if I were building a team’s AI practices from scratch, here’s where I’d start.

Track metrics that matter more than velocity: code churn rate, security findings per sprint, time-to-debug for AI-assisted versus human-written code. The numbers might be sobering, but at least you’d be looking at them.
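
If you want a starting point for the churn number, here is a rough sketch assuming a local git checkout and a two-week window. It reports the ratio of deleted to added lines over that window, which is only a crude proxy for GitClear’s line-level “reworked within two weeks” metric, not a reimplementation of it.

```python
import subprocess

def churn_proxy(repo: str = ".", since: str = "14 days ago") -> float:
    """Ratio of deleted to added lines over the window: a crude churn signal."""
    # `--format=` suppresses commit headers so only numstat lines remain.
    out = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", "--numstat", "--format="],
        capture_output=True, text=True, check=True,
    ).stdout
    added = deleted = 0
    for line in out.splitlines():
        parts = line.split("\t")
        # numstat lines look like "<added>\t<deleted>\t<path>"; binary files use "-".
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            added += int(parts[0])
            deleted += int(parts[1])
    return deleted / added if added else 0.0

if __name__ == "__main__":
    print(f"deleted/added over the last two weeks: {churn_proxy():.2f}")
```

Run weekly and plotted, even a ratio this crude makes the trend visible. Splitting it by whether a change was AI-assisted would require tagging commits, which is deliberately left out of the sketch.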

Protect junior developer roles explicitly. Not as charity — as investment. When someone suggests you could “just use AI for that,” ask who’s going to understand this system well enough to fix it in 2028. See if the room goes quiet.

Treat AI-generated code like code from a brilliant but contextless contractor. It needs review. It needs testing. It needs someone who understands the system to evaluate whether this technically-correct solution is actually the right solution for your architecture. AI handles implementation; humans handle judgment.

And be more honest in code reviews. When something feels wrong (that flatness I mentioned at the start), don’t just approve because tests pass. Flag it. Ask questions. Sometimes you’ll be wrong, and the code is fine. But the conversation itself matters. It forces teams to articulate architectural principles that would otherwise remain implicit.


The Bill Always Comes Due

I keep thinking about compounding. About credit cards. About seed corn.

Every shortcut has a cost. The only question is when you pay it. The organizations investing in human judgment now (in training, in code review, in the slow work of developing architectural intuition) will own the future. The ones optimizing purely for velocity are building something else. Something that looks impressive from the outside, ships fast, satisfies quarterly metrics.

Something that nobody will know how to fix when it breaks.

Some teams are building cathedrals. Others are building facades. The difference won’t be visible for years.

But it will become visible.