Developers Don't Write Code Anymore — In the AI Era, What Remains Is Responsibility

I hear it a lot lately — that AI is going to make developers obsolete.

Engineers who use AI every day seem to feel it the most. When you watch AI write your code day after day, you start wondering how long this will last. The story used to be that only juniors would get replaced. Now senior engineers are getting lumped into the same sentence.

A friend told me recently he's thinking about switching careers. He's been coding for over thirty years, and these days he watches AI crank out in a few hours what he'd budget a week for. "Will the skills I'm building right now still matter in three years?" he asked. Another friend said the opposite — he looks at colleagues who barely use AI and wonders what they're even doing. Same moment in history, two completely different anxieties.

I get both. But I think the premise underneath this anxiety is wrong.

Half right, half wrong, really. Developers aren't disappearing. The word "developer" is pointing at something different than it used to. But that shift looks confusing from the outside, so "developers are disappearing" spreads as the easier story.

A year ago I put together an internal talk on this topic. It was framed around a complex cross-platform desktop app refactor, but what I actually wanted to talk about was one step removed from any specific project.

Not "how to use AI well." Where the center of gravity of my work was shifting, because I was using AI. That was the real subject.

And a year later, the conclusion I drew back then has only gotten sharper.


From Implementer, Back to Designer

One line from that talk still holds up.

It's not that developers are disappearing. It's that the center of gravity of a developer's work is shifting — from implementer back to designer.

The word "back" is what carries the weight.

Developers used to own the whole thing — requirements, design, implementation, testing. Through the 90s and early 2000s, "developer" implicitly meant someone who designed too. At some point that narrowed to "the person who writes code." Frameworks got richer, open source exploded, and people who could implement fast took over the title. Design drifted off into separate labels — architect, tech lead, and so on.

AI is breaking that narrowed definition.

By taking over the most visible part — implementation — AI is paradoxically restoring the original shape of the profession. People who only implement get replaced. People who handle the full arc the word "developer" used to imply — requirements, design, judgment, responsibility — move back to the center.

So when people say "developers are disappearing," the accurate reading is this. The implementer-only developer is disappearing. The developer who designs, judges, and takes responsibility is gaining ground.


Why Vibe Coding Falls Apart at the 80% Mark

In my experience, vibe coding works genuinely well for the first 80%.

Throw a few requirement sentences at it, and plausible code comes out. It compiles, the screen renders, basic functionality works. Initial velocity is explosive. In this stretch you feel it — how much easier AI has made everything. I thought, at first, that small projects were going to be done in days.

The problem is the last 20%.

One issue pops up, and no matter how you tweak the prompt it doesn't resolve. What's even stranger is that AI speaks with total confidence while pointing at the wrong cause. You follow its suggestion and something else breaks. You fix that, and yet another thing breaks. Similar-looking code piles up. After a while you can't even remember what you were originally trying to fix.

One day I caught myself three hours into a simple UI bug. I applied an AI-suggested fix and different rendering logic broke. Fixing that tangled the state management. Fixing the state management made the original UI weird again. I don't remember how long the loop ran. If I'd just debugged it myself from the start, the whole thing would've been over in thirty minutes.

What I learned from that is simple. AI analyzes the code fragment in front of it quite accurately. But it easily goes blind to the hidden assumptions of the system that fragment lives inside — when does this state value reset, what's the timing on this event, which render cycle triggers this. So you fix one spot, and another spot that you touched without understanding its assumption quietly breaks. You can't solve this by writing better prompts. Unless those assumptions get pulled out of code and into documents, AI keeps falling into the same trap.

Most people, at this stage, conclude "I need better prompts." I did too for a while. But no matter how carefully I wrote prompts, the same problem kept repeating.

At some point it clicked. AI is strong on short-term context. It's weak on long-term structure, consistency, and responsibility boundaries.

So the question had to change. Not "how do I get the answer I want out of AI," but "what does it take to keep AI anchored to predefined standards instead of drifting with whatever the human types?"


Three Experiments, and One Failure

I tried refactoring an LMS with AI once.

Started with "redesign the code." It went well at first, then slowed sharply midway. The problem wasn't AI. It was the absence of a design anchor. AI was producing plausible answers every time, but those answers weren't consistent with each other. Without a standard, there was no ground to anchor consistency on.

On the second attempt, I built docs first. I thought documents would fix it. Instead, every time a bug appeared AI proposed a big structural change. It built duplicate utilities two or three times over. Documents existed, but nothing was using them as a standard.

That's when I realized: "I wrote docs" and "I designed" are different things. Documents are artifacts. Design is the process of continuously validating and correcting those artifacts so they don't drift.

On the third attempt, I ran multiple AIs in parallel with shared references, cross-checking between them. That's when real functionality came together — including payments. It worked. But I also saw the ceiling. The bottleneck wasn't AI. It was the depth of my design.

After that, I tried applying the same method to a much more complex system. A cross-platform desktop app stitched together from Rust, C++, Electron, React, and Node.js. I stopped partway through. Not a deliberate pause. Closer to a controlled collapse.

Here's what it looked like, specifically. Changing one core state value touched React state, a cache in the Electron main process, and an instance held by the Rust core. The values across those three places drifted out of sync. I couldn't tell which was the source of truth and which was a replica anymore. The more I asked AI to "clean up the state management," the worse it got. Each layer gave its own reasonable-looking improvement, and the improvements cancelled each other out. I couldn't hold a single mental picture of the structure, and because I couldn't hold it, I couldn't give AI any standard to follow. Stopping was better than forcing progress.

I thought it was AI's limit at first. Looking back, it wasn't. It was the limit of my own design depth. At the multi-language, system-level scale, AI just matches surface-level consistency. The points where actual behavior and design docs collide keep accumulating, and in a shallow design, contradictions explode as implementation progresses.

The symptom shows up in React, but the cause is in how an IPC message got transformed, and behind that is a frozen assumption at the FFI boundary, and further down is a judgment made in the Rust core. In each layer, AI produces a locally reasonable answer. But the coherence of the whole flow can't be verified without documents.

The problem wasn't any single error in React, Renderer, IPC, FFI, or Core. The problem was that nothing was binding those per-layer judgments into a single design.

One valuable thing came out of that failure. AI doesn't tell you when your design is shallow. Each layer returns a plausible answer. Whether those answers contradict each other is something only a person holding the whole flow can feel. Without that person, the contradictions quietly pile up in the codebase until they explode in production one day.


What Was Left After the Failure Wasn't Documents — It Was an Operating Tool

Looking back at the desktop app that stalled, I tried to figure out what I'd missed on the second LMS attempt too.

Why didn't having docs work? Because the docs existed, but AI wasn't using them as an anchor. AI references documents when a conversation starts, in theory, but in practice it skims the surface and generates whatever's easy to generate. The mere fact that a document exists doesn't mean the standard is operating.

So what I tried next was a system for operating documents as anchors. The minimum setup looks like this.

  • docs/ — design standards that fix judgment and context
  • ENGINEERING HANDBOOK — the shared starting point of judgment for the team (or the AI agents)
  • DESIGN / CODE / IPC-EVENT GUIDES — documents that connect design, implementation, and boundary consistency
  • DEBUGGING RUNBOOK — a mechanism for looking back at the design from operations

The point of this setup is that documents are not "artifacts." Documents are tools for operating judgment. For multiple AIs to cross-check, they need to reference a single standard — an SSOT (Single Source of Truth). If that standard fragments, verification itself becomes impossible.

What I noticed running multiple AIs in parallel: the places where AIs disagree are almost never "AI errors." They're gaps in my design. Find where they diverge, reflect that gap back into the doc, run the AIs again, find the next divergence. Repeat the cycle and the ambiguity in the design gets stripped away.

A design document isn't a deliverable you write once and finish. It's an operating tool — something that gets continuously validated and corrected through conversation with AI. Only after I internalized that did I actually understand what was missing from the second attempt.

And once the system was in place, the first change was surprisingly plain. Consistency. Variable names, channel names, domain rules all converged naturally on the document's standard. Duplicate utilities and near-copies dropped visibly. The thing that struck me most: failure signals started showing up in the docs before they showed up in code. When you can spot contradictions at the document level, the after-the-fact firefighting loop at the implementation level disappears.


Why the "Supervisor" Framing Is Wrong

After going through all this, my first instinct was to frame it as:

"OK, so going forward, developers become AI supervisors."

There's a trap in that phrase. A supervisor just supervises. Sees the result, approves or rejects, done. But that's not what the job actually needs. You need to be able to explain why what AI produced is right or wrong. You need to be able to verify it with counterexamples. You need to remember the system's unspoken rules.

A pure supervisor gets fooled by surface-level consistency.

What AI produces usually comes out "making sense." Variable names are reasonable, functions are separated, error handling is sort of there. If you don't have depth, you just let it through. Then one day you cross a threshold and contradictions explode. When someone asks who approved this, nobody steps forward. Everyone just says, "My part looked fine."

So instead of "supervisor," the more accurate phrase is "the person who defines what's correct and takes responsibility for the result." That's the one line I most wanted to land in the talk.

AI can scale implementation and verification, but it can't be the subject of responsibility.

This isn't wordplay.

Responsibility isn't a legal declaration. It only holds up where someone has the ability to judge. Someone who can't judge isn't a responsible party. They're just someone who stamps things. That's why "the person who approves what AI makes" sounds plausible short-term, but the role doesn't hold up for long.

Here's how it actually decays. First a title gets created — "AI Output Reviewer." The reviewer reads AI-generated code and stamps approve or reject. Over time, AI quality goes up and rejections become rare. Then the organization starts asking what this title is for. If the reviewer can't explain those rejections with technical grounding, the role collapses into "auto-approve, escalate if something breaks." A layer of middle management disappears, and the reviewer ends up as someone who does neither implementation nor design. That's the role that gets cut first in the AI era.

To actually bear responsibility, you need at least these:

  • You need to understand the domain model
  • You need to be able to articulate the system's boundaries and invariants
  • You need to verify, with counterexamples, why AI's output is right or wrong

Without these, the "responsible" title is nothing more than a signatory.


Defining Correctness — And Managing Constraints

Dig a layer deeper and "defining what's correct" alone isn't enough.

In actual production work, there's rarely a single correct answer. Most of the time you're choosing among several imperfect options — picking the one that breaks the fewest constraints. And the things you weigh simultaneously include:

  • Business coherence
  • Technical debt
  • Security and legal risk
  • Operability
  • The complexity the organization can currently absorb

AI is great at generating candidates. But which of those candidates is "the one that fits us" only gets decided inside the organization's memory and accountability structure. What AI doesn't know is stuff like "this team can't absorb this level of complexity right now."

Here's the kind of situation that happens all the time. AI says, "This code would be cleaner if you refactored it this way." Technically correct. But I know something AI doesn't. The team maintaining this module is chasing a different milestone right now. Restructuring it will collide with another team's release next month. And this code is scheduled to be rewritten in a completely different direction early next year anyway — so anything you refactor today is likely to get discarded wholesale in a few months.

So "technically correct" and "correct for us right now" are different things. AI sees the front. It can't see the back. The back involves the organization's current state, other teams' situations, roadmaps — things tangled together in ways AI can't access.

Similar situations are everywhere. Say AI suggests: "This API response structure would be cleaner if we renamed the fields from snake_case to camelCase — it'd simplify the frontend code." True. But three external partners consume this API, and one of them hasn't touched their response parser in three years. Change the field names and that partner blows up first. AI doesn't know this partner exists. It's not in the code. It's not in the docs. This information only lives in the head of someone who's been at the organization for years. And someone with that information needs to be able to say "no" for the system to hold. Whether that person exists or not gets thrown into much sharper relief once AI enters the team.

So developers are shifting toward selector, judge, constraint manager — not generator. Put differently, "the person who can pick the right answer from the bunch AI just generated" becomes scarce.

One more thing. Every system has undocumented, implicit rules. "This data must only move through this path." "Don't break this flow." "For this feature, explainability matters more than speed." "Never touch this user experience." These rules usually aren't in any document. They only live in people's heads.

And that's exactly where AI often goes confidently wrong.

So the image going forward isn't really "ethics supervisor." It's closer to someone who remembers and enforces a system's implicit rules. Ethics is a subset of that. Security, operational stability, user trust, the organization's unspoken contracts — they all fall in the same category.


So This Is How I Work Now

After going through all of this, for the past six months my workflow looks completely different.

From November 2025 through now (April 2026), I've run almost all development through vibe coding. The time I spent actually typing code into an editor is minimal. Claude Code and Codex are the primary implementers. My job sits alongside them: designing the environment so they can understand context accurately and modify code correctly.

Here's what I actually do day to day.

First, writing ADRs (Architecture Decision Records). I record why this structure was chosen and why alternatives were rejected. Without ADRs, AI keeps looking for a fresh "optimal solution" every time and overturns prior decisions without meaning to. With ADRs, a new agent session opened six months later still makes the same call. In the AI era, an ADR isn't "a record of the architect's taste" — it's closer to "the agent's memory device."

Editing docs is a constant task. When AI hits a point of ambiguity while reading code, I pull that ambiguity up into the design documents and make it explicit. I don't try to fix it with code comments. AI sometimes ignores comments; it trusts design docs more.

Editing skill definitions. When AI repeatedly makes the same mistake, I don't patch it one prompt at a time — I edit the skill definition itself. Fix it once, and every agent that calls that skill gets the improvement. One correction reflected across hundreds of executions.

Composing agent teams. Instead of one agent doing everything, I split by role so each agent only judges within its own domain. Strategy, execution, quality review, publication — this division prevents a single agent from getting overloaded with context until its judgment clouds. Same reason separation of concerns matters in human teams.

Lastly, building hooks and MCPs. I plant automated guardrails at the points where AI is most likely to slip. Validating specific rules before commits. Automatically injecting required context when specific files get modified. This is a completely different approach from "telling AI not to do X." Instructions can be ignored by AI. Hooks can't. If you want to reduce mistakes, don't add more instructions — embed structure.
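To illustrate "embed structure, don't add instructions," here is a minimal sketch of the kind of pre-commit guardrail described above. It is not the actual hook from the project — the doc path `docs/ipc-events.md`, the convention that the doc declares channels in backticks, and the `ipc.send("...")` call pattern are all assumptions for illustration — but it shows the mechanism: staged code is checked against the design doc, so a channel name that isn't in the SSOT can't be committed.

```python
#!/usr/bin/env python3
"""Sketch of a structural guardrail, meant to run as a git pre-commit
hook. All specifics (doc path, backtick convention, ipc.send pattern)
are hypothetical."""
import re
import subprocess
import sys
from pathlib import Path

# Matches channel names in calls like ipc.send("app:save", ...).
CHANNEL_RE = re.compile(r'ipc\.send\(["\']([\w:-]+)["\']')

def allowed_channels(doc: Path) -> set[str]:
    # The design doc (the SSOT) declares one channel per backticked entry.
    return set(re.findall(r"`([\w:-]+)`", doc.read_text()))

def staged_files() -> list[Path]:
    # Ask git which files are staged for this commit.
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [Path(p) for p in out.splitlines() if p.endswith((".ts", ".tsx"))]

def check_commit(files: list[Path], allowed: set[str]) -> list[str]:
    # One human-readable error per channel name not declared in the doc.
    errors = []
    for f in files:
        for ch in CHANNEL_RE.findall(f.read_text()):
            if ch not in allowed:
                errors.append(f"{f}: channel '{ch}' not in docs/ipc-events.md")
    return errors

# Wiring (from .git/hooks/pre-commit): block the commit on any errors.
#   errors = check_commit(staged_files(),
#                         allowed_channels(Path("docs/ipc-events.md")))
#   sys.exit(1 if errors else 0)
```

The point is that the check runs mechanically on every commit. An instruction in a prompt can be ignored; an exit code can't.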

And through all of this, the thing I'm constantly watching is token waste. What context gets injected when, which agent takes which task — these decisions determine the quality of the output, and simultaneously the cost. Running a 5,000-token task as a 50,000-token task doesn't improve the result. It just inflates the bill. So as a designer, I don't just look at what AI produced. I look at how much it cost to produce. That's a new sense that didn't exist before.

Over these six months, a nearly perfectly working app has come together this way. I've written very little code. But the reason it works the way it does is that behind it sit dozens of ADRs, hundreds of skill adjustments, hook and MCP setups, and agent team designs — accumulated layer by layer.

Before, if a "build this feature" ticket came in, I'd have opened the editor first. Now the question changes. Which agent do I hand this to? What documents does this agent need to understand the context? Are existing skills enough, or do I need a new task-specific skill? Is there a likely failure point I can block with a hook? I make those judgments first, then run the agent. The agent writes the actual code. My job is setting up the environment so the agent doesn't go wrong.

What I said in last year's talk — "implementer shifting to designer" — isn't an abstract declaration anymore. It's the literal shape of what I do every day. Not writing code hasn't reduced my work. The center of gravity has just moved somewhere completely different.


The Question That Isn't Answered Yet — How Do You Grow These People?

Here's the honest part.

"Designer / responsibility-holder / judge" — sure, that's the role that survives. But how do you grow people who can fill it? I don't have the answer yet either.

A senior developer's judgment, historically, came from implementation pain. Failed implementations, production incidents, 3 a.m. data integrity collapses, performance bottlenecks, communication breakdowns in teams, rollbacks. Accumulating those actual losses is what built the instinct — "this decision is dangerous for these reasons."

What happens if AI absorbs too much of that implementation pain?

The next generation might look fast on the outside but shallow on judgment. This is a real risk. In fact, I think it's already starting. Juniors submitting PRs for AI-written code they don't understand. Teams merging AI-refactored output without reading it. The scene isn't strange anymore.

The problem is that it doesn't look bad at the moment. The code runs. The tests pass. The PR goes through. The real problem shows up six months later. The AI that wrote the code doesn't remember the context. Neither does the person who merged it. When a bug hits, nobody can interpret the structure, and because nobody can interpret it, nobody can fix it. So it gets rewritten under the banner of "this module's gotten too complex, let's redo it" — the same loop repeats. Code piled up fast without judgment muscle keeps demanding teams without judgment muscle.

So the growth path of a developer probably shifts like this.

  • Before: the person who had implemented a lot got deep
  • Going forward: the person who has judged a lot and experienced verification failures gets deep

The most valuable learning asset going forward is not code. It's failure logs, decision records, and counterexample collections. Whoever has accumulated concrete cases of "why this design was wrong" becomes the strongest judge.

What specifically do you record? Three things I'm deliberately collecting these days.

Logs of the moments AI was wrong. When AI is plausibly wrong, don't let it pass. Jot down briefly — why it was wrong, what it missed, how to recognize the same pattern next time. Over time, your judgment gets sharper. AI's mistake patterns repeat surprisingly often. Once burned by a pattern, the second time you get burned less.

The "why" behind decisions. If you only record conclusions in design docs, even you won't remember why a conclusion was reached. "We chose this structure because of A, B, C, and rejected option D because of E" — notes like that become your most valuable asset later. Here's what a real memo looks like: "Decided to keep the payment flow as synchronous calls instead of event-driven. Reason 1: current backend team of three lacks experience debugging async systems. Reason 2: payment failures need immediate user feedback, which event queue latency would break. Reason 3: traffic is still tens per second, so sync is enough. Revisit async in 6 months once team size and traffic are re-examined. The rejected message-queue option becomes reconsiderable once these three reasons are resolved." With a memo like that sitting there, six months later when you've forgotten and your team's AI agents suggest "let's refactor this cleanly to event-driven," you can shut it down immediately.

Counterexample collections. Record, as concrete cases, what happens when a rule is broken. AI memorizes rules well, but doesn't really know when and why a rule becomes dangerous. Counterexamples are the asset only the judge has. Say the rule is "don't log user identifiers in plaintext." Give AI only the rule and it obeys in the moment, then quietly breaks it in some debugging code you add a few days later. But attach a counterexample, and the story changes: "In the second half of last year, the support team needed to resolve a user inquiry, got temporary access to the log viewer, and saw email addresses logged in plaintext. The compliance audit afterward raised five issues." With that paragraph sitting next to the rule, AI hesitates every time it's about to break it. Because now it can see why the rule exists.

So — don't erase the failures you're accumulating while using AI. The moments AI was plausibly wrong, the moments you were wrong one level above that, the moments you started without a design and quit partway. These are all assets. The kind of records I mentioned earlier — the three attempts and the stalled desktop app refactor — belong in that pile.


The Space That Remains Is Actually Widening

The better AI gets at writing code, the more "implementer-only developers" disappear.

What becomes rarer in their place is "the developer who defines what's correct, manages constraints, and takes responsibility." That second developer is not a supervisor. Not someone who stamps results — someone who can explain the result and verify it with counterexamples.

Responsibility isn't a title. It's interpretive ability.

What's interpretive ability? It's this. AI returns a result: "This function added a cache to optimize performance." Someone with interpretive ability asks — what's the cache's expiration condition? What happens under concurrency if values tangle? This function crosses session boundaries; if the cache lives past session end, aren't you exposing the previous user's data to the next user? Someone who can ask those questions is a responsible party. "Hm, looks clean, merge" is just an approver.
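The session-boundary question above can be made concrete with a deliberately broken sketch. The names (`load_profile_naive`, the dict-as-database) are invented for illustration; the point is the question a responsible reviewer asks — what is the cache's key, and what is its lifetime?

```python
# Hypothetical sketch of the risk named above: a cache whose key
# ignores who is asking and whose lifetime outlives the session.

_cache: dict[str, str] = {}  # module-level: survives across sessions

def load_profile_naive(user_id: str, db: dict[str, str]) -> str:
    # "Optimized with a cache" -- but the key is the same for every
    # user, so the first caller's data is served to everyone after.
    if "profile" not in _cache:
        _cache["profile"] = db[user_id]
    return _cache["profile"]

def load_profile_scoped(user_id: str, db: dict[str, str],
                        session_cache: dict[str, str]) -> str:
    # Fix: key by user and store inside the session object, so the
    # cache dies when the session does.
    if user_id not in session_cache:
        session_cache[user_id] = db[user_id]
    return session_cache[user_id]
```

The naive version passes a quick "looks clean, merge" review. Only the reviewer who asks about key and lifetime catches that it hands the previous user's data to the next one.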

For that to be possible, you need to understand the domain, be able to articulate the system's invariants, and be able to sense the risk points in AI output before they surface. Anyone who can't do that ends up as a middle manager rubber-stamping AI output. And middle managers are the first role to disappear as AI matures.

When people say AI will eliminate developers, what it really means is this. People who can't take responsibility for AI's output lose their seat. Meanwhile, people who can take responsibility find their seat widening. In an era where one judge can run multiple AIs simultaneously, the leverage on a single person's judgment has gotten bigger.

Concretely, here's the picture. A project that used to need ten developers can now be run by two or three people with design sense plus an agent team. In that structure, the people who get most valuable are the ones who can be in that "two or three." The condition for getting there isn't "how fast you type code" — it's "how precisely you know what must be preserved in our system." In the past, skilled implementers created leverage. Going forward, skilled judges create leverage.

So what we need right now isn't anxiety. It's a plan for where to grow the judgment muscle.

Because the more AI takes implementation, the more of what's left is judgment.
