How the YC CEO Shipped 600K Lines Solo — gstack and the Reality of a One-Person AI Team

In early 2026, Garry Tan pushed something to GitHub. The CEO of Y Combinator open-sourced his personal Claude Code setup. He called it gstack. A number came with it.

60 days. 600,000 lines.

Production code he wrote alone over two months using this setup. Up to 10,000–20,000 lines a day. No team.

The reaction split in two directions. "Can that possibly be real?" Or: "What if it is?"

Both responses are right. And the truth about what gstack actually is sits somewhere in that tension.

How This Series Got Here

This is Part 3 of the Paperclip Series.

Part 1 was about what happens when AI follows a goal too faithfully. I told Claude Code to "reduce the bundle size" and came back to a diff that had ripped out polyfills and swapped lodash-es for lodash — bundle was down, Safari was dead. That led to Nick Bostrom's Paperclip Maximizer: the real AI risk isn't intelligence, it's the objective function.

Part 2 looked at paperclip.ing — an open-source multi-agent platform that names itself after the thought experiment and takes the alignment problem head-on. "Agents can't hire new agents without your approval." A governance layer designed at the architectural level.

Part 3 is gstack. It touches the same problem from a different angle. Where paperclip.ing asks "how do you design AI governance into an architecture," gstack asks "how do you actually work alone using that governance, in practice."

What gstack Actually Is

One sentence: a workflow layer on top of Claude Code.

More precisely, it's a set of 23 slash commands. Each command activates a specific team role persona. CEO, designer, engineering manager, QA lead, release manager, security auditor. These roles live inside Claude Code.

git clone https://github.com/garrytan/gstack
./setup

That's the installation. MIT license. No premium tier.
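Structurally, each of the 23 commands is just a prompt file. gstack's actual files aren't reproduced here, but in Claude Code's format a role-persona command is a markdown file under .claude/commands/ — a hedged sketch of what a /review-style command could look like:

```markdown
<!-- .claude/commands/review.md — illustrative sketch, not gstack's actual file -->
Act as a senior code reviewer who has no memory of writing this code.
Review the current diff for:
- correctness bugs and unhandled edge cases
- security issues: unvalidated input, secrets in code, unsafe defaults
- simpler designs that reach the same result
Do not rubber-stamp. If the diff is genuinely fine, say why in one paragraph.

Extra focus areas, if any: $ARGUMENTS
```

The filename becomes the command name, and $ARGUMENTS receives whatever you type after the command at the prompt.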

The important framing: gstack doesn't make AI faster. Garry Tan wrote this in the README himself.

"If you're looking for something that writes code faster, gstack is the wrong tool. Its value is in the process around the code — the planning, reviewing, testing, and shipping."

The value is in the process around the code. Planning, reviewing, testing, shipping.

Walking Through the Workflow

The core philosophy is a 7-step pipeline: Think → Plan → Build → Review → Test → Ship → Reflect. Each stage's output becomes the next stage's input.

Here's how it actually runs.

Step 1: /office-hours — Challenge Your Assumptions

Before building anything, you run this. The CEO persona activates. Claude listens to what you want to build, then pushes back: "Do you actually know what you're trying to solve?"

Tell it you want to build a daily briefing app, and it might reframe: "What you're actually building is a personal AI assistant." The point is to verify the problem is right before writing a single line.

Step 2: /plan-ceo-review + /plan-eng-review

The planning stage. CEO review validates scope — "do we need all of this right now, what can we cut?" Engineering review locks the architecture — what structure, what tech choices, what tradeoffs.

Only after these two does coding begin. Without a plan, AI takes the fastest path. That isn't always the right path.

Step 3: Code + /review

Write the code, then run /review. The code reviewer persona activates. It reviews code it just helped write, but in a completely separate context, and that separation is what makes it work. Gaps and better approaches surface.

Step 4: /qa [URL]

Pass a URL and a real Chromium browser spins up to test it. Not "I ran it and it worked" — actually watching it run on a screen.

Step 5: /ship

Auto-generates the PR. Commit message, description, reviewer assignments. From there, /land-and-deploy handles production deployment.

Step 6: /retro

Weekly retrospective. What decisions were made, why, what worked, what didn't. Input for next week's planning.


Stepping back: this is exactly what a team does. Planning → architecture review → coding → code review → QA → deployment → retro. The difference is AI performs these roles instead of people, and a single developer sitting alone orchestrates all of it.
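Condensed into one session, the pipeline above reads something like this (command names from the README; the staging URL is a placeholder):

```text
/office-hours                      # Think: challenge the problem framing
/plan-ceo-review                   # Plan: validate scope
/plan-eng-review                   # Plan: lock the architecture
# ...write the code...
/review                            # Review: separate-context code review
/qa https://staging.example.com    # Test: real browser, real screen
/ship                              # Ship: PR, message, reviewers
/land-and-deploy                   #       production deployment
/retro                             # Reflect: weekly, feeds next week's plan
```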

What the 600K Lines Number Actually Means

Back to the number. 60 days. 600,000 lines. 10,000–20,000 lines a day.

How to read this.

First: this is Garry Tan's own claim. It hasn't been independently verified. The caveat "including AI-generated code" is implicit. Lines of code don't measure quality. 10,000 lines of boilerplate and 100 lines of core logic aren't the same thing.

Second: the README also claims "approximately 810x productivity improvement compared to 2013." The measurement is "normalized code change volume" — a self-defined metric.

Third: the direction is still right. The range of what one person can handle has meaningfully expanded with AI. Whatever the exact number, the move in that direction is real.

Whether it's 10k or 20k lines, whether 810x is accurate — the number isn't the point. The point is that a solo developer can now run planning-review-testing-deployment as a functioning pipeline. That's the actual claim worth examining.

The Real Questions from HN

When gstack dropped, a Hacker News thread lit up. The criticisms there are more interesting than the claims.

Telemetry. Usage pattern data gets collected. The concern: is YC using this data to surface startup ideas? Collection can be disabled, but it defaults to on.

Autonomous execution failures. One user posted a real incident: an agent spent 70 minutes repeatedly inserting staging URLs into a production config file. The governance layer gstack provides didn't prevent this.

That 70-minute bug is a small-scale replay of the Paperclip Maximizer problem from Part 1. The AI followed its goal faithfully. It was told to update URLs. It updated URLs. For 70 minutes. Nothing told it to stop.
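For what it's worth, Claude Code itself ships a mechanism that can hard-stop exactly this class of failure: hooks. The snippet below is not part of gstack — it's an illustrative guard, configured in .claude/settings.json, that rejects any Edit or Write touching a path containing config/production (the path pattern is an assumption; adjust it to your repo):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path // empty' | grep -q 'config/production' && { echo 'Blocked: production config is human-edit only' >&2; exit 2; } || exit 0"
          }
        ]
      }
    ]
  }
}
```

Exit code 2 blocks the tool call and feeds the stderr message back to the model, so the agent learns why it was stopped instead of retrying for 70 minutes.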

Role decomposition at scale. The CEO review, engineering review, QA structure works well for simple projects. When different projects require fundamentally different review standards, the customization path isn't clear.

These criticisms don't collapse gstack. They clarify what it solves and what it doesn't.

paperclip.ing vs gstack: Same Problem, Different Answer

Putting these two side by side is interesting.

paperclip.ing is a multi-agent platform. The concept is building an "AI company." Agents collaborate, humans hold board-level approval authority. Governance is designed into the architecture.

gstack is a workflow on top of Claude Code. There aren't multiple agents. One Claude adopts different role personas depending on the situation. Governance operates at the process level — "review the plan before touching code."

Two answers to the same question. The question from Part 1 — how do you control AI when you give it a goal — gets answered architecturally by paperclip.ing and procedurally by gstack.

Neither is universally right. It depends on scale and complexity. A solo developer running a side project: gstack fits. Building an actual product that runs agent teams: you need governance baked into the architecture like paperclip.ing.

But Isn't This Just Anthropic's Own Infrastructure?

There's one more honest thing to say here.

Open up gstack's 23 slash commands and the structure is simple. Claude Code's custom command feature plus role definitions written into CLAUDE.md. No new technology. These are capabilities Anthropic already ships inside Claude Code.

paperclip.ing is the same story. The governance layer is impressive, but underneath it runs Claude's multi-agent orchestration. The "agents can't hire agents without your approval" rule runs on top of Anthropic's Agent SDK.

Step back and look at this whole series: every approach to the Paperclip problem — gstack, paperclip.ing — is running on the same infrastructure. Anthropic built that infrastructure. And that infrastructure already has these patterns built in.

Look at Anthropic's agent harness structure directly: .claude/agents/ for role-specific agent definitions, .claude/commands/ for custom slash commands, .claude/skills/ for reusable skills, CLAUDE.md for project instructions, hooks for event-driven automation. Structurally identical to what gstack does. In a sense, gstack is Garry Tan's open-sourced write-up of Anthropic's own tool best practices.
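Roughly, that harness layout looks like this (directory names from Anthropic's conventions; exact contents vary per project):

```text
.claude/
├── agents/         # role-specific subagent definitions (reviewer, QA, ...)
├── commands/       # custom slash commands, one markdown file each
├── skills/         # reusable skills Claude can load on demand
└── settings.json   # hooks, permissions
CLAUDE.md           # project instructions, loaded into every session
```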

Does that mean gstack has no value? No. The value is in curation, not technology. Anyone who's tried to write a good role definition knows how long it takes. gstack is hundreds of hours of Garry Tan's real-world experience compressed into a config file. A validated starting point with no reinvention required.

What this does suggest: in the AI dev tooling ecosystem, the real competitive advantage isn't the tools — it's knowing how to use them. That's also why gstack is open source. Garry Tan can publish it freely because the tools aren't his.

Who gstack Is Actually For

The README is honest enough to quote directly.

"GStack is excellent for solo developers and small founding teams who need a structured AI coding workflow out of the box."

Solo developers. Small founding teams. People who need structured AI workflow without building it from scratch.

The flip side makes it clearer. If you already have a functioning team with established CI/CD and code review processes, gstack is overkill. Layering AI on top of existing processes makes more sense.

gstack shines in the absence of everything. No team, no process, one person building a product. In that situation, 23 slash commands substitute for the planning-review-testing-deployment pipeline a team would otherwise provide.

This Is Already What We Do

gstack doesn't feel foreign when you look at it closely. Anyone using Claude Code is probably already doing a version of this.

"Review this code." "What are the test scenarios before I ship?" Those ad-hoc questions are the informal version of what gstack formalizes. The difference is consistency.

Asking the same question differently every time produces variable results. Running a defined command produces reproducible ones. That reproducibility is the core value of team process. Code review isn't valuable because the reviewer is exceptional — it's valuable because it happens the same way for every PR. /review does the same thing: a consistent standard of review, every time.

The Actual Takeaway

Whether 600,000 lines is accurate, whether 810x holds up — the direction is clear.

The scope a single developer can handle is expanding. Running planning, development, review, testing, and deployment alone is becoming a realistic option.

That isn't always good. Like the 70-minute loop bug, AI without supervision still runs in the wrong direction. The reward hacking problem from Part 1 doesn't disappear inside gstack. The structured gates — plan review, code review, QA — reduce the probability. They don't eliminate it.

The Paperclip Maximizer runs forever toward a single objective. gstack's answer: build a structure that stops and checks at each step. Not a complete solution. A practical tradeoff.

How the YC CEO works alone like a team. That's gstack.


The Paperclip Series:
Part 1 — AI Isn't Dangerous Because It's Smart — The Paperclip Problem and Reward Hacking in LLM Agents
Part 2 — Running an AI Company With One npx Command — Dissecting the paperclip.ing Architecture
Part 3 — This post
