Showing posts with the label Developer Productivity

Question Your Defaults — How Model-Harness Overfitting Is Slowing Down Your Agent

In Part 3 of this series, I mentioned a fascinating fact. On Terminal Bench 2.0, Claude Opus 4.6 ranked 33rd inside Claude Code — the very harness it was trained in — but jumped to the top 5 when used with a different harness. I didn't fully unpack what that number means. While covering Anthropic's architecture in Part 4 and the hands-on guide in Part 5, I glossed over the most counterintuitive and practically important insight of the entire series. Using the default harness as-is may not be optimal. This post is where I address that. How Overfitting Happens Frontier coding models are post-trained inside their own harnesses. Claude is optimized through thousands of hours of coding tasks in the Claude Code environment; Codex models go through the same process in the Codex environment. During this process, the model adapts to the patterns of its specific harness: How Claude Code invokes to...

Harness Engineering in Practice — How to Apply It to Your Project Right Now

You understand the concept (Part 3). You've seen how Anthropic implements it (Part 4). That leaves one question. How do you apply it to your own project? This post covers concrete methods for putting harness engineering to work in production, and the shifts in the developer's role that this paradigm will bring. Principle 1: Start from Failure This is Mitchell Hashimoto's principle — and one the HumanLayer team arrived at independently. Don't try to design the ideal harness upfront. Every time the agent fails, add a structural safeguard that prevents that failure from recurring. In HumanLayer's words: "Have a shipping bias. Only touch the harness when the agent actually fails." The mindset resembles TDD (Test-Driven Development). Just as you write a failing test first and then write the code to make it pass — you observe the agent's failure patterns and add harness eleme...
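The "start from failure" loop described in the excerpt can be sketched in a few lines. This is a minimal illustrative sketch, not any real harness API: the `Harness` class, `add_guard`, and `validate` names are hypothetical, and each guard stands in for a structural safeguard added only after a concrete agent failure was observed.

```python
# Hypothetical sketch of failure-driven harness engineering.
# All names (Harness, add_guard, validate) are illustrative, not a real API.
from dataclasses import dataclass, field

@dataclass
class Harness:
    """Accumulates structural safeguards, one per observed failure mode."""
    guards: list = field(default_factory=list)

    def add_guard(self, name, check):
        # Each guard is added reactively, after the agent actually failed.
        self.guards.append((name, check))

    def validate(self, output):
        # Return the first guard the output violates, or None if all pass.
        for name, check in self.guards:
            if not check(output):
                return name
        return None

harness = Harness()

# Observed failure: the agent produced an empty diff -> add a guard for it.
harness.add_guard("non_empty_diff", lambda out: bool(out.strip()))

# Observed failure: the agent edited generated files -> add another guard.
harness.add_guard("no_generated_files", lambda out: "build/" not in out)

print(harness.validate(""))             # -> non_empty_diff
print(harness.validate("src/main.py"))  # -> None (all guards pass)
```

The point of the sketch is the ordering: guards are appended one failure at a time, mirroring the TDD rhythm of "failing test first, then the fix," rather than designing the full guard list upfront.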

How AI Coding Changed Completely in 18 Months — Is Prompt Engineering Dead?

In late November 2024, I used AI for the first time. Eighteen months later, the changes I've witnessed aren't just about better tools. The entire way of working has transformed — and the pace of that transformation is accelerating. At first, a new paradigm emerged roughly every six months. Then every three months. Now, something new drops almost every week. AI has moved past its infancy and entered a full-blown transition period. Standing in the middle of it, I thought it was worth looking back at these 18 months. The Past — Copy-Paste and Prompt Engineering The Chat Window Era To be precise, it started with asking ChatGPT about code. I'd paste a function or an error message, get a response, copy it, and move it to my editor. The novelty of asking AI instead of searching Google was refreshing, and I was genuinely impressed by the accuracy. But fundamentally, the workflow wasn't all that d...