Showing posts labeled Harness Engineering

Question Your Defaults — How Model-Harness Overfitting Is Slowing Down Your Agent

In Part 3 of this series, I mentioned a fascinating fact. On Terminal Bench 2.0, Claude Opus 4.6 ranked 33rd inside Claude Code — the very harness it was trained in — but jumped into the top 5 when run with a different harness. I didn't fully unpack what that number means. While covering Anthropic's architecture in Part 4 and the hands-on guide in Part 5, I glossed over the most counterintuitive and practically important insight of the entire series: using the default harness as-is may not be optimal. This post is where I address that.

How Overfitting Happens

Frontier coding models are post-trained inside their own harnesses. Claude is optimized through thousands of hours of coding tasks in the Claude Code environment; Codex models go through the same process in the Codex environment. During this process, the model adapts to the patterns of its specific harness: how Claude Code invokes to...

Harness Engineering in Practice — How to Apply It to Your Project Right Now

You understand the concept (Part 3). You've seen how Anthropic implements it (Part 4). That leaves one question: how do you apply it to your own project? This post covers concrete methods for putting harness engineering to work in production, and the shifts in the developer's role that this paradigm will bring.

Principle 1: Start from Failure

This is Mitchell Hashimoto's principle, and one the HumanLayer team arrived at independently. Don't try to design the ideal harness upfront. Every time the agent fails, add a structural safeguard that prevents that failure from recurring. In HumanLayer's words: "Have a shipping bias. Only touch the harness when the agent actually fails." The mindset resembles TDD (Test-Driven Development). Just as you write a failing test first and then write the code to make it pass, you observe the agent's failure patterns and add harness eleme...
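The failure-first principle can be sketched in code. This is a minimal illustration under my own assumptions, not HumanLayer's or Hashimoto's actual implementation: the `Harness` class, `add_guard_from_failure`, and the example git rule are all invented for the sketch. The idea is simply that every observed failure becomes a permanent pre-execution guard, the way a failing test becomes a permanent regression check.

```python
import re

# Hypothetical sketch of "start from failure": each observed agent failure
# becomes a permanent guard rule checked before any shell command runs.
class Harness:
    def __init__(self):
        self.guards = []  # (compiled pattern, reason) pairs, added only after real failures

    def add_guard_from_failure(self, pattern, reason):
        """Shipping bias: only called after the agent has actually failed this way."""
        self.guards.append((re.compile(pattern), reason))

    def check(self, command):
        """Run before executing a command; block anything matching a known failure."""
        for pattern, reason in self.guards:
            if pattern.search(command):
                return f"blocked: {reason}"
        return "ok"

harness = Harness()
# Illustrative failure: the agent once wiped uncommitted work with a destructive git command.
harness.add_guard_from_failure(r"git\s+checkout\s+--\s+\.", "previously destroyed uncommitted work")

print(harness.check("git status"))         # → ok
print(harness.check("git checkout -- ."))  # → blocked: previously destroyed uncommitted work
```

Like a test suite, the guard list only grows, and each entry documents a concrete failure the harness now structurally prevents.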

Harness Engineering in Practice — How Anthropic Designs AI Agents

The previous post covered the concept and components of harness engineering. This time, it's the real thing. Drawing on the concrete architecture patterns Anthropic published on their official engineering blog, along with experimental results from the OpenAI Codex team, let's look at how harnesses are actually applied in practice.

The Basic Structure of an Agent: The Inner Loop

At the heart of every AI agent sits an agent loop. In Claude Code, it's called queryLoop. At its core, it's a while(true) loop:

    while (true) {
      // 1. Prepare context (plan-mode attachments, task reminders)
      // 2. Call the model (streaming API call)
      // 3. Execute tools (detect tool call → validate schema → check permissions → execute)
      // 4. Decide whether to continue (does the model have more to do?)
    }

Each iteration is one "think, act, observe" cycle. The model thinks, invokes a tool, observes the resul...
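The four steps of that loop can be sketched in runnable form. This is a minimal illustration, not Claude Code's actual queryLoop: `MockModel`, `ToolCall`, and the `TOOLS` registry are invented stand-ins for a streaming model call and a real tool layer, and the permission check is reduced to a simple registry lookup.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict

class MockModel:
    """Stand-in for a streaming model call: emits scripted tool calls, then None."""
    def __init__(self, plan):
        self.plan = list(plan)
    def call(self, context):
        return self.plan.pop(0) if self.plan else None  # None = nothing more to do

# Tool registry; a real harness would validate schemas and check permissions here.
TOOLS = {"read_file": lambda args: f"contents of {args['path']}"}

def agent_loop(model, max_iterations=10):
    context = []                              # 1. prepare context each turn
    for _ in range(max_iterations):
        tool_call = model.call(context)       # 2. call the model
        if tool_call is None:                 # 4. decide whether to continue
            break
        tool = TOOLS.get(tool_call.name)      # 3. validate before executing
        if tool is None:
            context.append(("error", f"unknown tool: {tool_call.name}"))
            continue
        context.append((tool_call.name, tool(tool_call.args)))  # observe the result
    return context

history = agent_loop(MockModel([ToolCall("read_file", {"path": "README.md"})]))
print(history)  # → [('read_file', 'contents of README.md')]
```

Each pass through the `for` body is one "think, act, observe" cycle; the loop ends when the model has nothing left to do or the iteration cap is hit.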

What Is Harness Engineering — Designing the Reins for AI Agents

In Part 1 of this series, I talked about the decline of prompt engineering. With CLI-based tools on the scene, the value of manually crafting elaborate prompts was fading. But as 2026 unfolded, I realized that what replaced prompt engineering wasn't simply "better tools." Prompt engineering gave way to context engineering, and now context engineering is giving way to an entirely new paradigm: harness engineering. In this post, I'll break down what harness engineering is, why it matters right now, and what its key components look like.

A Harness for a Horse, a Harness for an Agent

A harness originally refers to the tack fitted onto a horse. Bridle, saddle, stirrups — equipment designed not to suppress the horse's power, but to channel it in the right direction. In AI, the term means exactly the same thing. A harness is the entire external system that controls and directs an AI agent's power...