
Question Your Defaults — How Model-Harness Overfitting Is Slowing Down Your Agent

In Part 3 of this series, I mentioned a fascinating fact. On Terminal Bench 2.0, Claude Opus 4.6 ranked 33rd inside Claude Code, the very harness it was trained in, but jumped to the top 5 when used with a different harness. I didn't fully unpack what that number means. While covering Anthropic's architecture in Part 4 and the hands-on guide in Part 5, I glossed over the most counterintuitive and practically important insight of the entire series. Using the default harness as-is may not be optimal. This post is where I address that.

How Overfitting Happens

Frontier coding models are post-trained inside their own harnesses. Claude is optimized through thousands of hours of coding tasks in the Claude Code environment; Codex models go through the same process in the Codex environment. During this process, the model adapts to the patterns of its specific harness:

How Claude Code invokes tools
The format in which errors are returned
The order in which context is assembled
The in...