
What Is an Agent Harness — The Infrastructure That Makes Agents Actually Work

The industry talks about models. Which one is smartest. Which context window is largest. Which benchmark score is highest. That conversation misses the point entirely.

The model is the brain. Without a body, a brain sits in a jar.

The Formula

Agent = Model + Harness.

The model handles reasoning — what to do next, how to interpret results, when to change approach. The harness handles everything else: tool execution, context management, memory, state persistence, safety enforcement, error recovery, and human-in-the-loop workflows.

This is not a metaphor. It is the literal architecture of every working agent system in 2026. Strip the harness away and you have a stateless text-completion API. Add the harness back and you have a system that reads your codebase, triages your Slack, books your meetings, and runs overnight pipelines while you sleep.

Phil Schmid puts it directly: “An Agent Harness is the infrastructure that wraps around an AI model to manage long-running tasks. It is not the agent itself. It is the software system that governs how the agent operates.”

Firecrawl’s definition sharpens it further: “An agent harness is everything that wraps around an LLM — tool execution, memory, context management, state persistence — excluding the model itself.”

The model decides what to do. The harness decides how it gets done.

What a Harness Provides

LLMs are stateless by default. No memory across sessions. No tool access. No file system. No persistence. No safety boundaries. The harness adds all of it.

Tool execution. The model emits a structured tool call — read_file("report.md"). The harness routes that call to the actual API, handles authentication, manages rate limits, and returns the result. Without the harness, the model’s tool call is a JSON blob that goes nowhere.
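That routing step can be sketched as a small dispatcher. This is a minimal Python illustration, not any product's API; `read_file` is stubbed and the tool registry is hypothetical:

```python
import json

def read_file(path: str) -> str:
    # Stub for illustration; a real harness would hit the filesystem
    # with the user's granted scope, auth, and rate limits applied.
    return f"<contents of {path}>"

# Registry of tool names the model is allowed to emit.
TOOLS = {"read_file": read_file}

def execute_tool_call(raw_call: str) -> str:
    """Route a model-emitted JSON tool call to an actual function."""
    call = json.loads(raw_call)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return f"error: unknown tool {call['name']}"
    return fn(**call["arguments"])

# The model emits a JSON blob; the harness makes it do something.
result = execute_tool_call(
    '{"name": "read_file", "arguments": {"path": "report.md"}}'
)
```

The key design point: the model never touches the filesystem or the network. It only produces structured text, and the harness decides whether and how that text becomes an action.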

Context management. A million-token context window sounds infinite until you try to fit a codebase, a conversation history, a knowledge graph, and forty tool schemas into it simultaneously. The harness decides what enters the context window and what stays out — retrieval, summarization, priority ranking.
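A crude version of that decision is a priority-ranked packer: candidate items compete for a fixed token budget, highest priority first. This is a sketch with a naive whitespace tokenizer, not a real context manager:

```python
def pack_context(items, budget_tokens, tokens=lambda s: len(s.split())):
    """Greedily include candidate context items, highest priority first,
    until the token budget is exhausted."""
    packed, used = [], 0
    for priority, text in sorted(items, reverse=True):
        cost = tokens(text)
        if used + cost <= budget_tokens:
            packed.append(text)
            used += cost
    return packed

items = [
    (3, "system instructions for the agent"),
    (2, "summary of last conversation"),
    (1, "full dump of an old log file " * 50),  # low priority, huge
]
window = pack_context(items, budget_tokens=20)
# The huge low-priority item is left out of the window.
```

Production harnesses replace the greedy loop with retrieval and summarization, but the core move is the same: rank, then cut.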

Memory. Short-term: the current conversation. Long-term: what the agent learned three weeks ago about your Slack triage preferences. Cross-session: the knowledge graph that compounds entity relationships across every email, Slack message, and meeting note the agent processes. The model has none of this natively. The harness provides all of it.
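The three layers can be sketched as a tiny data structure. Names and methods here are illustrative, not any product's API:

```python
class AgentMemory:
    """Minimal sketch of the memory layers a harness maintains."""

    def __init__(self):
        self.short_term = []        # current conversation turns
        self.long_term = {}         # learned preferences, keyed by topic
        self.knowledge_graph = {}   # entity -> set of related entities

    def remember_turn(self, turn):
        self.short_term.append(turn)

    def learn(self, topic, fact):
        self.long_term[topic] = fact

    def link(self, a, b):
        # Undirected edge between two entities.
        self.knowledge_graph.setdefault(a, set()).add(b)
        self.knowledge_graph.setdefault(b, set()).add(a)

mem = AgentMemory()
mem.learn("slack_triage", "mute #random, escalate #incidents")
mem.link("Acme Corp", "Q3 renewal")
```

Short-term memory is cheap and discarded; long-term memory and the graph are what persist across sessions and compound over time.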

State persistence. Sessions survive restarts. Conversations resume. Work products are saved. A 90-minute research task that gets interrupted at minute 47 picks up where it left off. Without state persistence, every interruption restarts from zero.
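A minimal checkpoint-and-resume loop, assuming JSON-serializable state and a throwaway temp directory for illustration:

```python
import json
import os
import tempfile

def checkpoint(state, path):
    """Write the agent's progress to disk so an interrupted run can resume."""
    with open(path, "w") as f:
        json.dump(state, f)

def resume(path):
    """Load the last checkpoint, or start fresh if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "findings": []}

path = os.path.join(tempfile.mkdtemp(), "session.json")
state = resume(path)            # no checkpoint yet: fresh start at step 0
state["step"] = 47
state["findings"].append("draft outline complete")
checkpoint(state, path)
restored = resume(path)         # a "restart": state survives
```

Real harnesses checkpoint after every tool call or turn, so the resume point is always the last completed step, never minute zero.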

Safety enforcement. Permission boundaries — which folders can the agent read? Content filtering — does this output contain PII? Action approval gates — should a Slack message to #general require human confirmation? The harness enforces all of these. The model has no inherent concept of “don’t post to the wrong channel.”
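Two of those gates, folder scoping and action approval, can be sketched in a few lines. The granted root and the channel list are invented for illustration:

```python
from pathlib import Path

ALLOWED_ROOT = Path("/workspace/project")        # scope the user granted
HIGH_RISK_CHANNELS = {"#general", "#announcements"}

def check_read(path: str) -> bool:
    """Deny any read outside the granted folder scope."""
    target = Path(path).resolve()
    return target.is_relative_to(ALLOWED_ROOT.resolve())

def needs_approval(action: dict) -> bool:
    """Gate: posting to a broad channel requires human confirmation."""
    return (action.get("type") == "slack_post"
            and action.get("channel") in HIGH_RISK_CHANNELS)
```

The point is that these checks live outside the model. The model can be jailbroken or simply wrong; a path check cannot be talked out of its answer.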

Error recovery. Retry logic when an API call fails. Fallback strategies when a tool is rate-limited. Graceful degradation when context overflows. The model generates one response; the harness manages the recovery loop around it.
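The standard recovery primitive is retry with exponential backoff. This sketch simulates a tool that fails twice before succeeding; delays are kept tiny for illustration:

```python
import time

def call_with_retry(fn, attempts=3, base_delay=0.01):
    """Retry a flaky tool call with exponential backoff; re-raise after
    the final attempt so the failure surfaces instead of vanishing."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Simulated flaky API: fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("rate limited")
    return "ok"

result = call_with_retry(flaky)
```

A real harness layers fallbacks on top: switch to a cheaper tool, summarize an overflowing context, or escalate to the human when retries are exhausted.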

Human-in-the-loop. Trust ramps — confirm every action on Day 1, approve only high-risk actions by Day 30, fully autonomous by Day 60. The harness implements this progression. The model doesn’t know what day it is.
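The Day 1 / Day 30 / Day 60 progression above can be sketched as a policy function keyed on deployment age. Thresholds and risk labels are illustrative:

```python
from datetime import date

def approval_policy(start: date, today: date, risk: str) -> str:
    """Trust ramp sketch: stricter gating early in the deployment.
    risk is 'low' or 'high'; returns 'confirm' or 'auto'."""
    days = (today - start).days
    if days < 30:
        return "confirm"                      # confirm every action
    if days < 60:
        return "confirm" if risk == "high" else "auto"
    return "auto"                             # fully autonomous
```

The model proposes the same actions on Day 2 and Day 62; only the harness's gating of those actions changes.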

Sub-agent orchestration. Spawning four parallel research agents, aggregating their results, managing dependencies between sequential steps. The model can reason about parallelism; the harness actually executes it.
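The fan-out-and-aggregate pattern can be sketched with a thread pool standing in for real sub-agents:

```python
from concurrent.futures import ThreadPoolExecutor

def research(topic: str) -> str:
    # Stand-in for a sub-agent run; a real harness would spawn a
    # model-backed agent with its own context, tools, and budget.
    return f"findings on {topic}"

def fan_out(topics):
    """Run sub-agents in parallel and aggregate results in input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(research, topics))

results = fan_out(["pricing", "competitors", "regulation", "hiring"])
```

Sequential dependencies work the same way in reverse: the harness waits on one sub-agent's output before feeding it into the next one's context.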

Same Model, Different Harness, Completely Different Experience

This is the key insight. Two products can use the exact same underlying model and produce radically different user experiences — because the harness is different.

Claude Code — The Terminal Harness

Claude Code is Anthropic’s terminal-native coding agent — 84.6K GitHub stars, 46% “most loved” rating among developers.

The harness is optimized for software engineering. Filesystem access scoped to the project directory. Git-aware — understands branches, diffs, commit history. Shell command execution. Sub-agent spawning for parallel work — the model reasons about which files to edit, the harness executes the edits, runs tests, observes failures, and routes results back. CLAUDE.md files provide persistent project context that survives session restarts. Hooks enforce custom policies before and after every tool call.

The model is Claude. The harness is a terminal runtime built for code.

Claude Cowork — The Desktop Harness

Claude Cowork launched January 12, 2026 as a research preview inside the Claude Desktop app. Powered by Claude Opus 4.6 with a one-million-token context window.

The harness is optimized for knowledge workers who never open a terminal. Folder-scoped filesystem access — the user grants access to a specific folder. The agent reads, edits, creates, renames, sorts, and deletes files within that scope. App automation connects to web and desktop applications. No shell. No git. No code execution.

Same underlying model family. Completely different harness. Completely different user.

Kiro — The IDE Harness

Kiro is AWS’s agentic IDE, built on Amazon Bedrock with multiple foundation models.

The harness inverts the model familiar from Cursor and Copilot. The spec is the source of truth; code is a build artifact. Before writing a single line, the harness generates structured specifications — requirements with acceptance criteria, technical design, numbered task list. The user reviews and edits. Then the agent implements from the spec.

The harness is optimized for structured development — spec-driven, document-first, implementation-second. The model generates; the harness enforces the spec → design → task → code sequence.

Amazon Quick Desktop — The Knowledge Work Harness

Amazon Quick Desktop is a knowledge work agent that surfaces hundreds of tool functions in a single conversation. These tools come from connected MCP servers — Slack, Outlook email, calendar, Salesforce, SharePoint, OneDrive, web search, browser automation, image generation — alongside sandboxed Python and JavaScript execution and a local knowledge graph. All of it is accessible without switching apps. The harness decides which MCP servers to connect and which tools to surface to the model; that is the tool exposure point.

The harness is optimized for cross-tool knowledge work. Scheduled agents run in the cloud 24/7 — they execute even when the user is offline. A skills system encodes reusable methodology (not prompts — methodology). Long-term memory compounds across sessions. A knowledge graph connects entities extracted from Slack, email, calendar, and local files. Feed notifications surface agent output as prioritized cards.

The model reasons. The harness manages connections to dozens of MCP servers and their exposed tools, persists institutional knowledge across sessions, and orchestrates parallel sub-agents.

Bedrock AgentCore — The Managed Harness

On April 22, 2026, AWS announced a managed agent harness within Amazon Bedrock AgentCore. Developers declare an agent’s model, system prompt, and tools, then run it in three API calls.

The harness manages the full agent loop — reasoning, tool selection, action execution, response streaming — inside a dedicated microVM spun up for each session. No orchestration code required. AgentCore Gateway provides governed connectivity to APIs and MCP servers with built-in auth, access control, and policy enforcement.

The harness is optimized for developers who want to build custom agents without reinventing infrastructure. The model plugs in (Claude, Llama, Mistral — any Bedrock model). The harness provides everything else. This is covered in depth in a separate post in this series.

Harness Engineering Is Becoming a Discipline

The term comes from Mitchell Hashimoto, creator of Terraform and Ghostty. His definition: “Anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again.”

That is harness engineering. Not prompt engineering — the model’s instructions are one input. Not fine-tuning — the model’s weights are unchanged. Harness engineering is the practice of improving the infrastructure around the model so that reliability increases with every failure.

Blake Crosley frames the mental model precisely: “An AI coding agent is a programmable runtime with an LLM kernel. Every action the model takes passes through hooks you control. You define policies, not prompts.”
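The "policies, not prompts" framing maps naturally onto pre-tool hooks: a function the harness runs before every tool call, with the power to veto or rewrite it. This is a sketch; the hook name and the policy are hypothetical, not any specific product's hook API:

```python
def pre_tool_hook(call: dict) -> dict:
    """Runs before every tool call; can veto or rewrite it.
    Illustrative policy: block destructive shell commands outright."""
    if call["name"] == "shell" and "rm -rf" in call["arguments"].get("cmd", ""):
        raise PermissionError("destructive command blocked by policy")
    return call

# A benign call passes through unchanged.
safe = pre_tool_hook({"name": "read_file", "arguments": {"path": "notes.md"}})
```

Unlike a prompt instruction, this policy cannot be ignored on a bad sampling day: the call either passes the hook or it never executes.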

The discipline has matured rapidly. Both OpenAI and Anthropic now use the term formally. Martin Fowler has written about it. An arXiv paper formalizes the pattern. This is not a buzzword — it is the missing architectural layer that determines whether AI agents work in production.

The harness is where reliability lives. Models hallucinate; harnesses catch hallucinations. Models forget; harnesses persist memory. Models don’t know your tools; harnesses expose the right tools at the right time.

Why the Harness Matters More Than the Model

Three reasons.

Models are commoditizing. Harnesses are differentiating. You can swap Sonnet for Opus for Haiku and the harness stays the same. The model is a component. The harness is the product. Claude Code, Claude Cowork, Kiro, and Amazon Quick Desktop all have access to the same models — their differentiation is entirely in the harness.

Two agents using the same model but different harnesses produce wildly different results. Give Claude Sonnet a terminal harness and it writes code. Give the same model a knowledge work harness and it triages your inbox. The model is identical. The experience is not.

The harness is where institutional knowledge lives. Skills, memory, learned preferences, safety policies, workflow patterns — all harness-layer concerns. The model has no concept of “last time this customer asked about pricing, here’s how we responded.” The harness does.

The Convergence

Every harness category is expanding into the others.

Terminal harnesses (Claude Code) are adding knowledge work features — memory, web search, MCP integrations with 150+ tools. Desktop harnesses (Cowork) are adding coding features — file manipulation, structured outputs. IDE harnesses (Kiro, Cursor) are adding agentic loops — autonomous multi-step execution beyond autocomplete. Knowledge work harnesses (Quick Desktop) are adding builder features — agent delegation to Claude Code and Kiro, with git and cloud account access on the roadmap.

The winning harness will unify all four surfaces: terminal, desktop, IDE, and knowledge work — in a single runtime where the model switches modes but the harness provides continuity.

The model conversation is nearly over. The harness conversation is where the actual competition lives.


This is post 4 of 4 in the Agent Primitives series. The companion posts cover what agents are, what skills are, and what tools are — the building blocks that the harness orchestrates into useful work.

