Part four of the MCP series. Part one covers the token tax. Part two covers the pattern. Part three covers the build. This one shows it running.
The proxy is live. I’ve been running it for a few sessions now and wanted to document what it actually looks like from the inside — not the architecture description, not the build log, but the moment-to-moment behavior.
What’s Connected
My current setup has six MCP servers behind the proxy, totaling 206 tools at last sync. The rough breakdown by capability: CRM, messaging, email and calendar, pricing and billing APIs, documentation search, developer tooling (build system, ops, CI/CD), business planning tools, and a browser.
Under the old model — load everything — that’s ~217,000 tokens consumed at session start. Before I type anything.
With the proxy: ~2,100 tokens. The two meta-tools (discover_tools, call_tool) plus the always-on set.
One detail I didn’t expect: the agent knows the total tool count before it calls anything. The proxy’s initialize response includes a server description string that reads: “Progressive disclosure proxy. 206 tools available. Use discover_tools(query) to surface any tool.” The agent has a map before it has territory — it knows there are 206 tools, it just doesn’t know what they are yet. This turns out to matter: the model is less likely to assume a capability doesn’t exist if it knows there are 200 tools it hasn’t seen.
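For concreteness, here's a minimal sketch of how that description string might be assembled at startup. The function name and index shape are invented; only the string itself comes from the proxy:

```python
# Hypothetical sketch: the proxy derives the tool count from its index and
# bakes it into the server description returned at initialize time.
def build_server_instructions(tool_index: dict) -> str:
    count = len(tool_index)
    return (
        f"Progressive disclosure proxy. {count} tools available. "
        "Use discover_tools(query) to surface any tool."
    )

# Stand-in for the real 206-entry index.
tool_index = {f"tool_{i}": {} for i in range(206)}
print(build_server_instructions(tool_index))
```

Because the count is computed, the description stays honest as servers are added or removed behind the proxy.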
The Discovery Loop, Live
Here’s what just happened in this session. I asked the agent to demonstrate the proxy in action, specifically to find opportunities in my CRM.
First call — discover_tools("find my CRM pipeline opportunities"):
The proxy searched its BM25 index and returned three tools — all write tools: adding line items, contact roles, and team members to opportunities. Not what I wanted.
That’s the search working correctly, not failing. Those tools matched “opportunities” but not the intent. The query needed to be more specific.
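A toy version makes the behavior above legible. The tool descriptions here are invented stand-ins for the real 206-tool corpus, and this is a from-scratch BM25, not the proxy's actual index — but it reproduces the pattern: a broad query full of terms the index doesn't contain collapses to a single shared keyword, and BM25's length normalization then favors the short write-tool descriptions; a query that echoes the search tool's own vocabulary pulls it to the top.

```python
import math
from collections import Counter

# Invented mini-corpus standing in for the real index.
DOCS = {
    "opportunity_add_line_item":    "add line items to opportunities",
    "opportunity_add_contact_role": "add contact roles to opportunities",
    "crm_search_records":           "search opportunities owned by a user with stage filters",
}

def bm25_rank(query: str, docs: dict, k1: float = 1.5, b: float = 0.75) -> list:
    toks = {name: text.split() for name, text in docs.items()}
    n = len(toks)
    avgdl = sum(len(t) for t in toks.values()) / n
    df = Counter(term for t in toks.values() for term in set(t))
    ranked = []
    for name, t in toks.items():
        tf = Counter(t)
        score = 0.0
        for term in query.split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(t) / avgdl))
        ranked.append((score, name))
    return [name for score, name in sorted(ranked, reverse=True)]

print(bm25_rank("find my crm pipeline opportunities", DOCS))  # write tools first
print(bm25_rank("search opportunities owned by me", DOCS))    # search tool first
```

The fix isn't a smarter index; it's a query that carries the intent ("search", "owned") instead of just the noun.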
Second call — discover_tools("search opportunities owned by me"):
Better results, but the search tool itself didn’t appear — because it was already in context. It’s in the always-on set. High-frequency CRM tools load unconditionally. The proxy already knew I’d need it.
Direct execution — the search tool called directly with an ownership filter:
A full list of open opportunities returned — names, amounts, close dates, stage history. Live data from a live system.
The loop worked. One session, real data, no schema overhead at startup.
What the Model Sees at Session Start
The tool context the agent opens with looks roughly like this:
discover_tools(query: string) → list of matching tool schemas
Search for tools relevant to a task. Returns schemas and ready-to-use
call templates for the top matching tools.
call_tool(tool_name: string, arguments: object) → tool result
Execute any available tool by name. Routes to the correct server.
[+ ~10 always-on tools: email_inbox, email_read, calendar_view,
get_messages, post_message, crm_search_accounts, crm_search_records,
web_search, ...]
That’s it. The 206 tools behind the proxy are available on demand — not in context until needed.
One thing the architecture description undersells: always-on tools are directly callable by name, exactly like regular MCP tools. The blog post framing — discover_tools then call_tool — implies everything goes through that two-step flow. It doesn’t. email_inbox called directly routes transparently through the proxy to the right server. From the model’s perspective, the always-on tools are indistinguishable from a normal MCP session. The meta-tool interface only appears when the model reaches for something outside the always-on set.
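The dispatch order might look like the following sketch. The server names and stub functions are invented; the point is that always-on tools take the direct branch and never touch the meta-tool flow:

```python
# Hypothetical routing sketch. forward() and handle_meta_tool() stand in for
# the real MCP plumbing; only the dispatch order is the claim.
ALWAYS_ON = {
    "email_inbox": "email-server",
    "crm_search_records": "crm-server",
}

def forward(server: str, tool: str, arguments: dict) -> str:
    return f"[{server}] {tool}({arguments})"       # stand-in for a real forward

def handle_meta_tool(name: str, arguments: dict) -> str:
    return f"meta:{name}"                          # discover_tools / call_tool path

def handle_tool_call(name: str, arguments: dict):
    if name in ("discover_tools", "call_tool"):
        return handle_meta_tool(name, arguments)   # two-step flow for the long tail
    if name in ALWAYS_ON:
        return forward(ALWAYS_ON[name], name, arguments)  # direct call, one hop
    raise ValueError(f"not in context: {name!r}")  # model never sees these names

print(handle_tool_call("email_inbox", {"folder": "inbox"}))
```

From the model's side, the first two branches are indistinguishable from ordinary tool calls; only the third branch reveals the proxy exists.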
The Recipe Format in Practice
One thing I noted from the implementation post: discover_tools returns usage recipes, not raw JSON schemas. Here’s what a result looks like:
list_filter_options
Get valid values for a filter attribute on a resource type.
Ready to call with call_tool:
{
"tool_name": "list_filter_options",
"arguments": {
"resource_type": "<resource_type>",
"attribute_names": []
}
}
# Optional: filters
The agent copies the template, fills the placeholders, executes. No schema parsing, no type inference. The result: tool calls that go right the first time, with no intermediate reasoning step about what fields are required and in what format.
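One plausible way to render such a recipe from a stored JSON Schema — assuming the index keeps standard MCP-style inputSchema dicts; the helper name and the placeholder convention are illustrative, not the proxy's actual code:

```python
import json

def render_recipe(name: str, description: str, schema: dict) -> str:
    # Turn each schema property into a fill-in placeholder; arrays become [].
    args = {}
    for prop, spec in schema.get("properties", {}).items():
        args[prop] = [] if spec.get("type") == "array" else f"<{prop}>"
    template = {"tool_name": name, "arguments": args}
    return (f"{name}\n{description}\nReady to call with call_tool:\n"
            + json.dumps(template, indent=2))

schema = {"properties": {"resource_type": {"type": "string"},
                         "attribute_names": {"type": "array"}}}
print(render_recipe("list_filter_options",
                    "Get valid values for a filter attribute on a resource type.",
                    schema))
```

The model gets a copy-paste artifact instead of a type system to reason about, which is where the first-call accuracy comes from.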
The error path is also working. Unknown tool names return a suggestion: “Unknown tool: 'log_activity'. Use discover_tools("log activity") to find the right tool name.” The model reads the error, calls discovery, continues. No human intervention.
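That error path is a few lines of logic. This sketch assumes underscore-to-space is the hint heuristic — the real proxy's suggestion logic may be smarter — but it produces exactly the shape of message quoted above:

```python
# Invented tool index; the dispatch return value is a stand-in for real execution.
TOOLS = {"crm_log_activity", "email_inbox", "crm_search_records"}

def call_tool(tool_name: str, arguments: dict) -> str:
    if tool_name not in TOOLS:
        hint = tool_name.replace("_", " ")   # assumed heuristic for the hint query
        return (f"Unknown tool: '{tool_name}'. "
                f'Use discover_tools("{hint}") to find the right tool name.')
    return f"executed {tool_name}"

print(call_tool("log_activity", {}))
```

The key design choice is returning a next action, not just a failure: the model can recover inside the loop instead of stalling.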
The Discovery Hit Rate Question
After running the proxy across enough sessions to have real signal: most sessions never call discover_tools at all.
The always-on set has converged around five categories: messaging (send and read), email (inbox and read), calendar, web search, and two or three CRM lookups. That’s roughly 10 tools. They cover the vast majority of what a knowledge-work session actually touches. Discovery only fires when I reach for something outside that core — filing a specific record type, pulling pricing data, running a build — and even then it’s one call, not a recurring pattern.
The calibration heuristic that emerged: if you reach for a tool in more than half your sessions, it belongs in the always-on set. If you reach for it occasionally, let discovery handle it. If you’ve never reached for it, it just lives in the index. The 10-tool threshold isn’t magic — it’s roughly where the schema cost of the always-on set stops being negligible. Above ~15 tools, you start eating into the savings the proxy exists to provide.
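The heuristic reduces to a frequency count over session logs. This sketch is an assumption about how you'd automate it — the session data and cap value are invented, but the >50% rule and the ~15-tool ceiling are the ones described above:

```python
from collections import Counter

def pick_always_on(session_tool_logs: list, cap: int = 15) -> list:
    """Promote tools used in more than half of sessions, capped so the
    always-on schema cost stays negligible."""
    n = len(session_tool_logs)
    usage = Counter(t for session in session_tool_logs for t in session)
    frequent = [t for t, c in usage.most_common() if c / n > 0.5]
    return frequent[:cap]

# Invented three-session log.
logs = [{"email_inbox", "web_search"},
        {"email_inbox", "crm_search_records"},
        {"email_inbox", "web_search", "calendar_view"}]
print(pick_always_on(logs))  # email_inbox (3/3) and web_search (2/3) qualify
```

Everything below the threshold stays in the index, costing nothing until discovery surfaces it.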
The Number That Matters
Not the cost reduction — though 217,000 tokens down to 2,100 at session start is real, and at $3/million input tokens it compounds.
The number that matters: context window headroom.
A 200k-token context window that burns 217k tokens at session start is over budget before the first message. Conversation history, retrieved documents, intermediate reasoning, tool results — all of it has to fight for space that isn't there, and the session quickly auto-summarizes, collapsing the conversation into a digest to make room. The proxy doesn't prevent auto-summarization. But starting at 2,100 tokens means a focused work session — tool calls, retrieved context, a chain of reasoning — typically completes without the context window ever becoming the constraint.
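The arithmetic behind both claims, using the numbers quoted in this post (the 200k window is illustrative; the $3/M input rate is the one mentioned earlier):

```python
window = 200_000
old_start, new_start = 217_000, 2_100
rate = 3 / 1_000_000                     # dollars per input token

print(f"headroom before: {window - old_start:,} tokens")  # negative: doesn't fit
print(f"headroom after:  {window - new_start:,} tokens")

saved = (old_start - new_start) * rate
print(f"saved per session start: ${saved:.2f}")
```

The dollar savings are real but small per session; the headroom going from negative 17,000 tokens to nearly 198,000 is the part that changes how sessions end.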
That’s the difference in practice. The session doesn’t end. The context doesn’t collapse. The work continues.
The MCP series so far: token tax, the pattern, the build, and this post. If you’re running something similar and have measured your discovery hit rate at scale, I’d be curious about the numbers.