Builder's Daily / Agent Stack

Agent Stack — June 10, 2026

How do agents run, use tools, and delegate to each other?

The read

The harness, tool surface, and delegation topology are commoditizing together. When every vendor ships MCP and orchestration, the moat is how humans wire judgment, guardrails, and institutional context into the agent loop — not whether agents can run.

What moved

  • LangSmith Sandboxes goes GA: microVM-based persistent compute for long-running agents (LangChain Blog). LangChain’s LangSmith Sandboxes are now generally available: hardware-virtualized microVMs (not containers) that give agents a real filesystem, shell, and package manager, with state that persists across sessions. Features include snapshot/fork with copy-on-write, pre-warmed ‘blueprints’ that cut boot time from minutes to seconds, authenticated service URLs for local services the agent spins up, and a network auth proxy that injects credentials without exposing them to the sandbox runtime. Sandboxes are accessed via the LangSmith SDK (client.create_sandbox()). Builder angle: If your harness needs to run untrusted, model-generated code across multi-step sessions, LangSmith Sandboxes give you a managed microVM isolation layer with snapshot/fork and credential injection, so you do not have to build your own container infrastructure.

  • Harness-1: open-weight 20B search agent beats GPT-5.4 on recall via externalized working memory (VentureBeat). Researchers from UIUC, UC Berkeley, and Chroma released Harness-1, a 20B-parameter open-source search agent that scores 73% vs. GPT-5.4’s 70.9% on information-recall benchmarks. The gain comes from an external state-management system that keeps candidate documents separate from verified evidence, rather than relying on the model’s context window. Training used only 899 SFT trajectories plus 3,453 RL queries; weights are released under Apache 2.0 on Hugging Face. Builder angle: The externalized working-memory pattern (candidate and verified evidence stored as separate state outside the context window) is a harness architecture you can reuse for any long-horizon retrieval agent, and the Apache-2.0 weights mean the 20B model can be run or fine-tuned directly.

  • Show HN: mcp-gateway-scan, a read-only static scanner for MCP gateway production-readiness, ships v0.1.0 (GitHub: willianpinho/mcp-gateway-scan). This new open-source CLI scans MCP gateway code across seven dimensions: RBAC/policy enforcement; fail-close error handling; supply-chain pinning; OpenTelemetry/log redaction; routing and cost controls (max_tokens, rate limits); secrets/credential-manager usage; and kill switches/feature flags. It outputs a color-coded readiness report or JSON. It’s read-only: no code execution, no network calls, no secret values printed. Builder angle: Drop this into CI before deploying an MCP gateway to catch the fail-open error handlers, unpinned base images, and missing kill switches that turn a demo gateway into an incident.
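
The externalized working-memory pattern from the Harness-1 item can be sketched roughly as follows. This is a minimal illustration of the idea, not Harness-1's actual implementation: the names `Document`, `EvidenceStore`, and `is_relevant` are hypothetical, and the keyword check merely stands in for whatever verification step a real agent would use.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str


@dataclass
class EvidenceStore:
    """Hypothetical working memory kept outside the model's context window."""

    candidates: dict[str, Document] = field(default_factory=dict)
    verified: dict[str, Document] = field(default_factory=dict)

    def add_candidate(self, doc: Document) -> None:
        # Retrieved but unchecked documents land here first.
        if doc.doc_id not in self.verified:
            self.candidates[doc.doc_id] = doc

    def promote(self, doc_id: str) -> None:
        # Move a checked document from candidates to verified evidence.
        self.verified[doc_id] = self.candidates.pop(doc_id)

    def reject(self, doc_id: str) -> None:
        # Drop a candidate that failed verification.
        self.candidates.pop(doc_id, None)

    def context_summary(self, max_chars: int = 200) -> str:
        # Only this compact view is fed back into the prompt.
        lines = [
            f"[verified] {doc.doc_id}: {doc.text[:max_chars]}"
            for doc in self.verified.values()
        ]
        lines.append(f"[pending] {len(self.candidates)} candidate(s) not yet checked")
        return "\n".join(lines)


def is_relevant(doc: Document, query: str) -> bool:
    # Assumption: a keyword match stands in for a real verifier.
    text = doc.text.lower()
    return all(term in text for term in query.lower().split())


store = EvidenceStore()
store.add_candidate(Document("d1", "MicroVM sandboxes persist state across sessions."))
store.add_candidate(Document("d2", "A recipe for sourdough bread."))

for doc_id in list(store.candidates):
    if is_relevant(store.candidates[doc_id], "microvm state"):
        store.promote(doc_id)
    else:
        store.reject(doc_id)

print(store.context_summary())
```

The design choice worth noting is that the prompt only ever sees `context_summary()`, so the number of documents the agent has touched can grow without the context window growing with it.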

Also tracking

  • Microsoft redefines Dataverse MCP Server’s tool surface as 13 named tools with explicit-approval gates on destructive ops (source). A concrete reference for shaping an MCP server’s tool surface around a small, named set of operations with built-in human-in-the-loop approval gates on irreversible actions, rather than exposing a generic data-access connector.
  • Nexla launches MCP Studio: conversational generator for governed, schema-aware MCP servers across 600+ enterprise systems (source). Targets the per-connector governance burden of hand-building MCP servers; worth evaluating if you need scoped, schema-aware MCP tool access across many internal systems without writing auth/permission mapping per source.
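
The "small, named tool surface with approval gates" shape from the Dataverse item can be sketched in a few lines. This is an illustrative sketch only: `Tool`, `ToolSurface`, `ApprovalRequired`, and the `read_record`/`delete_record` tools are hypothetical names, not the Dataverse MCP Server's actual tools or API, and the approver callback stands in for a real human-in-the-loop prompt.

```python
from dataclasses import dataclass
from typing import Callable


class ApprovalRequired(Exception):
    """Raised when a destructive tool call is not approved by a human."""


@dataclass(frozen=True)
class Tool:
    name: str
    handler: Callable[[dict], str]
    destructive: bool = False  # irreversible operations set this to True


class ToolSurface:
    """A small, named set of tools; nothing outside the registry is callable."""

    def __init__(self, tools: list[Tool], approver: Callable[[str, dict], bool]):
        self._tools = {tool.name: tool for tool in tools}
        self._approver = approver

    def call(self, name: str, args: dict) -> str:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        tool = self._tools[name]
        # Destructive tools run only after the approver says yes.
        if tool.destructive and not self._approver(name, args):
            raise ApprovalRequired(f"{name} needs explicit approval")
        return tool.handler(args)


# Hypothetical in-memory data and tools, for illustration only.
records = {"42": "sample record"}


def read_record(args: dict) -> str:
    return records[args["id"]]


def delete_record(args: dict) -> str:
    records.pop(args["id"])
    return "deleted"


def deny_all(name: str, args: dict) -> bool:
    # Stand-in for a human reviewer who has not approved anything.
    return False


surface = ToolSurface(
    [Tool("read_record", read_record),
     Tool("delete_record", delete_record, destructive=True)],
    approver=deny_all,
)

print(surface.call("read_record", {"id": "42"}))
try:
    surface.call("delete_record", {"id": "42"})
except ApprovalRequired as exc:
    print(f"blocked: {exc}")
```

Because the gate lives in `ToolSurface.call` rather than in each handler, a new destructive tool is covered as soon as it is registered with `destructive=True`.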