# Builder's Daily — Agent Stack

> Rolling 14-day signal for beat `agent-stack`. Ephemeral context — not evergreen corpus.
> Author: Amit Kumar Agrawal | https://artificialcuriositylabs.ai
> Generated: 2026-06-10
> Human index: https://artificialcuriositylabs.ai/daily/agent-stack/
> RSS: https://artificialcuriositylabs.ai/daily/agent-stack/rss.xml

---

# Agent Stack — June 10, 2026
**URL:** https://artificialcuriositylabs.ai/daily/agent-stack/2026-06-10/
**Beat:** agent-stack
**Date:** 2026-06-10
**Topics:** agent-harness, sandbox, long-running-agents, langchain, open-weights, working-memory
**Summary:** LangSmith Sandboxes goes GA: microVM-based persistent compute for long-running agents; Harness-1: open-weight 20B search agent beats GPT-5.4 on recall v…

## The read

The harness, tool surface, and delegation topology are commoditizing together. When every vendor ships MCP and orchestration, the moat is how humans wire judgment, guardrails, and institutional context into the agent loop — not whether agents can run.

## What moved

- **LangSmith Sandboxes goes GA: microVM-based persistent compute for long-running agents** — [LangChain Blog](https://www.langchain.com/blog/give-your-ai-agent-its-own-computer)
  LangChain's LangSmith Sandboxes are now generally available: hardware-virtualized microVMs (not containers) that give agents a real filesystem, shell, and package manager, with state that persists across sessions. Features include snapshot/fork with copy-on-write, pre-warmed 'blueprints' that cut boot time from minutes to seconds, authenticated service URLs for local services the agent spins up, and a network auth proxy that injects credentials without exposing them to the sandbox runtime. Sandboxes are accessed via the LangSmith SDK (`client.create_sandbox()`). **Builder angle:** If your harness needs to run untrusted, model-generated code across multi-step sessions, LangSmith Sandboxes provide a managed microVM isolation layer with snapshot/fork and credential injection instead of building your own container infrastructure.

- **Harness-1: open-weight 20B search agent beats GPT-5.4 on recall via externalized working memory** — [VentureBeat](https://venturebeat.com/orchestration/researchers-trained-an-open-source-ai-search-agent-harness-1-that-outperforms-gpt-5-4-on-recalling-relevant-information)
  Researchers from UIUC, UC Berkeley, and Chroma released Harness-1, a 20B-parameter open-source search agent scoring 73% vs. GPT-5.4's 70.9% on information-recall benchmarks. The gain comes from an external state-management system that keeps candidate documents separate from verified evidence, rather than relying on the model's context window. Training used only 899 SFT trajectories plus 3,453 RL queries; weights are released under Apache 2.0 on Hugging Face. **Builder angle:** The externalized working-memory pattern - storing candidate vs. verified evidence in separate state outside the context window - is a harness architecture you can reuse for any long-horizon retrieval agent, and the Apache-2.0 weights mean the 20B model can be run or fine-tuned directly.

- **Show HN: mcp-gateway-scan, a read-only static scanner for MCP gateway production-readiness, ships v0.1.0** — [GitHub (willianpinho/mcp-gateway-scan)](https://github.com/willianpinho/mcp-gateway-scan)
  New open-source CLI scans MCP gateway code across seven dimensions — RBAC/policy enforcement, fail-close error handling, supply-chain pinning, OpenTelemetry/log redaction, routing and cost controls (max_tokens, rate limits), secrets/credential-manager usage, and kill switches/feature flags — and outputs a color-coded readiness report or JSON. It's read-only: no code execution, no network calls, no secret values printed. **Builder angle:** Drop this into CI before deploying an MCP gateway to catch fail-open error handlers, unpinned base images, and missing kill switches that turn a demo gateway into an incident.
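
The externalized working-memory pattern from the Harness-1 item above can be sketched roughly as follows. This is a minimal illustration, not Harness-1's actual implementation: the `EvidenceStore` class, its method names, and the document ids are all hypothetical. It only shows the core idea of keeping candidate documents and verified evidence in separate state outside the prompt.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceStore:
    """Hypothetical working memory kept outside the model's context window."""

    candidates: dict[str, str] = field(default_factory=dict)  # doc_id -> text, unverified
    verified: dict[str, str] = field(default_factory=dict)    # doc_id -> text, confirmed

    def add_candidate(self, doc_id: str, text: str) -> None:
        # New search hits land in the candidate pool only.
        if doc_id not in self.verified:
            self.candidates[doc_id] = text

    def promote(self, doc_id: str) -> None:
        # Move a document to verified evidence once the agent has checked it.
        self.verified[doc_id] = self.candidates.pop(doc_id)

    def discard(self, doc_id: str) -> None:
        # Drop a candidate that turned out to be irrelevant.
        self.candidates.pop(doc_id, None)

    def context_view(self, max_candidates: int = 3) -> str:
        # Compact summary for the next prompt: all verified ids plus a
        # bounded number of pending candidates.
        pending = list(self.candidates)[:max_candidates]
        return f"verified={sorted(self.verified)} pending={pending}"


store = EvidenceStore()
store.add_candidate("doc-1", "MicroVMs isolate untrusted code.")
store.add_candidate("doc-2", "Unrelated press release.")
store.promote("doc-1")
store.discard("doc-2")
print(store.context_view())
```

Only the short `context_view()` string needs to re-enter the prompt each turn, so the context stays bounded however many documents the search loop has touched.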

## Also tracking

- **Microsoft redefines Dataverse MCP Server's tool surface as 13 named tools with explicit-approval gates on destructive ops** — [source](https://www.microsoft.com/en-us/power-platform/blog/2026/06/08/dataverse-mcp-server-understanding-the-new-tool-shape/) — A concrete reference for shaping an MCP server's tool surface around a small, named set of operations with built-in human-in-the-loop approval gates on irreversible actions, rather than exposing a generic data-access connector.
- **Nexla launches MCP Studio: conversational generator for governed, schema-aware MCP servers across 600+ enterprise systems** — [source](https://nexla.com/?page_id=30545) — Targets the per-connector governance burden of hand-building MCP servers — worth evaluating if you need scoped, schema-aware MCP tool access across many internal systems without writing auth/permission mapping per source.
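
The approval-gate idea in the Dataverse item above (a small named tool set, with irreversible operations held for human sign-off) might look something like this. It is a sketch under assumptions: the `Tool` class, the `approve` callback, and the tool names are hypothetical and are not taken from the Dataverse MCP Server.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    handler: Callable[[dict], str]
    destructive: bool = False  # irreversible operations need explicit approval


def call_tool(tool: Tool, args: dict, approve: Callable[[str, dict], bool]) -> str:
    # Destructive tools run only if the approval callback says yes.
    if tool.destructive and not approve(tool.name, args):
        return f"{tool.name}: blocked, approval denied"
    return tool.handler(args)


# Hypothetical tool names, not the actual Dataverse MCP tool list.
read_record = Tool("read_record", lambda a: f"read {a['id']}")
delete_record = Tool("delete_record", lambda a: f"deleted {a['id']}", destructive=True)


def deny_all(name: str, args: dict) -> bool:
    # Stand-in for a human reviewer who rejects every request.
    return False


print(call_tool(read_record, {"id": 7}, deny_all))
print(call_tool(delete_record, {"id": 7}, deny_all))
```

Marking the gate on the tool definition, rather than inside each handler, keeps the approval rule visible wherever the tool surface is reviewed.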


---

# Agent Stack — June 9, 2026
**URL:** https://artificialcuriositylabs.ai/daily/agent-stack/2026-06-09/
**Beat:** agent-stack
**Date:** 2026-06-09
**Topics:** google, adk, a2a, workflow, hitl, sdk-release
**Summary:** Google ADK v2.2.0 ships Workflow-to-A2A conversion, HITL state distinction, and request_input clarification tool; Pydantic AI v1.105.0 adds on-demand de…

## What moved

- **Google ADK v2.2.0 ships Workflow-to-A2A conversion, HITL state distinction, and request_input clarification tool** — [Google ADK Python GitHub](https://github.com/google/adk-python/releases/tag/v2.2.0)
  Google Agent Development Kit v2.2.0 (June 4) adds Workflow-to-A2A serialization so ADK-defined multi-agent workflows can be exposed as A2A-compatible service endpoints; distinguishes input-required from auth-required states in human-in-the-loop flows; preserves transparent config on live session reconnect; and introduces a `request_input` tool that lets agents ask for clarification mid-execution rather than failing silently. The default model shifts from gemini-2.5-flash to gemini-3-flash-preview ahead of the October 2026 sunset. The release has two breaking changes, including GenAI SDK v2 renaming turn-based helpers (`convert_contents_to_turns` → `convert_contents_to_steps`). **Builder angle:** Workflow-to-A2A lets you wrap an ADK multi-agent graph as an A2A service endpoint without manual protocol wiring; the input/auth HITL state distinction changes how you gate approval vs authentication interrupts in production agent loops.

- **Pydantic AI v1.105.0 adds on-demand deferred tool loading; v1.101.0 brought MCP background tasks and ctx.enqueue** — [Pydantic AI GitHub](https://github.com/pydantic/pydantic-ai/releases/tag/v1.105.0)
  Pydantic AI v1.105.0 (June 2) introduces on-demand deferred loading for tools, instructions, model settings, and hooks — schemas are serialized only on first invocation, cutting cold-start overhead for harnesses with large tool registries. Companion v1.101.0 (May 21) added MCP background task support for non-blocking long-lived tool calls and ctx.enqueue/agent_run.enqueue for a pending message queue across agent turns. v2.0.0b6 (June 5) mirrors all v1 changes in the ongoing v2 breaking-change beta. **Builder angle:** Deferred loading removes schema serialization cost at agent init; MCP background tasks change the execution model for any MCP tool that runs for more than a few seconds without blocking the main agent loop.

- **Azure DevOps remote MCP server enters public preview with Streamable HTTP and Entra auth** — [Microsoft Learn (Azure DevOps docs)](https://learn.microsoft.com/en-us/azure/devops/mcp-server/remote-mcp-server?view=azure-devops)
  Microsoft shipped a hosted Azure DevOps MCP server in public preview that runs over Streamable HTTP transport with Microsoft Entra ID (OAuth) authentication — no local Node.js install required. Agents connect via a single mcp.json URL (https://mcp.dev.azure.com/{org}). Tool exposure is scoped at the request level via X-MCP-Toolsets headers (repos, wit, pipelines, wiki) and read-only mode is enforced via X-MCP-Readonly. Currently limited to VS Code and Visual Studio; other clients (Claude Desktop, Cursor, Codex) are blocked pending Entra OAuth dynamic client registration support. **Builder angle:** First Microsoft-hosted MCP server that demonstrates the Streamable HTTP + Entra auth migration pattern from stdio+PAT — changes how agents connect to ADO without local daemon setup.
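
The deferred-loading idea in the Pydantic AI item above (serialize a tool schema on first use instead of at agent init) can be illustrated generically. This is not Pydantic AI's API: `LazyTool`, `schema_json`, and the `schema_builds` counter are hypothetical names for a plain-Python sketch of the technique.

```python
import json
from functools import cached_property
from typing import Callable


class LazyTool:
    """Hypothetical tool whose JSON schema is built on first access."""

    def __init__(self, name: str, build_schema: Callable[[], dict]):
        self.name = name
        self._build_schema = build_schema
        self.schema_builds = 0  # counter for demonstration only

    @cached_property
    def schema_json(self) -> str:
        # Runs once; later accesses reuse the cached string.
        self.schema_builds += 1
        return json.dumps(self._build_schema(), sort_keys=True)


def search_schema() -> dict:
    return {"type": "object", "properties": {"query": {"type": "string"}}}


registry = {"search": LazyTool("search", search_schema)}
builds_at_init = registry["search"].schema_builds  # nothing serialized yet
first = registry["search"].schema_json             # schema built here
second = registry["search"].schema_json            # cached, no rebuild
print(builds_at_init, registry["search"].schema_builds)
```

With a large registry, tools the model never calls never pay the serialization cost, which is where the cold-start saving would come from.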

## Also tracking

- **IETF individual draft names 'Protocol Pivoting' as MCP-specific lateral-movement attack class** — [source](https://datatracker.ietf.org/doc/draft-mohiuddin-mcp-security-considerations/) — Protocol Pivoting formalizes the cross-server exploit chain that MCP gateway operators must defend against when chaining multiple backend servers in an agent stack.
- **Pega announces every Pega application becomes an MCP server in Infinity 26 (Q3 2026)** — [source](https://www.stocktitan.net/news/PEGA/pega-powers-ai-agents-to-reliably-drive-mission-critical-8if0zhahmsob.html) — Pega's workflow runtime joins the MCP server ecosystem as an enterprise-governed execution surface, letting agents delegate mission-critical process steps rather than model them from scratch.


---

# Agent Stack — June 8, 2026
**URL:** https://artificialcuriositylabs.ai/daily/agent-stack/2026-06-08/
**Beat:** agent-stack
**Date:** 2026-06-08
**Topics:** aws, bedrock-agentcore, agent-runtime, debugging, api, microsoft
**Summary:** Amazon Bedrock AgentCore Runtime adds interactive shells for live terminal access into agent sessions; Microsoft Foundry ships production memory stack f…

## What moved

- **Amazon Bedrock AgentCore Runtime adds interactive shells for live terminal access into agent sessions** — [AWS What's New](https://aws.amazon.com/about-aws/whats-new/2026/06/amazon-bedrock-agentcore-runtime/)
  AWS shipped a new InvokeAgentRuntimeCommandShell API that opens a persistent, PTY-backed terminal over WebSocket directly into a running agent's microVM — preserving env vars, working directory, and history across reconnects (up to 10 concurrent shells per runtime, sessions resumable via shell ID). It complements the existing stateless InvokeAgentRuntimeCommand for one-shot calls. **Builder angle:** Gives builders an SSH-like debug path into live agent runtimes (inspecting generated files, checking package versions, completing device-code logins) without redeploying or instrumenting the agent.

- **Microsoft Foundry ships production memory stack for agents: procedural memory, TTL, CRUD UI, multimodal recall** — [Microsoft Foundry Blog](https://devblogs.microsoft.com/foundry/memory-build2026/)
  Foundry's Build 2026 memory update adds procedural memory that captures and reuses successful action sequences (~5% gain on STATE-Bench/Tau-Bench), a portal UI for direct CRUD on stored memories, configurable time-to-live to auto-retire low-value entries, multimodal (image) memory, and explicit remember/forget commands. Agent Framework also gains a local file-based MemoryFileStore/MemoryContextProvider pattern for inspectable markdown memory before scaling to managed stores. **Builder angle:** Turns agent memory from an opaque black box into something you can inspect, version, cap with TTL, and unit-test locally before promoting to a managed store — directly changes how memory gets debugged and governed in production harnesses.

- **Azure SRE Agent launches Plugin Marketplace with git-commit-pinned, hash-verified skill installs** — [Microsoft Learn / Azure Docs](https://learn.microsoft.com/en-us/azure/sre-agent/plugin-marketplace)
  Azure SRE Agent now lets teams register curated GitHub-hosted marketplaces (via marketplace.json manifests, including the official Azure SRE Agent Plugins and Anthropic's Claude Plugins repos) and install bundled skills + MCP server configs. Each install pins to an exact git commit, records provenance (source/version/hash), and offers one-click update checks via SHA-256 hash comparison. Private repos and GitHub Enterprise are supported with shared per-marketplace credentials. **Builder angle:** Makes skill distribution reproducible and auditable: version pinning plus hash diffing means upstream changes can't silently alter agent behavior, and per-skill setup drops from ~10-15 minutes to ~30 seconds.
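
The commit-pinned, hash-verified install described in the Azure SRE Agent item above can be sketched with the standard library. This is an illustration of the general technique, not the marketplace's implementation: the `InstalledSkill` record, the function names, and the repo and commit values are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class InstalledSkill:
    source: str  # marketplace repo (hypothetical value below)
    commit: str  # exact git commit the install is pinned to
    digest: str  # SHA-256 of the installed skill content


def install(source: str, commit: str, content: bytes) -> InstalledSkill:
    # Record provenance at install time so later checks have a baseline.
    return InstalledSkill(source, commit, sha256_hex(content))


def has_update(installed: InstalledSkill, upstream_content: bytes) -> bool:
    # Upstream differs from the pinned install if the digests differ.
    return sha256_hex(upstream_content) != installed.digest


skill = install("example-org/skills", "0a1b2c3", b"restart the service, then verify health")
print(has_update(skill, b"restart the service, then verify health"))
print(has_update(skill, b"restart the service and skip verification"))
```

Because the agent keeps running the pinned content until someone accepts the update, a changed upstream file shows up as a digest mismatch rather than as a silent behavior change.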

## Also tracking

- **NetFoundry launches zero-trust MCP and LLM gateways with no shared API keys** — [source](https://www.prnewswire.com/news-releases/netfoundry-launches-enterprise-class-mcp-and-llm-gateways-bringing-zero-trust-to-ai-deployments-302789053.html) — Replaces runtime allow/deny checks with registry-level tool removal and identity-based (not key-based) agent auth — a different default for teams wiring agents to MCP servers in regulated or air-gapped environments.
- **Critical RCE in Flowise lets attackers hijack MCP stdio transport config to run OS commands** — [source](https://www.penligent.ai/hackinglabs/cve-2026-40933/) — A concrete deployment blocker for anyone wiring stdio-transport MCP servers into agent platforms — validate/sandbox subprocess launch configs rather than trusting schema checks alone, and patch Flowise to 3.1.0 immediately.
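
The advice in the Flowise item above, to validate subprocess launch configs for stdio-transport MCP servers, could start from a check like the one below. It is a sketch under assumptions: the config shape (`command`, `args`), the allowlist, and the forbidden flags are hypothetical, and this is not Flowise's patch. A filter like this is a first line of defense and would still need to be paired with sandboxing.

```python
from pathlib import PurePosixPath

# Hypothetical policy values; a real deployment would define its own.
ALLOWED_COMMANDS = {"node", "python3", "uvx"}
FORBIDDEN_FLAGS = {"-c", "-e", "--eval"}  # inline-code flags on interpreters
SHELL_METACHARACTERS = (";", "|", "&", "`", "$(")


def validate_stdio_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config passed."""
    problems = []
    command = config.get("command", "")
    if PurePosixPath(command).name != command:
        problems.append("command must be a bare executable name, not a path")
    if command not in ALLOWED_COMMANDS:
        problems.append(f"command {command!r} is not on the allowlist")
    for arg in config.get("args", []):
        if arg in FORBIDDEN_FLAGS:
            problems.append(f"argument {arg!r} allows inline code execution")
        elif any(fragment in arg for fragment in SHELL_METACHARACTERS):
            problems.append(f"argument {arg!r} contains shell metacharacters")
    return problems


print(validate_stdio_config({"command": "node", "args": ["server.js"]}))
print(validate_stdio_config({"command": "python3", "args": ["-c", "import os"]}))
```

The second call shows why an executable allowlist alone is not enough: an allowed interpreter with an inline-code flag is still arbitrary command execution.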


---

# Agent Stack — June 7, 2026
**URL:** https://artificialcuriositylabs.ai/daily/agent-stack/2026-06-07/
**Beat:** agent-stack
**Date:** 2026-06-07
**Topics:** langchain, sdk, middleware, harness, create_agent, production
**Summary:** LangChain ships `create_agent` primitive with composable middleware for production harnesses; Google Managed Agents API provisions remote sandbox and st…

## What moved

- **LangChain ships `create_agent` primitive with composable middleware for production harnesses** — [LangChain Blog](https://www.langchain.com/blog/how-to-build-a-custom-agent-harness)
  LangChain published `create_agent`, a minimalist three-parameter primitive (model, tools, system prompt) that exposes a middleware layer as the main customization surface. Middleware slots cover context-overflow summarization, filesystem/memory persistence, shell and code-interpreter access, retry and fallback logic, PII/policy enforcement, and human-in-the-loop approval gates. The design lets teams build these production behaviors once and reuse them across multiple agents. **Builder angle:** Replaces ad-hoc harness scaffolding with a composable middleware stack you can test and reuse — directly changes how you wire context management, approvals, and retry into any agent loop.

- **Google Managed Agents API provisions remote sandbox and stateful harness via single REST call** — [Google Cloud Blog](https://cloud.google.com/blog/products/ai-machine-learning/what-google-cloud-announced-in-ai-this-month)
  Announced at Google I/O 2026, the Gemini Enterprise Agent Platform Managed Agents API (pre-GA) lets developers POST to the Interactions endpoint to provision a Google-hosted remote sandbox and agent harness in one call. An `environment_id` parameter reuses a persistent container — preserving libraries, scripts, files, and state — across multi-turn runs; `previous_interaction_id` continues conversation history. Agents can execute code, manage files, and call backend systems without the developer managing underlying compute or security. **Builder angle:** Offloads sandbox lifecycle and state management to Google infrastructure — you get a durable, multi-turn agent execution environment without running your own harness server.

- **NVIDIA NemoClaw open blueprint ships OpenShell secure runtime for long-running industrial agents** — [NVIDIA Blog](https://blogs.nvidia.com/blog/industrial-software-leaders-secure-autonomous-ai-engineers-nemoclaw/)
  NVIDIA published NemoClaw, an open blueprint for building specialized, long-running agents that combines a choice of orchestration harness (OpenClaw or Hermes), a model router, and NeMo customization libraries. The open-source OpenShell runtime core governs per-agent access to files, networks, and tools with policy-based security at every layer. Early industrial adopters include Cadence, Dassault Systèmes, Siemens, and Synopsys, compressing weeks of simulation workflows into hours. **Builder angle:** NemoClaw's pluggable harness + OpenShell security layer provides a concrete reference architecture for domain-specific long-running agents where tool access must be policy-governed.
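
The composable-middleware idea in the LangChain `create_agent` item above can be shown with plain function wrappers. This sketch does not use LangChain's API: the `Handler` and `Middleware` types, `compose`, and the two example layers are hypothetical, and the "model" is a stub that fails once.

```python
from typing import Callable

Handler = Callable[[str], str]
Middleware = Callable[[Handler], Handler]


def with_retry(attempts: int) -> Middleware:
    # Assumes attempts >= 1.
    def wrap(next_handler: Handler) -> Handler:
        def handler(prompt: str) -> str:
            last_error = None
            for _ in range(attempts):
                try:
                    return next_handler(prompt)
                except RuntimeError as error:
                    last_error = error
            raise last_error
        return handler
    return wrap


def with_redaction(secret: str) -> Middleware:
    def wrap(next_handler: Handler) -> Handler:
        def handler(prompt: str) -> str:
            # Strip the secret before the prompt reaches the model call.
            return next_handler(prompt.replace(secret, "[REDACTED]"))
        return handler
    return wrap


def compose(core: Handler, middleware: list[Middleware]) -> Handler:
    # The first middleware in the list becomes the outermost wrapper.
    for layer in reversed(middleware):
        core = layer(core)
    return core


calls = {"count": 0}


def flaky_model(prompt: str) -> str:
    # Stand-in for a model call that fails once, then succeeds.
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("transient failure")
    return f"echo: {prompt}"


agent = compose(flaky_model, [with_retry(3), with_redaction("sk-123")])
result = agent("use key sk-123")
print(result)
```

Each layer is testable on its own and reusable across agents, which is the property the item attributes to the middleware design.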

## Also tracking

- **Glean launches enterprise MCP Gateway with precomputed knowledge-graph context layer** — [source](https://www.glean.com/blog/introducing-glean-mcp-gateway) — Routing agent tool calls through a precomputed knowledge graph gateway reduces context tokens ~30% and offloads permission enforcement to the connector layer—eliminating per-source OAuth wiring at agent build time.
- **Noma releases Agent Access Control with real-time MCP server registry and per-tool 3-state gating** — [source](https://www.helpnetsecurity.com/2026/06/02/noma-brings-visibility-and-access-governance-to-ai-agents-and-mcp-servers/) — Per-tool 3-state gating scopes agent permissions to individual MCP operations rather than granting or denying entire server access—enabling narrow least-privilege without manual per-connection policies.
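
Per-tool gating as described in the Noma item above might be modeled like this. The source does not name the three states, so `ALLOW`, `REQUIRE_APPROVAL`, and `DENY` are an assumption, as are the server and tool names and the fail-closed default. This is a sketch of the pattern, not Noma's product behavior.

```python
from enum import Enum


class Gate(Enum):
    # Assumed state names; the source does not say what the three states are.
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


# Policy keyed by (server, tool); hypothetical server and tool names.
POLICY = {
    ("crm", "read_contact"): Gate.ALLOW,
    ("crm", "update_contact"): Gate.REQUIRE_APPROVAL,
    ("crm", "delete_contact"): Gate.DENY,
}


def gate_for(server: str, tool: str) -> Gate:
    # Unknown tools are denied by default (fail closed).
    return POLICY.get((server, tool), Gate.DENY)


def authorize(server: str, tool: str, approved: bool = False) -> bool:
    gate = gate_for(server, tool)
    if gate is Gate.ALLOW:
        return True
    if gate is Gate.REQUIRE_APPROVAL:
        return approved
    return False


print(authorize("crm", "read_contact"))
print(authorize("crm", "update_contact"))
print(authorize("crm", "update_contact", approved=True))
```

Keying the policy on the tool rather than the server is what allows read access to stay open while writes on the same server wait for approval.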


---

# Agent Stack — June 6, 2026
**URL:** https://artificialcuriositylabs.ai/daily/agent-stack/2026-06-06/
**Beat:** agent-stack
**Date:** 2026-06-06
**Topics:** agent-harness, orchestration, microsoft, codeact, harness, runtime
**Summary:** Microsoft Agent Framework ships Agent Harness, CodeAct, and Handoff orchestration at BUILD 2026; NVIDIA releases NemoClaw orchestration blueprints and O…

## What moved

- **Microsoft Agent Framework ships Agent Harness, CodeAct, and Handoff orchestration at BUILD 2026** — [Microsoft Agent Framework Blog](https://devblogs.microsoft.com/agent-framework/microsoft-agent-framework-at-build-2026-announce/)
  Microsoft Agent Framework (MAF; 1.0 GA April 2026) adds Agent Harness via `AsHarnessAgent()` with automatic context compaction, filesystem memory, todo tracking, plan/execute modes, AgentSkillsProvider, BackgroundAgentsProvider, shell execution (.NET), ToolApprovalAgent, and OpenTelemetry. Foundry Hosted Agents deploy local MAF agents as containers with scale-to-zero, per-session VM isolation, and persistent filesystem. CodeAct (alpha) runs multi-tool Python in Hyperlight micro-VMs, cutting benchmark latency 52% and tokens 64%. HandoffBuilder adds directed multi-agent routing with developer-defined topology and guardrails. **Builder angle:** One method turns a chat client into a production harness with compaction, skills, sub-agents, and hosted deployment, collapsing what teams typically stitch together from separate OSS pieces.

- **NVIDIA releases NemoClaw orchestration blueprints and OpenShell secure agent runtime** — [NVIDIA Newsroom](https://nvidianews.nvidia.com/news/enterprise-software-leaders-build-ai-agents-with-nvidia)
  NVIDIA Agent Toolkit ships NemoClaw blueprints (available now) connecting popular harnesses for long-running agents, plus OpenShell early preview for policy/privacy controls and routing queries to local vs cloud models. Nemotron 3 Ultra (550B MoE) targets agent harnesses including LangChain Deep Agents, OpenHands, and OpenCode. CUDA-X libraries (cuDF, cuOpt, NeMo, PhysicsNeMo, CUDA-Q) are exposed as domain-specific agent skills. Microsoft partners on Windows security primitives plus OpenShell; Canonical and Red Hat integrate OpenShell into Ubuntu and Red Hat AI. **Builder angle:** NemoClaw plugs orchestration blueprints into existing harnesses while OpenShell adds a policy-controlled runtime layer for cross-platform agent deployment.

- **Amazon Agent-Ops multi-agent framework automates SOPs with 85–97% accuracy in production** — [Amazon Science](https://www.amazon.science/publications/agent-ops-a-multi-agent-orchestration-framework-for-end-to-end-sop-automation-in-e-commerce-operations)
  Agent-Ops orchestrates three agents for e-commerce SOP automation: SOP Groomer converts ambiguous docs into automation-ready specs, WebAgent hits 91.3% task completion via demonstration-based learning on dynamic web UIs, and Document Verification Agent validates invoices and certificates at 94.2% accuracy across languages. It is deployed across seven SOP categories in three regions, is used by 100 account managers, and cut case-resolution time by 83%. **Builder angle:** Demonstrates a supervisor plus specialized-worker orchestration pattern with measurable production accuracy on incomplete SOPs and unpredictable UIs, not just lab demos.
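
The directed handoff routing mentioned in the Microsoft Agent Framework item above (developer-defined topology with guardrails) can be approximated as an explicit graph of allowed transfers. This sketch does not use HandoffBuilder or any MAF API: the `TOPOLOGY` table, the function names, and the agent names are hypothetical.

```python
# Directed handoff topology: each agent lists the agents it may hand off to.
# Agent names are hypothetical.
TOPOLOGY = {
    "triage": {"billing", "support"},
    "billing": {"support"},
    "support": set(),
}


def handoff(current: str, target: str) -> str:
    # Reject any route the developer did not declare.
    if target not in TOPOLOGY.get(current, set()):
        raise ValueError(f"handoff {current} -> {target} is not allowed")
    return target


def run_route(start: str, requested: list[str]) -> list[str]:
    # Follow a sequence of requested handoffs, validating each edge.
    path = [start]
    for target in requested:
        path.append(handoff(path[-1], target))
    return path


print(run_route("triage", ["billing", "support"]))
```

Declaring the edges up front means a model that asks for an undeclared transfer gets an error instead of an unplanned route.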

## Also tracking

- **MCP 2026-07-28 release candidate makes Streamable HTTP stateless** — [source](https://blog.modelcontextprotocol.io/posts/2026-07-28-release-candidate/) — Agents behind gateways can drop sticky sessions and route MCP calls on HTTP headers instead of parsing JSON-RPC bodies.
- **AWS documents OAuth code flow for AgentCore Gateway MCP inbound auth** — [source](https://aws.amazon.com/blogs/machine-learning/building-a-secure-auth-code-flow-setup-using-agentcore-gateway-with-mcp-clients/) — Production MCP gateways can enforce per-user IdP tokens at the routing layer before any tool invocation reaches backend servers.