<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Builder&apos;s Daily</title><description>Curated signal for builders — agents, inference, AI coding, and the physical stack beneath AI.</description><link>https://artificialcuriositylabs.ai/</link><language>en-us</language><atom:link href="https://artificialcuriositylabs.ai/daily/rss.xml" rel="self" type="application/rss+xml"/><item><title>Agent Commerce — June 10, 2026</title><link>https://artificialcuriositylabs.ai/daily/agent-commerce/2026-06-10/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/agent-commerce/2026-06-10/</guid><description>Rain launches Agent Control Layer for programmatic agent spending guardrails; Santander&apos;s Getnet enables Mastercard Agent Pay acceptance, completes firs…</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Agents that can pay change commerce architecture. Protocol moves (x402, SPT, checkout rails) are infrastructure — the moat is trust, scoped authorization, and human-defined intent boundaries.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Rain launches Agent Control Layer for programmatic agent spending guardrails&lt;/strong&gt; — &lt;a href=&quot;https://www.prnewswire.com/news-releases/rain-releases-agent-control-layer-bringing-programmatic-spending-guardrails-to-agentic-payments-302794541.html&quot;&gt;PR Newswire / Rain&lt;/a&gt;
Rain, a Visa and Mastercard principal member, released an Agent Control Layer letting businesses set programmatic spending limits for AI agents: MCC allowlists, approved-merchant restrictions, amount/frequency caps, and active-card limits on virtual cards, plus counterparty/amount/timing controls on virtual accounts, onramps/offramps, and stablecoin/fiat transfers. Controls are enforced at card issuance and transaction initiation rather than after the fact. YC-backed Sponge already uses it to issue stablecoin-backed virtual cards for agent purchases at Visa-accepting merchants. &lt;strong&gt;Builder angle:&lt;/strong&gt; Agents that must transact without per-purchase human approval get a concrete authorization layer: spend limits enforced at issuance and initiation across card, stablecoin, and fiat rails.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Santander&apos;s Getnet enables Mastercard Agent Pay acceptance, completes first agent-initiated payment in Latin America&lt;/strong&gt; — &lt;a href=&quot;https://letsdatascience.com/news/getnet-enables-merchants-to-accept-agent-initiated-payments-ca2bccb0&quot;&gt;Getnet (Santander)&lt;/a&gt;
Getnet, Santander&apos;s merchant payments platform, announced protocol-agnostic infrastructure for accepting AI agent-initiated payments that is compatible with Mastercard Agent Pay and integrates with Visa Intelligent Commerce. Getnet, Mastercard, and Mexican fintech Neivor completed what Santander calls the first real-world agent-initiated payment in Mexico and Latin America, following a March 2026 Santander-Mastercard transaction described as Europe&apos;s first regulated end-to-end AI agent payment. &lt;strong&gt;Builder angle:&lt;/strong&gt; A completed real-world transaction through a merchant acquirer shows Mastercard Agent Pay&apos;s authenticated-intent flow working end to end for agent checkout, giving builders a reference deployment rather than another paper pilot.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Chainalysis: x402 stablecoin agent payments cross 100M transactions on Base, shift from micro-tx to real value transfer&lt;/strong&gt; — &lt;a href=&quot;https://www.chainalysis.com/blog/x402-agentic-payments-adoption/&quot;&gt;Chainalysis&lt;/a&gt;
Chainalysis reports x402 — the HTTP 402-based protocol where an agent receives a payment spec, executes a stablecoin micro-payment on-chain, and resubmits the request with a receipt — has driven over 100M transactions on Base since mid-2025. The share of transactions over $1 rose from 49% to 95% as sub-$1 micro-transactions collapsed from 46% to 4%, and tester-to-payer conversion improved 4x, indicating a shift from speculative/meme activity toward functional machine-to-machine payments. &lt;strong&gt;Builder angle:&lt;/strong&gt; x402&apos;s pay-per-request flow is showing real usage growth beyond memecoin noise, making it a more credible default for metering paid API/MCP tool calls with on-chain stablecoin settlement.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Wirex joins Visa&apos;s Agentic Ready programme to test agent-initiated payments&lt;/strong&gt; — &lt;a href=&quot;https://www.wirexapp.com/post/wirex-joins-visa-agentic-ready-programme-to-enable-ai-driven-payments&quot;&gt;source&lt;/a&gt; — Wirex is exploring consent-controlled agent-initiated payments for SaaS, procurement, and travel through Visa&apos;s Agentic Ready programme; this is early-stage testing, not a shipped feature.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PayPal-backed Hey Savi launches UK agentic shopping app with native in-app checkout&lt;/strong&gt; — &lt;a href=&quot;https://www.finextra.com/newsarticle/47887/hey-savi-and-paypal-launch-agentic-commerce-platform-with-in-app-checkout&quot;&gt;source&lt;/a&gt; — Hey Savi&apos;s AI fashion-search app completes purchases via embedded PayPal checkout (Debenhams Group as launch retailer); the announcement lacks the protocol or token detail that would earn it a main-list slot.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>agent-commerce</category><category>spending-controls</category><category>virtual-cards</category><category>stablecoin</category><category>Visa</category><category>Mastercard</category><category>Mastercard Agent Pay</category></item><item><title>Agent Reliability — June 10, 2026</title><link>https://artificialcuriositylabs.ai/daily/agent-reliability/2026-06-10/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/agent-reliability/2026-06-10/</guid><description>Perplexity launches &apos;Search as Code&apos;: agents write Python to compose retrieval, rerank, and dedup primitives directly; LlamaParse adds word/line/cell-le…</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;You cannot run what you cannot see. Grounding is institutional context encoded in retrieval; observability is electricity-metering for agent loops; security moves from policy decks to runtime guardrails. Reliability is the stack between &quot;it works in demo&quot; and &quot;it runs in prod.&quot;&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Perplexity launches &apos;Search as Code&apos;: agents write Python to compose retrieval, rerank, and dedup primitives directly&lt;/strong&gt; — &lt;a href=&quot;https://research.perplexity.ai/articles/rethinking-search-as-code-generation&quot;&gt;Perplexity Research&lt;/a&gt;
Perplexity replaced its sequential function-calling search loop with &apos;Search as Code&apos; (SaC): models generate task-specific Python that runs in sandboxes and calls an Agentic Search SDK exposing atomic primitives (retrieval, ranking, filtering, deduplication). On a CVE-advisory task this cut token usage 85% (288.7K to 42.9K tokens), and SaC scored +29% on DSQA and +45% on a new WANDR benchmark, with medium-reasoning SaC beating all non-SaC systems at under $1/task. Rolling out now in Perplexity Computer and the Agent API. &lt;strong&gt;Builder angle:&lt;/strong&gt; Builders get composable, code-level retrieval/rerank/dedup primitives instead of fixed search endpoints, enabling per-task retrieval strategies at a fraction of the token cost of loop-based agentic search.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;LlamaParse adds word/line/cell-level bounding boxes for audit-grade citation grounding&lt;/strong&gt; — &lt;a href=&quot;https://www.llamaindex.ai/blog/announcing-granular-bounding-boxes-in-llamaparse&quot;&gt;LlamaIndex Blog&lt;/a&gt;
LlamaParse now supports an opt-in &lt;code&gt;output_options.granular_bboxes&lt;/code&gt; parameter to return word-, line-, or cell-level coordinates instead of coarse layout-level boxes. The system applies coordinates only to text explicitly present on the page (not inferred values or AI summaries), enabling exact-location citations for dense documents like financial filings and tables. Available across paid tiers, with Agentic Plus adding extra verification passes. &lt;strong&gt;Builder angle:&lt;/strong&gt; RAG pipelines can now ground citations to a specific word or table cell rather than highlighting a whole page or paragraph, closing a gap for compliance and financial-document agents that need audit-grade provenance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Arize: Microsoft&apos;s open trust stack makes OpenInference the shared trace contract linking ASSERT evals, ACS runtime controls, and Phoenix/Arize AX&lt;/strong&gt; — &lt;a href=&quot;https://arize.com/blog/microsoft-open-trust-stack-openinference/&quot;&gt;Arize Blog&lt;/a&gt;
At Build 2026, Microsoft introduced ASSERT (an MIT-licensed, spec-driven agent evaluation and regression-testing framework that turns behavior specs into test cases and graded traces) and the Agent Control Specification (ACS), a portable runtime-guardrail standard with checkpoints at input, LLM call, state, tool execution, and output. Both standardize on OpenInference, the OpenTelemetry-for-AI standard Arize created (33+ framework integrations, two-line instrumentation): ASSERT reads OpenInference spans as judge evidence, ACS emits its control decisions as spans, and the same trace stream feeds Phoenix or Arize AX for production monitoring. &lt;strong&gt;Builder angle:&lt;/strong&gt; One OpenInference instrumentation pass now feeds CI eval gates (ASSERT), runtime guardrails (ACS), and production observability (Phoenix/Arize AX) without separate re-instrumentation per tool.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Sedai launches autonomous AI Agent Optimization platform with real-time per-team/per-model cost attribution and AI-judge-based routing&lt;/strong&gt; — &lt;a href=&quot;https://cioinfluence.com/machine-learning/sedai-launches-the-first-autonomous-platform-for-ai-agent-optimization/&quot;&gt;source&lt;/a&gt; — Drop-in layer for per-team/per-model token-cost attribution and automated cost-aware model routing across providers without re-instrumenting agent code.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Zscaler launches AI Broker, AI Access Graph, and Endpoint AI Security to govern agent identity and MCP/A2A traffic&lt;/strong&gt; — &lt;a href=&quot;https://www.zscaler.com/press/zscaler-unveils-new-product-innovations-secure-agentic-ai&quot;&gt;source&lt;/a&gt; — Gives a concrete pattern for scoping which MCP/A2A tools an agent can reach per identity and tracking data lineage in real time — a deployable access-control and audit layer for agent fleets.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>agent-reliability</category><category>retrieval-architecture</category><category>agentic-search</category><category>reranking</category><category>cost</category><category>ingestion</category><category>parsing</category></item><item><title>Agent Stack — June 10, 2026</title><link>https://artificialcuriositylabs.ai/daily/agent-stack/2026-06-10/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/agent-stack/2026-06-10/</guid><description>LangSmith Sandboxes goes GA: microVM-based persistent compute for long-running agents; Harness-1: open-weight 20B search agent beats GPT-5.4 on recall v…</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;The harness, tool surface, and delegation topology are commoditizing together. When every vendor ships MCP and orchestration, the moat is how humans wire judgment, guardrails, and institutional context into the agent loop — not whether agents can run.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;LangSmith Sandboxes goes GA: microVM-based persistent compute for long-running agents&lt;/strong&gt; — &lt;a href=&quot;https://www.langchain.com/blog/give-your-ai-agent-its-own-computer&quot;&gt;LangChain Blog&lt;/a&gt;
LangChain&apos;s LangSmith Sandboxes are now generally available: hardware-virtualized microVMs (not containers) that give agents a real filesystem, shell, and package manager with state that persists across sessions. Features include snapshot/fork with copy-on-write, pre-warmed &apos;blueprints&apos; that cut boot time from minutes to seconds, authenticated service URLs for agent-spun-up local services, and a network auth proxy that injects credentials without exposing them to the sandbox runtime. Accessed via the LangSmith SDK (&lt;code&gt;client.create_sandbox()&lt;/code&gt;). &lt;strong&gt;Builder angle:&lt;/strong&gt; If your harness needs to run untrusted, model-generated code across multi-step sessions, LangSmith Sandboxes provide a managed microVM isolation layer with snapshot/fork and credential injection instead of building your own container infrastructure.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Harness-1: open-weight 20B search agent beats GPT-5.4 on recall via externalized working memory&lt;/strong&gt; — &lt;a href=&quot;https://venturebeat.com/orchestration/researchers-trained-an-open-source-ai-search-agent-harness-1-that-outperforms-gpt-5-4-on-recalling-relevant-information&quot;&gt;VentureBeat&lt;/a&gt;
Researchers from UIUC, UC Berkeley, and Chroma released Harness-1, a 20B-parameter open-source search agent scoring 73% vs. GPT-5.4&apos;s 70.9% on information-recall benchmarks. The gain comes from an external state-management system that keeps candidate documents separate from verified evidence, rather than relying on the model&apos;s context window. Training used only 899 SFT trajectories plus 3,453 RL queries; weights are released under Apache 2.0 on Hugging Face. &lt;strong&gt;Builder angle:&lt;/strong&gt; The externalized working-memory pattern (storing candidate and verified evidence in separate state outside the context window) is a harness architecture you can reuse for any long-horizon retrieval agent, and the Apache-2.0 weights mean the 20B model can be run or fine-tuned directly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Show HN: mcp-gateway-scan, a read-only static scanner for MCP gateway production-readiness, ships v0.1.0&lt;/strong&gt; — &lt;a href=&quot;https://github.com/willianpinho/mcp-gateway-scan&quot;&gt;GitHub (willianpinho/mcp-gateway-scan)&lt;/a&gt;
New open-source CLI scans MCP gateway code across seven dimensions (RBAC/policy enforcement, fail-closed error handling, supply-chain pinning, OpenTelemetry/log redaction, routing and cost controls such as max_tokens and rate limits, secrets/credential-manager usage, and kill switches/feature flags) and outputs a color-coded readiness report or JSON. It&apos;s read-only: no code execution, no network calls, no secret values printed. &lt;strong&gt;Builder angle:&lt;/strong&gt; Drop this into CI before deploying an MCP gateway to catch fail-open error handlers, unpinned base images, and missing kill switches that turn a demo gateway into an incident.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Microsoft redefines Dataverse MCP Server&apos;s tool surface as 13 named tools with explicit-approval gates on destructive ops&lt;/strong&gt; — &lt;a href=&quot;https://www.microsoft.com/en-us/power-platform/blog/2026/06/08/dataverse-mcp-server-understanding-the-new-tool-shape/&quot;&gt;source&lt;/a&gt; — A concrete reference for shaping an MCP server&apos;s tool surface around a small, named set of operations with built-in human-in-the-loop approval gates on irreversible actions, rather than exposing a generic data-access connector.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Nexla launches MCP Studio: conversational generator for governed, schema-aware MCP servers across 600+ enterprise systems&lt;/strong&gt; — &lt;a href=&quot;https://nexla.com/?page_id=30545&quot;&gt;source&lt;/a&gt; — Targets the per-connector governance burden of hand-building MCP servers — worth evaluating if you need scoped, schema-aware MCP tool access across many internal systems without writing auth/permission mapping per source.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>agent-stack</category><category>agent-harness</category><category>sandbox</category><category>long-running-agents</category><category>langchain</category><category>open-weights</category><category>working-memory</category></item><item><title>AI Platform — June 10, 2026</title><link>https://artificialcuriositylabs.ai/daily/ai-platform/2026-06-10/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/ai-platform/2026-06-10/</guid><description>DeepSeek V4 pricing triggers China-wide AI API price war — Tencent Cloud cuts DeepSeek-V4 hosting 97.5%, Xiaomi cuts MiMo-V2.5 99%; Google&apos;s GKE Inferen…</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Token price is the new kWh, and the platform you ship on determines how fast you reach production. Jevons says falling inference cost drives more loops and heavier agents, so track the pricing, routing, and shipping-infrastructure moves that change what builders can afford to run.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DeepSeek V4 pricing triggers China-wide AI API price war — Tencent Cloud cuts DeepSeek-V4 hosting 97.5%, Xiaomi cuts MiMo-V2.5 99%&lt;/strong&gt; — &lt;a href=&quot;https://www.scmp.com/tech/article/3356138/deepseek-v4-forces-rivals-slash-prices-rattling-chinas-cloud-providers&quot;&gt;South China Morning Post&lt;/a&gt;
DeepSeek&apos;s aggressive V4 API pricing has forced Chinese rivals to slash costs. Tencent Cloud cut its hosted DeepSeek-V4 series prices by ~97.5% to fully match DeepSeek&apos;s official rates with no cloud premium (effective June 2). Xiaomi cut MiMo-V2.5 API pricing by up to 99%, which pushed it to #6 on OpenRouter with 1.7 trillion tokens/week processed (up &gt;999% week-over-week). MiniMax instead launched a hybrid token + subscription model for M3, with subscription tiers from $7.24 to $69.28/month. &lt;strong&gt;Builder angle:&lt;/strong&gt; If you route any workload to Chinese-hosted open models, per-token costs for MiMo-V2.5 just dropped by up to 99% and Tencent Cloud&apos;s DeepSeek-V4 hosting now matches DeepSeek&apos;s direct API price, making both viable cheap-tier options in cost-based routing tables.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Google&apos;s GKE Inference Gateway cuts time-to-first-token 92.8% via prefix-cache-aware routing&lt;/strong&gt; — &lt;a href=&quot;https://cloud.google.com/blog/products/containers-kubernetes/gke-inference-gateway-prefix-caching-accelerates-ai-inference&quot;&gt;Google Cloud Blog&lt;/a&gt;
An independent Principled Technologies benchmark found that GKE Inference Gateway, which routes each request to the pod whose KV cache already holds the matching prompt prefix instead of using round-robin, delivered about 18.7% higher output token throughput (7,169 vs 6,042 tok/s), 92.8% lower time-to-first-token (188ms vs 2,625ms), and 62.6% lower inter-token latency (30.2ms vs 81ms) than a managed-Kubernetes baseline serving Llama 3.1 8B on identical 8x A100 hardware. Snap reports 75-80% prefix cache hit rates running this in production via the open-source llm-d stack. &lt;strong&gt;Builder angle:&lt;/strong&gt; Routing on KV-cache locality instead of round-robin is a direct GPU-count and latency lever for RAG and long-system-prompt workloads: the same throughput on fewer accelerators, with TTFT improvements that matter for interactive agents.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Xiaomi&apos;s MiMo-V2.5-Pro-UltraSpeed hits 1,000+ tok/s on a 1T-parameter model via FP4 + DFlash speculative decoding — 3x price for 10x output&lt;/strong&gt; — &lt;a href=&quot;https://mimo.xiaomi.com/blog/mimo-tilert-1000tps&quot;&gt;Xiaomi MiMo Blog&lt;/a&gt;
Xiaomi and TileRT combined selective FP4 quantization on MoE expert weights with &apos;DFlash&apos; block-level speculative decoding (avg. accepted-token length 6.30 in coding, 5.56 in math, 4.29 in agent tasks) to decode a 1-trillion-parameter model at over 1,000 tok/s (up to ~1,200) on a single standard 8-GPU node, about 10x the standard MiMo-V2.5-Pro throughput. The UltraSpeed API tier is priced at 3x the standard MiMo-V2.5-Pro rate and is available through a limited, application-based trial running June 9-23, 2026. &lt;strong&gt;Builder angle:&lt;/strong&gt; Paying 3x per token for ~10x the tokens per second changes the cost-per-completed-task math for latency-bound agentic and trading workloads that were bottlenecked on trillion-parameter decode speed; benchmark it against smaller models during the trial window.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Salesforce ships Agentforce Mobile SDK to GA and opens ADL Connect API beta for scriptable RAG data libraries&lt;/strong&gt; — &lt;a href=&quot;https://developer.salesforce.com/blogs/2026/06/the-salesforce-developers-guide-to-the-summer-26-release&quot;&gt;source&lt;/a&gt; — RAG grounding data for Salesforce agents can now be created, uploaded, and promoted through a scriptable REST API as part of CI/CD, while the same agent ships into native iOS/Android/React Native apps via a GA SDK.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vercel ships Drives for Sandbox in private beta for persistent storage across disposable AI agent sandboxes&lt;/strong&gt; — &lt;a href=&quot;https://vercel.com/changelog/drives-for-vercel-sandbox-in-private-beta&quot;&gt;source&lt;/a&gt; — Removes the re-provisioning cost of disposable agent sandboxes by letting builders persist a coding agent&apos;s workspace (repo clone, deps, build cache) across runs instead of rebuilding it from scratch each time.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>ai-platform</category><category>pricing</category><category>china</category><category>deepseek</category><category>tencent-cloud</category><category>xiaomi</category><category>routing</category></item><item><title>Builder Loop — June 10, 2026</title><link>https://artificialcuriositylabs.ai/daily/builder-loop/2026-06-10/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/builder-loop/2026-06-10/</guid><description>Claude Fable 5 reaches general availability across GitHub Copilot surfaces; GitHub extends automatic PR security validation to third-party coding agents…</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Coding agents are bulldozers, not replacements — humans still frame the problem. OSS shifts default stack choices faster than any vendor roadmap. When everyone can generate and fork, differentiation is taste, review, and what you ship before it becomes the default.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Claude Fable 5 reaches general availability across GitHub Copilot surfaces&lt;/strong&gt; — &lt;a href=&quot;https://github.blog/changelog/2026-06-09-claude-fable-5-is-generally-available-for-github-copilot&quot;&gt;GitHub Changelog&lt;/a&gt;
Anthropic&apos;s Claude Fable 5 — described as the first model in Anthropic&apos;s &apos;Mythos&apos; class, built for long-horizon autonomous coding and knowledge-work — is now GA in GitHub Copilot across VS Code, Visual Studio, Copilot CLI, the Copilot cloud agent, github.com, GitHub Mobile, JetBrains, Xcode, and Eclipse, for Pro+, Max, Business, and Enterprise plans. Unlike other Claude models in Copilot, Fable 5 requires 30-day data retention so Anthropic&apos;s safety classifiers can run, and Business/Enterprise admins must explicitly enable its policy. &lt;strong&gt;Builder angle:&lt;/strong&gt; A new long-horizon coding model lands in every Copilot surface at once, but Business/Enterprise teams must opt into a 30-day data-retention policy before they can use it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GitHub extends automatic PR security validation to third-party coding agents like Claude and Codex&lt;/strong&gt; — &lt;a href=&quot;https://github.blog/changelog/2026-06-09-security-validation-for-third-party-coding-agents&quot;&gt;GitHub Changelog&lt;/a&gt;
GitHub now runs its automatic security checks — CodeQL vulnerability analysis, dependency scanning against the GitHub Advisory Database, and secret scanning — on pull requests from any coding agent, not just Copilot&apos;s cloud agent. This covers third-party agents including Claude and OpenAI Codex. Agents attempt to fix flagged issues before finalizing the PR. The checks run by default, follow existing repo Copilot settings, and require no GitHub Advanced Security license. &lt;strong&gt;Builder angle:&lt;/strong&gt; PRs opened by Claude, Codex, or other third-party agents now get the same CodeQL/secret-scan/dependency gate as Copilot&apos;s cloud agent, automatically and with no extra license.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;VibeDrift launches MCP server that feeds codebase-convention signals into Claude Code and Cursor mid-generation&lt;/strong&gt; — &lt;a href=&quot;https://www.vibedrift.ai/blog/does-a-drift-checker-change-agent-output&quot;&gt;VibeDrift Blog&lt;/a&gt;
VibeDrift shipped a paid-tier MCP server that runs inside coding-agent sessions (Claude Code, Cursor) and surfaces codebase-convention signals during code generation, rather than scanning for drift after the fact. The team&apos;s own measurement found that when a project convention conflicts with a model&apos;s default and isn&apos;t already visible in context, the MCP signal measurably reduced introduced drift (95% CI [0.57, 1.11]) — with no effect when conventions matched defaults or were already in-file. &lt;strong&gt;Builder angle:&lt;/strong&gt; One MCP config block can cut convention drift in agent-written code, specifically in the cases where a model&apos;s defaults disagree with house style and that style isn&apos;t already visible in context.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cohere ships North Mini Code, a 30B MoE (3B active) coding model under Apache 2.0&lt;/strong&gt; — &lt;a href=&quot;https://huggingface.co/blog/CohereLabs/introducing-north-mini-code&quot;&gt;source&lt;/a&gt; — Apache 2.0 license plus a small 3B active-parameter footprint and FP8 checkpoints make this a realistic self-host target for terminal coding agents, not just an API-only release.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vercel AI Gateway data: DeepSeek jumped from &amp;#x3C;1% to 17% of token volume in a month, while spend share stayed near 1%&lt;/strong&gt; — &lt;a href=&quot;https://vercel.com/blog/ai-gateway-production-index-june-2026&quot;&gt;source&lt;/a&gt; — Concrete production routing data shows an open-weight model now carrying ~1/6 of gateway token volume at near-zero cost share — a real comparison point for teams deciding which open model to route bulk/cheap traffic to.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>builder-loop</category><category>github-copilot</category><category>anthropic</category><category>model-release</category><category>ide-integration</category><category>github</category><category>security</category></item><item><title>Agent Stack — June 9, 2026</title><link>https://artificialcuriositylabs.ai/daily/agent-stack/2026-06-09/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/agent-stack/2026-06-09/</guid><description>Google ADK v2.2.0 ships Workflow-to-A2A conversion, HITL state distinction, and request_input clarification tool; Pydantic AI v1.105.0 adds on-demand de…</description><pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;The harness, tool surface, and delegation topology are commoditizing together. When every vendor ships MCP and orchestration, the moat is how humans wire judgment, guardrails, and institutional context into the agent loop — not whether agents can run.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Google ADK v2.2.0 ships Workflow-to-A2A conversion, HITL state distinction, and request_input clarification tool&lt;/strong&gt; — &lt;a href=&quot;https://github.com/google/adk-python/releases/tag/v2.2.0&quot;&gt;Google ADK Python GitHub&lt;/a&gt;
Google Agent Development Kit v2.2.0 (June 4) adds Workflow-to-A2A serialization so ADK-defined multi-agent workflows can be exposed as A2A-compatible service endpoints; distinguishes input-required vs auth-required states in human-in-the-loop flows; preserves transparent config on live session reconnect; and introduces a request_input tool enabling agents to ask for clarification mid-execution rather than failing silently. The default model shifts from gemini-2.5-flash to gemini-3-flash-preview ahead of the October 2026 sunset. The release also carries breaking changes, including the GenAI SDK v2 rename of turn-based helpers (convert_contents_to_turns → convert_contents_to_steps). &lt;strong&gt;Builder angle:&lt;/strong&gt; Workflow-to-A2A lets you wrap an ADK multi-agent graph as an A2A service endpoint without manual protocol wiring; the input/auth HITL state distinction changes how you gate approval vs authentication interrupts in production agent loops.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Pydantic AI v1.105.0 adds on-demand deferred tool loading; v1.101.0 brought MCP background tasks and ctx.enqueue&lt;/strong&gt; — &lt;a href=&quot;https://github.com/pydantic/pydantic-ai/releases/tag/v1.105.0&quot;&gt;Pydantic AI GitHub&lt;/a&gt;
Pydantic AI v1.105.0 (June 2) introduces on-demand deferred loading for tools, instructions, model settings, and hooks: schemas are serialized only on first invocation, cutting cold-start overhead for harnesses with large tool registries. The earlier v1.101.0 (May 21) added MCP background task support for non-blocking long-lived tool calls and ctx.enqueue/agent_run.enqueue for a pending message queue across agent turns. v2.0.0b6 (June 5) mirrors all v1 changes in the ongoing v2 breaking-change beta. &lt;strong&gt;Builder angle:&lt;/strong&gt; Deferred loading moves schema serialization cost out of agent init; MCP background tasks let any MCP tool that runs longer than a few seconds execute without blocking the main agent loop.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Azure DevOps remote MCP server enters public preview with Streamable HTTP and Entra auth&lt;/strong&gt; — &lt;a href=&quot;https://learn.microsoft.com/en-us/azure/devops/mcp-server/remote-mcp-server?view=azure-devops&quot;&gt;Microsoft Learn (Azure DevOps docs)&lt;/a&gt;
Microsoft shipped a hosted Azure DevOps MCP server in public preview that runs over Streamable HTTP transport with Microsoft Entra ID (OAuth) authentication; no local Node.js install is required. Agents connect via a single mcp.json URL (&lt;a href=&quot;https://mcp.dev.azure.com/%7Borg%7D&quot;&gt;https://mcp.dev.azure.com/{org}&lt;/a&gt;). Tool exposure is scoped at the request level via X-MCP-Toolsets headers (repos, wit, pipelines, wiki) and read-only mode is enforced via X-MCP-Readonly. The server currently works only with VS Code and Visual Studio; other clients (Claude Desktop, Cursor, Codex) are blocked pending Entra OAuth dynamic client registration support. &lt;strong&gt;Builder angle:&lt;/strong&gt; This is the first Microsoft-hosted MCP server to demonstrate the migration from stdio+PAT to Streamable HTTP + Entra auth, letting agents connect to ADO without a local daemon.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;IETF individual draft names &apos;Protocol Pivoting&apos; as MCP-specific lateral-movement attack class&lt;/strong&gt; — &lt;a href=&quot;https://datatracker.ietf.org/doc/draft-mohiuddin-mcp-security-considerations/&quot;&gt;source&lt;/a&gt; — Protocol Pivoting formalizes the cross-server exploit chain that MCP gateway operators must defend against when chaining multiple backend servers in an agent stack.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pega announces every Pega application will become an MCP server in Infinity 26 (Q3 2026)&lt;/strong&gt; — &lt;a href=&quot;https://www.stocktitan.net/news/PEGA/pega-powers-ai-agents-to-reliably-drive-mission-critical-8if0zhahmsob.html&quot;&gt;source&lt;/a&gt; — Pega&apos;s workflow runtime joins the MCP server ecosystem as an enterprise-governed execution surface, letting agents delegate mission-critical process steps rather than model them from scratch.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>agent-stack</category><category>google</category><category>adk</category><category>a2a</category><category>workflow</category><category>hitl</category><category>sdk-release</category></item><item><title>AI Platform — June 9, 2026</title><link>https://artificialcuriositylabs.ai/daily/ai-platform/2026-06-09/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/ai-platform/2026-06-09/</guid><description>Cerebras positions Kimi K2.6 at 981 tok/s output — 5.4× faster than Gemini 3.5 Flash with half the TTFT; Google Gemini 2.0 Flash permanently shut down J…</description><pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Token price is the new kWh, and the platform you ship on determines how fast you reach production. By the Jevons paradox, falling inference cost drives more loops and heavier agents. Track the pricing, routing, and deployment-infrastructure moves that change what builders can afford to run.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cerebras positions Kimi K2.6 at 981 tok/s output — 5.4× faster than Gemini 3.5 Flash with half the TTFT&lt;/strong&gt; — &lt;a href=&quot;https://www.cerebras.ai/blog/which-is-faster-gemini-3-5-flash-or-kimi-k2-6-on-cerebras&quot;&gt;Cerebras Blog&lt;/a&gt;
Cerebras published a benchmark based on Artificial Analysis data. It shows Kimi K2.6 on Cerebras hardware reaching 981 output tok/s versus 181 tok/s for Gemini 3.5 Flash (5.4×). TTFT was 452 ms vs 960 ms, and end-to-end task completion took 5.6 s vs 17.5 s. Artificial Analysis quality scores are comparable (53.9 vs 55.3). The post emphasizes that Kimi K2.6 is open-weight and fine-tunable, unlike Gemini&apos;s closed API. &lt;strong&gt;Builder angle:&lt;/strong&gt; For latency-sensitive or high-throughput workloads, this is a concrete, if vendor-published, routing signal. Serving open-weight Kimi K2.6 on Cerebras hits sub-500 ms TTFT and roughly 1,000 tok/s at quality similar to Gemini 3.5 Flash.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Google Gemini 2.0 Flash permanently shut down June 1 — builders must migrate to Gemini 3.x at substantially higher prices&lt;/strong&gt; — &lt;a href=&quot;https://ai.google.dev/gemini-api/docs/pricing&quot;&gt;Google AI Developer Docs&lt;/a&gt;
Google&apos;s pricing page (updated 2026-06-02) confirms that gemini-2.0-flash-001 and gemini-2.0-flash-lite-001 were shut down on June 1, 2026, with no further access. There are two migration options. Gemini 3.5 Flash costs $1.50/$9.00 per 1M input/output tokens (Standard tier). The budget Gemini 3.1 Flash-Lite costs $0.25/$1.50. Google also introduced three inference tiers: Flex (lower cost, batch-speed SLA), Standard, and Priority (80% premium). A new context caching fee structure ranges from $0.025 to $8.10 per 1M tokens per hour, depending on model. &lt;strong&gt;Builder angle:&lt;/strong&gt; Any production call to a gemini-2.0-flash model ID now hits a dead endpoint. Gemini 3.1 Flash-Lite is the lowest-cost migration path and the closest to 2.0 Flash pricing. Gemini 3.5 Flash output costs 6× Flash-Lite&apos;s ($9.00 vs $1.50 per 1M tokens), so evaluate whether its quality uplift justifies the price.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Databricks rolls out Instructed-Retriever-1 to all customers: FP8 + speculative decoding cuts search latency 3×&lt;/strong&gt; — &lt;a href=&quot;https://www.databricks.com/blog/3x-faster-search-parallel-test-time-scaling-instructed-retriever-1&quot;&gt;Databricks Blog&lt;/a&gt;
Databricks&apos; Knowledge Assistant now uses Instructed-Retriever-1, a MoE model served with FP8 quantization via NVIDIA ModelOpt, with no measured quality degradation. Speculative decoding adds a further 30%+ speed-up. Production results: 3× faster search, 2× faster answer generation, TTFT of ~2 s, and end-to-end latency under 10 s. The model handles both query generation and reranking in parallel via test-time scaling. &lt;strong&gt;Builder angle:&lt;/strong&gt; This is the first production-validated data point for FP8 plus speculative decoding on a MoE serving stack. Builders can use it as a cost/latency template when evaluating similar optimizations for their own self-hosted or Databricks-hosted inference pipelines.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Albireo (arXiv): 2× throughput and 54% lower energy vs vLLM by eliminating non-scalable scheduling overhead&lt;/strong&gt; — &lt;a href=&quot;https://arxiv.org/abs/2606.01927&quot;&gt;source&lt;/a&gt; — Research paper (June 1, 2026) reporting 2× throughput, 48% lower latency, and 54% lower energy vs vLLM on the same hardware. There is no deployable release yet, so treat it as a pre-deployment signal for self-hosted inference operators.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>ai-platform</category><category>routing</category><category>latency</category><category>throughput</category><category>cerebras</category><category>kimi</category><category>benchmarks</category></item><item><title>Builder Loop — June 9, 2026</title><link>https://artificialcuriositylabs.ai/daily/builder-loop/2026-06-09/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/builder-loop/2026-06-09/</guid><description>Apple ships Xcode 27 with native ACP and MCP support, integrating Claude, Gemini, and OpenAI coding agents into the IDE; Anthropic releases Swift packag…</description><pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Coding agents are bulldozers, not replacements — humans still frame the problem. OSS shifts default stack choices faster than any vendor roadmap. When everyone can generate and fork, differentiation is taste, review, and what you ship before it becomes the default.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Apple ships Xcode 27 with native ACP and MCP support, integrating Claude, Gemini, and OpenAI coding agents into the IDE&lt;/strong&gt; — &lt;a href=&quot;https://www.apple.com/newsroom/2026/06/apple-aids-app-development-with-new-intelligence-frameworks-and-advanced-tools/&quot;&gt;Apple Newsroom&lt;/a&gt;
Xcode 27, announced at WWDC 2026 on June 8, adds native Agent Client Protocol (ACP) and Model Context Protocol (MCP) support. Coding agents from Anthropic, Google, and OpenAI run directly inside the IDE with interactive planning, multiturn conversations, side-by-side code previews, and autonomous test/simulator validation via the new Device Hub. External tools like GitHub and Figma connect via MCP. The Foundation Models framework gains a single Swift API supporting server models and image input; Small Business Program developers get Apple on-device model access at no cloud API cost. &lt;strong&gt;Builder angle:&lt;/strong&gt; Xcode 27 is now the first Apple IDE where any ACP-compatible coding agent runs natively and validates its own code against your simulator — no third-party plugin or manual handoff required.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Anthropic releases Swift package for Claude + Apple Foundation Models, enabling on-device/cloud AI handoff in SwiftUI apps&lt;/strong&gt; — &lt;a href=&quot;https://claude.com/blog/claude-for-foundation-models&quot;&gt;Anthropic Blog&lt;/a&gt;
Anthropic released a native Swift package that integrates Claude with Apple&apos;s Foundation Models framework for iOS 27, iPadOS 27, macOS 27, visionOS 27, and watchOS 27. The package accepts typed value outputs from Apple&apos;s on-device model and routes them to Claude for multi-step reasoning, code generation, or web search, returning streaming responses, tool calls, and structured data back into SwiftUI views. Developers work entirely in Swift without handling raw prompt text. &lt;strong&gt;Builder angle:&lt;/strong&gt; Apple platform developers can chain on-device Foundation Model outputs directly into Claude with a single Swift import — typed input, streaming response, and tool calls included, without raw prompt management.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;JetBrains Junie CLI adds --acp flag, making it a protocol-native agent for any ACP-compatible editor&lt;/strong&gt; — &lt;a href=&quot;https://junie.jetbrains.com/docs/junie-cli-acp.html&quot;&gt;JetBrains Junie Docs&lt;/a&gt;
JetBrains published documentation dated June 8, 2026, for Junie CLI&apos;s ACP mode. When launched with --acp true, the CLI switches from interactive terminal mode to serving requests from external ACP-compatible editors and IDEs. These requests arrive over JSON-RPC via stdio or HTTP/WebSocket. In ACP mode, Junie exposes diff generation, Markdown-formatted responses, and IDE state inspection to any compliant host editor, decoupling the JetBrains agent from a single IDE host. &lt;strong&gt;Builder angle:&lt;/strong&gt; Any editor that speaks ACP can now delegate to Junie without a JetBrains-specific plugin. Xcode 27 and Devin Desktop adopted the same protocol, which makes Junie a drop-in agent for polyglot editor environments.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;verl v0.8.0 ships as ByteDance&apos;s production-grade open RL training library with vLLM/SGLang integration&lt;/strong&gt; — &lt;a href=&quot;https://github.com/volcengine/verl&quot;&gt;source&lt;/a&gt; — Builders running RLHF or other RL post-training pipelines can drop verl into a vLLM- or SGLang-backed cluster. They get a production-tested PPO/GRPO loop aimed at removing the training-inference handoff bottleneck.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenEnv launches as community-governed open standard for agentic RL environments, backed by Meta-PyTorch, Nvidia, Unsloth, and Hugging Face&lt;/strong&gt; — &lt;a href=&quot;https://huggingface.co/blog/openenv-agentic-rl&quot;&gt;source&lt;/a&gt; — Builders training agents via RL can write one OpenEnv-compliant environment and plug it into any trainer (verl, prime-rl, VeRL-Omni) without rewriting environment adapters per framework.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>builder-loop</category><category>xcode</category><category>apple</category><category>acp</category><category>mcp</category><category>wwdc</category><category>ios27</category></item><item><title>Silicon &amp; Systems — June 9, 2026</title><link>https://artificialcuriositylabs.ai/daily/silicon-systems/2026-06-09/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/silicon-systems/2026-06-09/</guid><description>Intel Foundry gains momentum: Google reportedly orders 3M TPUs through 2028, NVIDIA evaluates Intel 18A for Feynman multi-die GPU; NVIDIA and SK hynix a…</description><pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;AI is electricity, literally and figuratively. Silicon supply sets the floor on inference economics; power and data center capacity set the ceiling. Track the physical stack not as separate coverage but as the mechanism that moves inference economics and inference access.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Intel Foundry gains momentum: Google reportedly orders 3M TPUs through 2028, NVIDIA evaluates Intel 18A for Feynman multi-die GPU&lt;/strong&gt; — &lt;a href=&quot;https://www.trendforce.com/news/2026/06/09/news-intel-foundry-gains-momentum-as-google-reportedly-orders-3m-tpus-nvidia-evaluates-18a-for-multi-die-gpu-design/&quot;&gt;TrendForce&lt;/a&gt;
TrendForce reports that Google has ordered more than 3 million TPU chips from Intel Foundry through 2028, roughly half of Google&apos;s projected 2028 TPU output. Google&apos;s TPU v8e (expected H2 2027) will use Intel EMIB packaging, which is now at ~90% yield. Separately, NVIDIA is running early multi-project wafer feasibility tests on Intel&apos;s 18A process for a next-generation Feynman GPU design that integrates four dies in a single package. Tesla is Intel Foundry&apos;s first confirmed 14A customer. TSMC remains NVIDIA&apos;s primary fab for current Rubin 3nm production. &lt;strong&gt;Builder angle:&lt;/strong&gt; If NVIDIA qualifies Intel 18A for multi-die Feynman GPUs (~2028 horizon), it would open a second high-volume manufacturing source for leading-edge AI accelerators beyond TSMC. That could loosen the CoWoS packaging bottleneck that currently constrains GPU supply.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;NVIDIA and SK hynix announce multiyear technology partnership to advance memory for AI factories&lt;/strong&gt; — &lt;a href=&quot;https://nvidianews.nvidia.com/news/sk-hynix-ai-factory&quot;&gt;NVIDIA Newsroom&lt;/a&gt;
NVIDIA and SK hynix announced a formal multiyear co-development partnership on June 7 covering next-generation DRAM and NAND for four platform categories: Vera Rubin AI supercomputers, Vera CPUs, RTX Spark PCs, and Jetson Thor robotics. The companies will use CUDA-X, PhysicsNeMo, and Omniverse to accelerate SK hynix&apos;s chip design simulation and to build autonomous digital-twin fabs. No specific supply volumes were disclosed. Even so, the agreement aligns SK hynix&apos;s memory roadmap with NVIDIA&apos;s long-term infrastructure timeline across AI infrastructure, personal computing, and physical AI. &lt;strong&gt;Builder angle:&lt;/strong&gt; A formal NVIDIA–SK hynix roadmap lock-in suggests HBM and advanced DRAM development cycles will track NVIDIA&apos;s platform releases rather than open-market demand. That could narrow spot-market allocation for buyers outside contracted hyperscalers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;NVIDIA Vera Rubin enters full production; Jensen Huang confirms all three HBM4 suppliers qualified for Q3 2026 customer shipments&lt;/strong&gt; — &lt;a href=&quot;https://www.techtimes.com/articles/317855/20260605/nvidia-vera-rubin-hbm4-jensen-huang-confirms-all-three-suppliers-production-q3-ship.htm&quot;&gt;TechTimes&lt;/a&gt;
At GTC Taipei (June 1), Jensen Huang announced that Vera Rubin is in full production, with customer shipments scheduled for Q3 2026. In Seoul on June 5, he confirmed that all three HBM4 suppliers (Samsung, SK hynix, and Micron) are qualified and in production: &apos;All three vendors have been qualified. All three vendors are in production, and they&apos;re all racing to support Vera Rubin.&apos; Industry analysts estimate SK hynix holds ~60–70% of allocated HBM4 volume, Samsung ~25–30%, and Micron the remainder. H2 2026 production volume is expected to be &apos;considerably larger&apos; than H1, with 2027 larger still. &lt;strong&gt;Builder angle:&lt;/strong&gt; Q3 2026 is now the stated delivery window for Vera Rubin accelerators, which gives builders planning AI infrastructure upgrades a firm timeline. The three-supplier HBM4 setup also reduces single-source risk on memory availability.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Glass substrates eye 2027 commercialization as CoWoS costs rise and hyperscaler demand grows&lt;/strong&gt; — &lt;a href=&quot;https://www.trendforce.com/news/2026/06/05/news-glass-substrates-eye-2027-launch-scale-toward-2030-as-cowos-costs-rise-and-hyperscaler-demand-grows/&quot;&gt;TrendForce&lt;/a&gt;
TSMC, Intel, Samsung Electro-Mechanics, and Korean substrate firms are racing to commercialize glass substrates as CoWoS costs rise and AI packaging demand outpaces organic-ABF capacity. Key timeline: TSMC built a 310×310 mm CoPoS pilot line; Samsung Electro-Mechanics targets mass production at its Sejong plant in H2 2027; SKC/Absolics completed a KRW 1.76 trillion rights offering and is supplying prototypes to a U.S. telecom chip company. Intel is targeting commercialization around 2030 via its Rio Rancho co-packaged optics line. NVIDIA and Google are identified as the primary demand drivers. Full-scale mass production is not expected before 2030. &lt;strong&gt;Builder angle:&lt;/strong&gt; Glass substrates will not relieve today&apos;s CoWoS bottleneck before 2027–2028 at the earliest; builders planning GPU cluster expansions should budget around continued packaging constraints and expect CoWoS lead times to remain stretched through 2026–2027.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;LITEON and QCT showcase 800V DC liquid-cooled 110 kW power rack for NVIDIA Vera Rubin NVL72 at Computex 2026&lt;/strong&gt; — &lt;a href=&quot;https://www.liteon.com/en/news/press-center/content/liteon-qct-computex-2026&quot;&gt;LITEON Press Center&lt;/a&gt;
LITEON and QCT jointly unveiled an 800V DC liquid-cooled power rack with integrated cold plates and a 110 kW rack-level power shelf. They also showed CRPS PSUs with three-phase input and intelligent load balancing. All of it is co-designed for the NVIDIA Vera Rubin NVL72 platform. The architecture treats power delivery and thermal management as a single system so it can handle transient load swings in dense AI inference racks. &lt;strong&gt;Builder angle:&lt;/strong&gt; Operators planning Vera Rubin NVL72 deployments now have a reference 800V DC liquid-cooled power path that combines the thermal and power systems in one design. That reduces rack-level design complexity and procurement surface.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Navitas joins NVIDIA MGX ecosystem with 800V-to-6V GaN power delivery board at 97.5% efficiency and 2,100 W/in³&lt;/strong&gt; — &lt;a href=&quot;https://navitassemi.com/navitas-collaborates-with-nvidia-mgx-ecosystem-to-accelerate-800-vdc-ai-infrastructure/&quot;&gt;source&lt;/a&gt; — Eliminating the 48V intermediate stage in 800V DC AI racks removes one full conversion loss, reducing heat budget per rack and enabling higher GPU density in the same floor footprint.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GridReadiness June 2026 brief: GE Vernova at a 60+ month backlog; France brownfield offers an 18–26 month transformer path for 2028 commissioning&lt;/strong&gt; — &lt;a href=&quot;https://www.gridreadiness.com/data/monthly-brief-june-2026.html&quot;&gt;source&lt;/a&gt; — Any project targeting 2028 commissioning must bypass Tier 1 transformer vendors entirely. It needs to lock in French brownfield or secondary-tier suppliers now, before remaining inventory is absorbed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Wolfspeed Gen 5 SiC MOSFETs launch with 27% lower on-resistance than competing parts at 750V and 1200V ratings&lt;/strong&gt; — &lt;a href=&quot;https://www.businesswire.com/news/home/20260609876239/en/Wolfspeed-Unveils-the-Industrys-Lowest-RDSON-Silicon-Carbide-SiC-MOSFETs-in-New-Technology-Generation&quot;&gt;source&lt;/a&gt; — Lower Rds(on) at 1200V narrows the efficiency gap between SiC and GaN for high-voltage AI data center power supplies, giving procurement teams a near-term SiC supply option on a published roadmap.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>silicon-systems</category><category>Intel</category><category>NVIDIA</category><category>Google</category><category>TPU</category><category>18A</category><category>foundry</category></item><item><title>Agent Commerce — June 8, 2026</title><link>https://artificialcuriositylabs.ai/daily/agent-commerce/2026-06-08/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/agent-commerce/2026-06-08/</guid><description>Crossmint launches Agentic Cards API on Visa Intelligent Commerce, lets agents pay with tokenized Visa cards</description><pubDate>Mon, 08 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Agents that can pay change commerce architecture. Protocol moves (x402, SPT, checkout rails) are infrastructure — the moat is trust, scoped authorization, and human-defined intent boundaries.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Crossmint launches Agentic Cards API on Visa Intelligent Commerce, lets agents pay with tokenized Visa cards&lt;/strong&gt; — &lt;a href=&quot;https://www.crossmint.com/announcement/agentic-cards-api-launch-visa-basistheory&quot;&gt;Crossmint&lt;/a&gt;
Crossmint shipped a public Agentic Cards API that lets eligible US-issued Visa credit/debit cardholders authorize AI agents to pay on their behalf. It&apos;s built on Visa Intelligent Commerce Connect for tokenized credentials and Basis Theory (PCI Level 1, SOC 2) as the credential vault — card numbers and CVCs never reach the agent, with issuer-set spend limits scoping each transaction. The API is live now via Crossmint&apos;s agent docs and bundled into lobster.cash, with stated support in Claude Code, OpenClaw, Hermes, and Zo Computer. &lt;strong&gt;Builder angle:&lt;/strong&gt; Concrete, working pattern for wiring real-card payments into an agent without touching raw PANs: tokenize via Visa Intelligent Commerce, vault via Basis Theory, scope with issuer spend limits, call Crossmint&apos;s API from the agent loop.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>agent-commerce</category><category>protocol</category><category>Visa Intelligent Commerce</category><category>tokenization</category><category>shared payment token</category><category>production integration</category></item><item><title>Agent Reliability — June 8, 2026</title><link>https://artificialcuriositylabs.ai/daily/agent-reliability/2026-06-08/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/agent-reliability/2026-06-08/</guid><description>Microsoft open-sources OmniVec, an embedding pipeline platform that auto-syncs vectors with changing operational data on Azure; Volcengine open-sources …</description><pubDate>Mon, 08 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;You cannot run what you cannot see. Grounding is institutional context encoded in retrieval; observability is electricity-metering for agent loops; security moves from policy decks to runtime guardrails. Reliability is the stack between &quot;it works in demo&quot; and &quot;it runs in prod.&quot;&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Microsoft open-sources OmniVec, an embedding pipeline platform that auto-syncs vectors with changing operational data on Azure&lt;/strong&gt; — &lt;a href=&quot;https://devblogs.microsoft.com/cosmosdb/introducing-omnivec-an-open-source-embedding-platform-for-ai-apps-on-azure/&quot;&gt;Microsoft Azure Cosmos DB Blog&lt;/a&gt;
Microsoft open-sourced OmniVec (public preview), a platform that automates embedding pipelines end to end: register a source, embedding model, and destination vector store, and it handles initial backfill, change tracking, model invocation, and writeback. Change detection uses native mechanisms per source — Cosmos DB change feed, Blob Storage events, CDC for SQL Server/PostgreSQL — and it deploys into the user&apos;s own Azure subscription on AKS with a REST API, CLI, and web UI. Initial release supports Cosmos DB, PostgreSQL, SQL Server, and Blob Storage as both sources and destinations. &lt;strong&gt;Builder angle:&lt;/strong&gt; Removes the custom-glue work of keeping vector indexes in sync with live operational data — CDC-based change tracking means re-embedding happens automatically as source rows change, addressing a core RAG freshness problem.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Volcengine open-sources OpenViking, a filesystem-style &apos;context database&apos; that replaces flat vector search with tiered, recursive retrieval&lt;/strong&gt; — &lt;a href=&quot;https://github.com/volcengine/OpenViking&quot;&gt;GitHub (volcengine/OpenViking)&lt;/a&gt;
ByteDance&apos;s Volcengine shipped OpenViking v0.3.24 (June 5), an open-source context-management system for agents built on a virtual filesystem (viking:// URIs) instead of a traditional vector store. It auto-processes content into three tiers — L0 one-line abstracts, L1 ~2k-token overviews, L2 full detail on demand — and retrieves via a &apos;directory recursive&apos; strategy combining intent analysis, initial vector positioning, and recursive drill-down through subdirectories rather than flat top-k similarity search. It plugs into multiple embedding/VLM providers (Volcengine Doubao, OpenAI, Gemini, local Ollama models) and includes session-based memory extraction that updates agent/user memory after each run. &lt;strong&gt;Builder angle:&lt;/strong&gt; Offers a concrete, reproducible alternative retrieval pattern to flat vector top-k — tiered context loading plus recursive directory drill-down — packaged as an open-source, swappable-embedding-provider system for agent memory and knowledge bases.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Snowflake launches Cortex Sense, a shared context layer that auto-assembles business knowledge for production agents&lt;/strong&gt; — &lt;a href=&quot;https://www.snowflake.com/en/news/press-releases/snowflake-cowork-powers-the-agentic-enterprise-as-the-personal-agent-for-knowledge-workers-to-work-smarter/&quot;&gt;Snowflake Newsroom&lt;/a&gt;
At Summit 26, Snowflake announced Cortex Sense, a capability that &apos;automatically brings together the data, business definitions, and operational knowledge that AI agents need to be trustworthy and useful&apos; into a shared context layer usable by both its CoWork agent and CoCo coding agent. It ships with prebuilt role plugins (e.g., finance, sales) that &apos;combine skills, business logic, and MCP connectors,&apos; plus a Deep Research feature for multi-step, cited reasoning over structured and unstructured data. Snowflake says this cuts manual context setup and moves agents &apos;from concept to production in days instead of weeks.&apos; &lt;strong&gt;Builder angle:&lt;/strong&gt; Packages the unglamorous RAG groundwork — business-definition mapping, connector wiring, context assembly — into reusable role-based plugins, aimed at cutting the setup time that normally gates enterprise agent deployment.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Splunk ships AI Agent Monitoring in Observability Cloud built on OpenTelemetry and Cisco AGNTCY&lt;/strong&gt; — &lt;a href=&quot;https://www.splunk.com/en_us/blog/observability/monitor-llm-and-agent-performance-with-ai-agent-monitoring-in-splunk-observability-cloud.html&quot;&gt;source&lt;/a&gt; — Gives teams already on Splunk APM a drop-in path to per-trace cost, latency, and quality (hallucination/toxicity) scoring without adopting a separate LLM-observability vendor.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Arize AX adds agent fleet observability with token-cost tracking and harness-as-a-judge evals&lt;/strong&gt; — &lt;a href=&quot;https://arize.com/blog/building-ai-factory-self-improving-agents-arize-ax/&quot;&gt;source&lt;/a&gt; — Targets teams running many agents at once — surfaces which agent in the fleet is burning the most tokens or drifting in behavior, and lets you spin up an eval run straight from a monitoring alert.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>agent-reliability</category><category>embeddings</category><category>ingestion-pipeline</category><category>vector-sync</category><category>azure</category><category>open-source</category><category>context-database</category></item><item><title>Agent Stack — June 8, 2026</title><link>https://artificialcuriositylabs.ai/daily/agent-stack/2026-06-08/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/agent-stack/2026-06-08/</guid><description>Amazon Bedrock AgentCore Runtime adds interactive shells for live terminal access into agent sessions; Microsoft Foundry ships production memory stack f…</description><pubDate>Mon, 08 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;The harness, tool surface, and delegation topology are commoditizing together. When every vendor ships MCP and orchestration, the moat is how humans wire judgment, guardrails, and institutional context into the agent loop — not whether agents can run.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Amazon Bedrock AgentCore Runtime adds interactive shells for live terminal access into agent sessions&lt;/strong&gt; — &lt;a href=&quot;https://aws.amazon.com/about-aws/whats-new/2026/06/amazon-bedrock-agentcore-runtime/&quot;&gt;AWS What&apos;s New&lt;/a&gt;
AWS shipped a new InvokeAgentRuntimeCommandShell API that opens a persistent, PTY-backed terminal over WebSocket directly into a running agent&apos;s microVM — preserving env vars, working directory, and history across reconnects (up to 10 concurrent shells per runtime, sessions resumable via shell ID). It complements the existing stateless InvokeAgentRuntimeCommand for one-shot calls. &lt;strong&gt;Builder angle:&lt;/strong&gt; Gives builders an SSH-like debug path into live agent runtimes (inspecting generated files, checking package versions, completing device-code logins) without redeploying or instrumenting the agent.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Microsoft Foundry ships production memory stack for agents: procedural memory, TTL, CRUD UI, multimodal recall&lt;/strong&gt; — &lt;a href=&quot;https://devblogs.microsoft.com/foundry/memory-build2026/&quot;&gt;Microsoft Foundry Blog&lt;/a&gt;
Foundry&apos;s Build 2026 memory update adds procedural memory that captures and reuses successful action sequences (~5% gain on STATE-Bench/Tau-Bench), a portal UI for direct CRUD on stored memories, configurable time-to-live to auto-retire low-value entries, multimodal (image) memory, and explicit remember/forget commands. Agent Framework also gains a local file-based MemoryFileStore/MemoryContextProvider pattern for inspectable markdown memory before scaling to managed stores. &lt;strong&gt;Builder angle:&lt;/strong&gt; Turns agent memory from an opaque black box into something you can inspect, version, cap with TTL, and unit-test locally before promoting to a managed store — directly changes how memory gets debugged and governed in production harnesses.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Azure SRE Agent launches Plugin Marketplace with git-commit-pinned, hash-verified skill installs&lt;/strong&gt; — &lt;a href=&quot;https://learn.microsoft.com/en-us/azure/sre-agent/plugin-marketplace&quot;&gt;Microsoft Learn / Azure Docs&lt;/a&gt;
Azure SRE Agent now lets teams register curated GitHub-hosted marketplaces and install bundled skills plus MCP server configs. Marketplaces are defined via marketplace.json manifests and include the official Azure SRE Agent Plugins and Anthropic&apos;s Claude Plugins repos. Each install pins to an exact git commit and records provenance (source/version/hash). One-click update checks use SHA-256 hash comparison. Private repos and GitHub Enterprise are supported with shared per-marketplace credentials. &lt;strong&gt;Builder angle:&lt;/strong&gt; Skill distribution becomes reproducible and auditable. Version pinning plus hash diffing means upstream changes can&apos;t silently alter agent behavior, and per-skill setup drops from ~10–15 minutes to ~30 seconds.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;NetFoundry launches zero-trust MCP and LLM gateways with no shared API keys&lt;/strong&gt; — &lt;a href=&quot;https://www.prnewswire.com/news-releases/netfoundry-launches-enterprise-class-mcp-and-llm-gateways-bringing-zero-trust-to-ai-deployments-302789053.html&quot;&gt;source&lt;/a&gt; — Replaces runtime allow/deny checks with registry-level tool removal and identity-based (not key-based) agent auth — a different default for teams wiring agents to MCP servers in regulated or air-gapped environments.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Critical RCE in Flowise lets attackers hijack MCP stdio transport config to run OS commands&lt;/strong&gt; — &lt;a href=&quot;https://www.penligent.ai/hackinglabs/cve-2026-40933/&quot;&gt;source&lt;/a&gt; — A concrete risk for anyone wiring stdio-transport MCP servers into agent platforms: validate and sandbox subprocess launch configs rather than trusting schema checks alone, and upgrade Flowise to 3.1.0 immediately.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>agent-stack</category><category>AWS</category><category>Bedrock AgentCore</category><category>agent runtime</category><category>debugging</category><category>API</category><category>Microsoft</category></item><item><title>AI Platform — June 8, 2026</title><link>https://artificialcuriositylabs.ai/daily/ai-platform/2026-06-08/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/ai-platform/2026-06-08/</guid><description>DigitalOcean ships prefix-aware routing and upcoming cached-token pricing, claims up to 4x lower effective compute cost; Anthropic moves Claude Agent SD…</description><pubDate>Mon, 08 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Token price is the new kWh, and the platform you ship on determines how fast you reach production. Per the Jevons paradox, falling inference cost drives more loops and heavier agents. Track the pricing, routing, and deployment-infrastructure moves that change what builders can afford to run.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DigitalOcean ships prefix-aware routing, previews cached-token pricing, and claims up to 4x lower effective compute cost&lt;/strong&gt; — &lt;a href=&quot;https://www.digitalocean.com/blog/reduce-llm-inference-costs-prefix-caching&quot;&gt;DigitalOcean Blog&lt;/a&gt;
DigitalOcean&apos;s Inference Gateway now routes requests to the GPU replica that already holds a matching KV-cache prefix, instead of using round-robin. This lifts cache hit rates from ~25% to 75%+ on shared-prefix workloads. The post (June 2) says the change can cut effective compute cost up to 4x per request on identical hardware. At 1M requests/day with 70% prompt overlap, it says the savings come to roughly 34 GPU-hours every day. The post also previews cached-token pricing that bills cache hits at a discount instead of full recompute rates. &lt;strong&gt;Builder angle:&lt;/strong&gt; Apps with high prompt overlap (system prompts, RAG templates, tool schemas) can cut inference spend now by routing through cache-aware gateways. They can cut it further once discounted cached-token pricing ships, instead of recomputing identical prefixes on every call.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Anthropic moves Claude Agent SDK to separate credit-pool billing on June 15, ending subscription coverage for automated workloads&lt;/strong&gt; — &lt;a href=&quot;https://support.claude.com/en/articles/15036540-use-the-claude-agent-sdk-with-your-claude-plan&quot;&gt;Anthropic Support (Claude Help Center)&lt;/a&gt;
Starting June 15, 2026, Agent SDK usage — the Python/TypeScript SDK, headless &lt;code&gt;claude -p&lt;/code&gt;, the Claude Code GitHub Actions integration, and third-party apps authenticated via the Agent SDK — moves off standard subscription limits onto a dedicated monthly credit pool billed at API rates: $20/mo for Pro, $100 for Max 5x, $200 for Max 20x, and $20/$100 for Team Standard/Premium seats. Usage beyond the credit either flows to pay-as-you-go rates (if enabled) or halts until the next billing cycle. Interactive Claude Code, web chat, and app usage are unaffected. &lt;strong&gt;Builder angle:&lt;/strong&gt; Teams running Claude in CI/CD, cron jobs, or background agents via the Agent SDK need a separate budget line starting June 15 — flat-rate subscriptions stop covering automated/headless usage and overage either costs API rates or stops the agent.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Study finds reasoning-model list prices mislead on real cost — up to 28x reversal between cheaper-listed and actually-cheaper models&lt;/strong&gt; — &lt;a href=&quot;https://arxiv.org/abs/2603.23971&quot;&gt;arXiv&lt;/a&gt;
Comparing reasoning-model pairs across tasks, researchers found that in 32% of model-pair comparisons the model with the lower listed per-token price actually cost more in total — by as much as 28x — because thinking-token consumption varies wildly (one model used up to 900% more reasoning tokens than another on the same query). Concrete case cited: Gemini 3 Flash lists 80% cheaper than GPT-5.4 but costs 38% more overall once thinking-token volume is counted. &lt;strong&gt;Builder angle:&lt;/strong&gt; Don&apos;t pick a reasoning model on its per-token sticker price — measure actual thinking-token consumption per task, since a &apos;cheaper&apos; model can quietly cost multiples more once reasoning overhead is counted.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
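&lt;p&gt;The study&apos;s reversal follows from simple arithmetic. Reasoning tokens are billed as output, so per-request cost scales with how much a model thinks, not just with its per-token list price. This is a minimal sketch that assumes thinking tokens are billed at the output rate. The two model names, prices, and token counts are invented for illustration and are not figures from the paper.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Illustrative cost model; every price and token count below is made up.
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens (thinking tokens assumed billed here)


def request_cost(price: ModelPrice, input_tokens: int,
                 thinking_tokens: int, answer_tokens: int) -> float:
    """Effective USD cost of one request, counting reasoning tokens as output."""
    billed_output = thinking_tokens + answer_tokens
    return (input_tokens * price.input_per_m
            + billed_output * price.output_per_m) / 1_000_000


# Hypothetical pair: "lean" lists 80% cheaper per token than "premium" ...
lean = ModelPrice("lean-reasoner", input_per_m=0.50, output_per_m=2.00)
premium = ModelPrice("premium-reasoner", input_per_m=2.50, output_per_m=10.00)

# ... but spends 10x the thinking tokens (900% more) on the same query.
lean_cost = request_cost(lean, input_tokens=2_000, thinking_tokens=40_000, answer_tokens=500)
premium_cost = request_cost(premium, input_tokens=2_000, thinking_tokens=4_000, answer_tokens=500)

print(f"lean ${lean_cost:.3f} vs premium ${premium_cost:.3f}: "
      f"{lean_cost / premium_cost:.2f}x")  # lean $0.082 vs premium $0.050: 1.64x
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Replacing the guessed thinking-token counts with counts measured on your own traffic is the practical version of the builder angle above.&lt;/p&gt;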
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cloudflare Agents SDK v0.14.0 adds Agent Skills, chat messengers, scheduled tasks, and durable Think Workflows&lt;/strong&gt; — &lt;a href=&quot;https://developers.cloudflare.com/changelog/post/2026-06-02-agents-sdk-v0140/&quot;&gt;source&lt;/a&gt; — Adds a declarative scheduling DSL and durable Workflow-backed reasoning steps to the Agents SDK, letting builders move recurring/long-running agent logic out of custom cron and state-management code and into Cloudflare&apos;s managed runtime.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Microsoft ships @azure/functions-skills, an npx-installable agent toolkit for the new Azure Functions serverless agents runtime&lt;/strong&gt; — &lt;a href=&quot;https://devblogs.microsoft.com/azure-sdk/introducing-azure-functions-skills-ai-era-workspace/&quot;&gt;source&lt;/a&gt; — Gives builders a single CLI to scaffold, validate, and deploy event-driven AI agents onto Azure&apos;s serverless runtime with identity-based defaults baked in, rather than hand-wiring Functions + MCP + agent config separately.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>ai-platform</category><category>prefix-caching</category><category>kv-cache</category><category>routing</category><category>pricing</category><category>billing</category><category>agent-sdk</category></item><item><title>Builder Loop — June 8, 2026</title><link>https://artificialcuriositylabs.ai/daily/builder-loop/2026-06-08/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/builder-loop/2026-06-08/</guid><description>Cognition rebrands Windsurf as Devin Desktop, ships native Agent Client Protocol support; Snap details CodePal, an AI code reviewer that runs a multi-pa…</description><pubDate>Mon, 08 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Coding agents are bulldozers, not replacements — humans still frame the problem. OSS shifts default stack choices faster than any vendor roadmap. When everyone can generate and fork, differentiation is taste, review, and what you ship before it becomes the default.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cognition rebrands Windsurf as Devin Desktop, ships native Agent Client Protocol support&lt;/strong&gt; — &lt;a href=&quot;https://cognition.ai/blog/introducing-devin-desktop&quot;&gt;Cognition Blog&lt;/a&gt;
Cognition relaunched Windsurf as Devin Desktop, making the Agent Command Center (a Kanban board of local and cloud agents, sorted by status) the default surface, and shipping native support for the open Agent Client Protocol so any ACP-compatible agent — Codex, Claude Agent, OpenCode, or in-house builds — can run inside the editor alongside Devin. The change rolled out as an over-the-air update on June 2, 2026; existing Windsurf plans, pricing, and extensions carry over. &lt;strong&gt;Builder angle:&lt;/strong&gt; ACP support means teams can standardize on one editor while running whichever agent fits a given task, breaking the lock-in between IDE and agent vendor.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Snap details CodePal, an AI code reviewer that runs a multi-pass verification loop on every PR&lt;/strong&gt; — &lt;a href=&quot;https://eng.snap.com/codepal&quot;&gt;Snap Engineering Blog&lt;/a&gt;
Snap published the architecture behind CodePal, its mandatory pre-human PR reviewer: two parallel bootstrap passes with different sampling parameters, a speculative third pass on disagreement, additional passes only when new findings emerge, and a verifier that strips hallucinated or contradictory findings before posting. It builds context without cloning repos — using GitHub Enterprise APIs and tree-sitter symbol indexing — and does cross-repo semantic search to catch downstream breakages. Snap reports 200,000+ reviews over four months at ~$0.40/review, finishing in ~10 minutes versus ~5 hours for first human review, recall climbing from 30% to 80%, and 90% PR adoption within a quarter (up from a 9% pilot). &lt;strong&gt;Builder angle:&lt;/strong&gt; The multi-pass-plus-verifier pattern and concrete cost/recall numbers ($0.40/review, 30%→80% recall) give teams a reproducible blueprint for replacing or gating human first-pass review with an agent.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GitHub Copilot in Visual Studio adds a Plan agent that drafts implementation plans before code is written&lt;/strong&gt; — &lt;a href=&quot;https://github.blog/changelog/2026-06-04-github-copilot-in-visual-studio-may-update&quot;&gt;GitHub Changelog&lt;/a&gt;
The May update to Copilot in Visual Studio ships a new Plan agent that analyzes the codebase with read-only tools, asks clarifying questions, and produces a markdown implementation plan that can then be handed to Agent mode for execution. The release also adds a Skills panel for managing discovered agent skills, a multi-file change-review summary view (accept/reject by file, all-files, or chunk), a context-window usage ring with conversation summarization, and the ability to attach Git History/Blame commits directly as chat context. &lt;strong&gt;Builder angle:&lt;/strong&gt; Splitting &apos;plan&apos; from &apos;execute&apos; into separate agent modes gives developers a checkpoint to review and edit scope before an agent starts editing files — directly changes the review-before-you-build loop.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
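&lt;p&gt;CodePal&apos;s pass schedule maps onto a small control loop. This is a sketch based on one reading of the published description, not Snap&apos;s code. &lt;code&gt;review_pass&lt;/code&gt; and &lt;code&gt;verify&lt;/code&gt; are hypothetical stand-ins for model calls, and findings are plain strings. The temperatures stand in for &quot;different sampling parameters,&quot; and &lt;code&gt;max_extra_passes&lt;/code&gt; is an invented cap.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Hypothetical reconstruction of a multi-pass review loop; not Snap's implementation.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Set

ReviewPass = Callable[[str, float], Set[str]]  # (diff, temperature) -> findings
Verifier = Callable[[str, str], bool]          # (diff, finding) -> keep it?


def multi_pass_review(diff: str, review_pass: ReviewPass, verify: Verifier,
                      max_extra_passes: int = 3) -> Set[str]:
    # 1. Two bootstrap passes run in parallel with different sampling settings.
    with ThreadPoolExecutor(max_workers=2) as pool:
        first, second = pool.map(lambda t: review_pass(diff, t), [0.2, 0.8])
    findings = first | second

    # 2. A speculative third pass runs only if the bootstrap passes disagree.
    if first != second and max_extra_passes > 0:
        new = review_pass(diff, 0.5) - findings
        findings |= new
        passes_left = max_extra_passes - 1
        # 3. Further passes continue only while they keep surfacing new findings.
        while new and passes_left > 0:
            new = review_pass(diff, 0.5) - findings
            findings |= new
            passes_left -= 1

    # 4. The verifier strips hallucinated or contradictory findings before posting.
    return {f for f in findings if verify(diff, f)}


if __name__ == "__main__":
    def toy_pass(diff: str, temperature: float) -> Set[str]:
        found = {"parse: missing null check"}
        if temperature > 0.5:
            found.add("os: unused import")  # a hallucination: the diff never imports os
        return found

    def toy_verify(diff: str, finding: str) -> bool:
        # Crude grounding check: keep findings whose cited symbol appears in the diff.
        return finding.split(":")[0] in diff

    print(multi_pass_review("def parse(x): return x.y", toy_pass, toy_verify))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This design spends extra model calls only on disagreement or on fresh findings. That is one plausible way for a multi-pass reviewer to keep per-review cost bounded instead of paying for a fixed number of passes.&lt;/p&gt;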
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;NVIDIA releases Nemotron 3 Ultra as open-weight, open-data, and open-recipe under OpenMDW-1.1 with reproducible agentic benchmarks&lt;/strong&gt; — &lt;a href=&quot;https://developer.nvidia.com/blog/nvidia-nemotron-3-ultra-powers-faster-more-efficient-reasoning-for-long-running-agents&quot;&gt;source&lt;/a&gt; — Builders can fork the full training recipe (not just run inference) and reproduce NVIDIA&apos;s published agentic-coding and long-context benchmark numbers from the same Hugging Face checkpoints.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Google ships Gemma 4 12B, an encoder-free multimodal model that runs locally on a 16GB-VRAM laptop GPU&lt;/strong&gt; — &lt;a href=&quot;https://developers.googleblog.com/gemma-4-12b-the-developer-guide/&quot;&gt;source&lt;/a&gt; — A 12B multimodal model that fits a consumer GPU and lands directly in Ollama/LM Studio pipelines lowers the bar for local agentic prototyping with combined audio, vision, and text inputs.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>builder-loop</category><category>ACP</category><category>IDE</category><category>agent-orchestration</category><category>Cognition</category><category>Windsurf</category><category>code-review</category></item><item><title>Agent Stack — June 7, 2026</title><link>https://artificialcuriositylabs.ai/daily/agent-stack/2026-06-07/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/agent-stack/2026-06-07/</guid><description>LangChain ships `create_agent` primitive with composable middleware for production harnesses; Google Managed Agents API provisions remote sandbox and st…</description><pubDate>Sun, 07 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;The harness, tool surface, and delegation topology are commoditizing together. When every vendor ships MCP and orchestration, the moat is how humans wire judgment, guardrails, and institutional context into the agent loop — not whether agents can run.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;LangChain ships &lt;code&gt;create_agent&lt;/code&gt; primitive with composable middleware for production harnesses&lt;/strong&gt; — &lt;a href=&quot;https://www.langchain.com/blog/how-to-build-a-custom-agent-harness&quot;&gt;LangChain Blog&lt;/a&gt;
LangChain published &lt;code&gt;create_agent&lt;/code&gt;, a minimalist three-parameter primitive (model, tools, system prompt) that exposes a middleware layer as the main customization surface. Middleware slots cover context-overflow summarization, filesystem/memory persistence, shell and code-interpreter access, retry and fallback logic, PII/policy enforcement, and human-in-the-loop approval gates. The design lets teams build these production behaviors once and reuse them across multiple agents. &lt;strong&gt;Builder angle:&lt;/strong&gt; Replaces ad-hoc harness scaffolding with a composable middleware stack you can test and reuse — directly changes how you wire context management, approvals, and retry into any agent loop.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Google Managed Agents API provisions remote sandbox and stateful harness via single REST call&lt;/strong&gt; — &lt;a href=&quot;https://cloud.google.com/blog/products/ai-machine-learning/what-google-cloud-announced-in-ai-this-month&quot;&gt;Google Cloud Blog&lt;/a&gt;
Announced at Google I/O 2026, the Gemini Enterprise Agent Platform Managed Agents API (pre-GA) lets developers POST to the Interactions endpoint to provision a Google-hosted remote sandbox and agent harness in one call. An &lt;code&gt;environment_id&lt;/code&gt; parameter reuses a persistent container — preserving libraries, scripts, files, and state — across multi-turn runs; &lt;code&gt;previous_interaction_id&lt;/code&gt; continues conversation history. Agents can execute code, manage files, and call backend systems without the developer managing underlying compute or security. &lt;strong&gt;Builder angle:&lt;/strong&gt; Offloads sandbox lifecycle and state management to Google infrastructure — you get a durable, multi-turn agent execution environment without running your own harness server.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;NVIDIA NemoClaw open blueprint ships OpenShell secure runtime for long-running industrial agents&lt;/strong&gt; — &lt;a href=&quot;https://blogs.nvidia.com/blog/industrial-software-leaders-secure-autonomous-ai-engineers-nemoclaw/&quot;&gt;NVIDIA Blog&lt;/a&gt;
NVIDIA published NemoClaw, an open blueprint for building specialized, long-running agents that combines a choice of orchestration harness (OpenClaw or Hermes), a model router, and NeMo customization libraries. The open-source OpenShell runtime core governs per-agent access to files, networks, and tools with policy-based security at every layer. Early industrial adopters include Cadence, Dassault Systèmes, Siemens, and Synopsys, which NVIDIA says are compressing weeks of simulation workflows into hours. &lt;strong&gt;Builder angle:&lt;/strong&gt; NemoClaw&apos;s pluggable harness + OpenShell security layer provides a concrete reference architecture for domain-specific long-running agents where tool access must be policy-governed.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
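&lt;p&gt;The middleware pattern behind &lt;code&gt;create_agent&lt;/code&gt; is easiest to see without any framework. Each layer wraps the next call, so retry, approval, or summarization logic is written once and stacked in a chosen order. The sketch below is plain Python and is only an illustration. It does not reproduce LangChain&apos;s middleware API, and every name in it (&lt;code&gt;with_retry&lt;/code&gt;, &lt;code&gt;with_approval&lt;/code&gt;, &lt;code&gt;compose&lt;/code&gt;) is hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Framework-free sketch of composable agent middleware; all names are hypothetical.
from functools import reduce
from typing import Callable, List

Handler = Callable[[str], str]             # prompt -> model reply
Middleware = Callable[[Handler], Handler]  # wraps a handler with extra behavior


def with_retry(attempts: int) -> Middleware:
    """Retry transient failures, assumed here to raise RuntimeError."""
    if 1 > attempts:
        raise ValueError("attempts must be at least 1")

    def wrap(call: Handler) -> Handler:
        def handler(prompt: str) -> str:
            last_error = None
            for _ in range(attempts):
                try:
                    return call(prompt)
                except RuntimeError as err:
                    last_error = err
            raise last_error  # every attempt failed
        return handler
    return wrap


def with_approval(approve: Callable[[str], bool]) -> Middleware:
    """Human-in-the-loop gate: the call proceeds only if approve(prompt) is True."""
    def wrap(call: Handler) -> Handler:
        def handler(prompt: str) -> str:
            if not approve(prompt):
                return "[blocked: approval denied]"
            return call(prompt)
        return handler
    return wrap


def compose(model: Handler, middleware: List[Middleware]) -> Handler:
    """Apply middleware so the first list entry becomes the outermost layer."""
    return reduce(lambda inner, mw: mw(inner), reversed(middleware), model)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_model(prompt: str) -> str:
        state["calls"] += 1
        if state["calls"] == 1:
            raise RuntimeError("transient provider error")
        return f"reply to: {prompt}"

    agent = compose(flaky_model, [with_approval(lambda p: "rm -rf" not in p), with_retry(3)])
    print(agent("summarize the incident"))  # reply to: summarize the incident
    print(agent("rm -rf /tmp/cache"))       # [blocked: approval denied]
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the approval gate outermost, a denied prompt never reaches the retry layer or the model. Swapping the list order changes that, so the order of the layers is part of the harness design.&lt;/p&gt;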
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Glean launches enterprise MCP Gateway with precomputed knowledge-graph context layer&lt;/strong&gt; — &lt;a href=&quot;https://www.glean.com/blog/introducing-glean-mcp-gateway&quot;&gt;source&lt;/a&gt; — Routing agent tool calls through a precomputed knowledge graph gateway reduces context tokens ~30% and offloads permission enforcement to the connector layer—eliminating per-source OAuth wiring at agent build time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Noma releases Agent Access Control with real-time MCP server registry and per-tool 3-state gating&lt;/strong&gt; — &lt;a href=&quot;https://www.helpnetsecurity.com/2026/06/02/noma-brings-visibility-and-access-governance-to-ai-agents-and-mcp-servers/&quot;&gt;source&lt;/a&gt; — Per-tool 3-state gating scopes agent permissions to individual MCP operations rather than granting or denying entire server access—enabling narrow least-privilege without manual per-connection policies.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>agent-stack</category><category>langchain</category><category>sdk</category><category>middleware</category><category>harness</category><category>create_agent</category><category>production</category></item><item><title>AI Platform — June 7, 2026</title><link>https://artificialcuriositylabs.ai/daily/ai-platform/2026-06-07/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/ai-platform/2026-06-07/</guid><description>vLLM Semantic Router v0.3 Themis ships SAAR stateful routing with RouterArena #1 ranking at $0.11/1K queries; DigitalOcean Inference Gateway ships prefi…</description><pubDate>Sun, 07 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Token price is the new kWh, and the platform you ship on determines how fast you reach production. Per the Jevons paradox, falling inference cost drives more loops and heavier agents. Track the pricing, routing, and deployment-infrastructure moves that change what builders can afford to run.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;vLLM Semantic Router v0.3 Themis ships SAAR stateful routing with RouterArena #1 ranking at $0.11/1K queries&lt;/strong&gt; — &lt;a href=&quot;https://vllm.ai/blog/2026-06-05-v0.3-vllm-sr-themis-release&quot;&gt;vLLM Blog&lt;/a&gt;
vLLM Semantic Router v0.3 Themis ships Session-Aware Agentic Routing (SAAR) as a production-ready feature. SAAR locks multi-turn agent sessions to a specific model during active tool loops and provider-state continuations, resetting only at safe idle or drift boundaries. The release ranks #1 on RouterArena with a 75.4 weighted score at a cost of $0.11 per 1K queries. It also adds 18 new signal families (PII detection, jailbreak, complexity, embedding, etc.), introduces a canonical v0.3 YAML config that replaces fragmented layouts, and extends hardware support to AMD ROCm and Intel OpenVINO alongside NVIDIA. &lt;strong&gt;Builder angle:&lt;/strong&gt; Teams running multi-turn agents can now delegate model-continuity logic to vLLM SR. SAAR prevents mid-session model switches during tool loops without custom routing code, while prefix-cache-aware switch pricing keeps costs visible.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DigitalOcean Inference Gateway ships prefix-aware routing live, with cached-token pricing coming soon&lt;/strong&gt; — &lt;a href=&quot;https://www.digitalocean.com/blog/reduce-llm-inference-costs-prefix-caching&quot;&gt;DigitalOcean Blog&lt;/a&gt;
DigitalOcean&apos;s Serverless Inference now routes requests to GPU instances already holding a shared system-prompt prefix in KV cache. At 1M daily requests where 70% share a common prefix, prefix-aware routing recovers ~34 GPU-hours/day; at 10M requests, ~340 GPU-hours/day—up to 4x effective compute cost reduction per request for prefix-heavy workloads. vLLM runtime optimizations on AMD Instinct MI325X and NVIDIA Hopper GPUs back the gains. Cached-token pricing (lower per-token cost on cache hits) is announced as launching on Serverless Inference within the next few weeks. &lt;strong&gt;Builder angle:&lt;/strong&gt; Builders with high shared-system-prompt traffic on DigitalOcean Serverless Inference get immediate cache-hit routing; upcoming cached-token pricing will translate cache hits into direct per-token cost savings.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DeepSeek makes V4 Pro 75% discount permanent, undercutting GPT-5 and Claude Opus at $0.87/M output tokens&lt;/strong&gt; — &lt;a href=&quot;https://thenextweb.com/news/deepseek-v4-pro-75-percent-price-cut-permanent&quot;&gt;The Next Web&lt;/a&gt;
DeepSeek made its promotional 75% price cut on V4 Pro permanent on May 24, after initially scheduling it to expire May 31. New rates: $0.003625 input / $0.87 output per million tokens (down from $0.0145 input / $3.48 output). At these rates, V4 Pro with 1M-token context undercuts OpenAI GPT-5 ($2.50/$10 per M) and Anthropic Claude Opus 4.7 ($5/$25) on both input and output. Against Google Gemini 3.5 Flash ($0.15/$0.60), it is cheaper on input but not on output. Cache-hit input pricing can drop further to $0.0036/M. &lt;strong&gt;Builder angle:&lt;/strong&gt; The permanent cut makes DeepSeek V4 Pro a durable low-cost tier in inference routing tables. Builders targeting sub-$1/M output with long context (1M tokens) and frontier-class reasoning now have a persistent option rather than a promotional window.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
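&lt;p&gt;Prefix-aware routing, as DigitalOcean describes it, replaces round-robin with a lookup for the replica whose KV cache already holds the longest matching prompt prefix. This is a minimal sketch that assumes the router knows which prompts each replica has cached and matches at the character level. Real gateways work on tokens and must also track cache eviction, which this toy ignores. The replica names and the &lt;code&gt;min_match&lt;/code&gt; threshold are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Toy prefix-aware router; a simplified illustration, not DigitalOcean's gateway.
from itertools import cycle
from typing import Dict, List


def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared leading substring of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class PrefixAwareRouter:
    """Route to the replica with the longest cached-prefix match, else round-robin."""

    def __init__(self, replicas: List[str], min_match: int = 16):
        self.cached: Dict[str, List[str]] = {r: [] for r in replicas}
        self.fallback = cycle(replicas)
        self.min_match = min_match  # ignore trivially short overlaps

    def route(self, prompt: str) -> str:
        best, best_len = None, 0
        for replica, prefixes in self.cached.items():
            for cached_prompt in prefixes:
                m = common_prefix_len(prompt, cached_prompt)
                if m > best_len:
                    best, best_len = replica, m
        if best is None or self.min_match > best_len:
            best = next(self.fallback)  # cache miss: plain round-robin
        self.cached[best].append(prompt)  # assume the replica now holds this prefix
        return best


if __name__ == "__main__":
    router = PrefixAwareRouter(["gpu-0", "gpu-1", "gpu-2"])
    system = "You are a support agent for ExampleCo. Follow the refund policy. "
    print(router.route(system + "Where is my order?"))   # gpu-0 (miss, round-robin)
    print(router.route("Translate this sentence."))      # gpu-1 (miss, round-robin)
    print(router.route(system + "Can I get a refund?"))  # gpu-0 (hit on shared system prompt)
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Requests that share the system prompt land on the replica that already computed it, and those cache hits are where the claimed GPU-hour savings come from. The &lt;code&gt;min_match&lt;/code&gt; floor keeps trivial overlaps from pinning unrelated traffic. A production router would also need to weigh replica load, so that one popular prefix does not overload a single GPU.&lt;/p&gt;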
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;OpenAI updates GPT-Rosalind with GPT-5.5 tool use and 31% token efficiency gains on life-science benchmarks&lt;/strong&gt; — &lt;a href=&quot;https://openai.com/index/introducing-new-capabilities-to-gpt-rosalind&quot;&gt;source&lt;/a&gt; — Domain-specific life-sciences model update; 31% fewer tokens than GPT-5.5 on GeneBench with higher accuracy—cost signal for builders in biotech/pharma verticals but no general API pricing change.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>ai-platform</category><category>routing</category><category>vllm</category><category>agentic</category><category>saar</category><category>open-source</category><category>latency</category></item><item><title>Builder Loop — June 7, 2026</title><link>https://artificialcuriositylabs.ai/daily/builder-loop/2026-06-07/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/builder-loop/2026-06-07/</guid><description>Fix with Copilot for failing Actions now in Pro, Pro+, and Max; MAI-Code-1-Flash is now available for GitHub Copilot in VS Code; JetBrains Mellum2 goes …</description><pubDate>Sun, 07 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Coding agents are bulldozers, not replacements — humans still frame the problem. OSS shifts default stack choices faster than any vendor roadmap. When everyone can generate and fork, differentiation is taste, review, and what you ship before it becomes the default.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Fix with Copilot for failing Actions now in Pro, Pro+, and Max&lt;/strong&gt; — &lt;a href=&quot;https://github.blog/changelog/2026-06-04-fix-with-copilot-for-failing-actions-now-in-pro-pro-and-max&quot;&gt;GitHub Changelog&lt;/a&gt;
When a GitHub Actions job fails, a &apos;Fix with Copilot&apos; button on the workflow log triggers a cloud agent that investigates the failure, implements a fix, pushes it to the branch, and notifies the developer for review—no manual triage required. Available on Copilot Pro, Pro+, and Max. &lt;strong&gt;Builder angle:&lt;/strong&gt; Builders on GitHub Actions can now hand CI failure triage and patching to a cloud agent, removing the context-switch to debug broken runs before resuming feature work.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;MAI-Code-1-Flash is now available for GitHub Copilot in VS Code&lt;/strong&gt; — &lt;a href=&quot;https://github.blog/changelog/2026-06-02-mai-code-1-flash-is-now-available-for-github-copilot/&quot;&gt;GitHub Changelog&lt;/a&gt;
Microsoft&apos;s in-house small-tier coding model, MAI-Code-1-Flash, is rolling out in GitHub Copilot via the VS Code model picker. GitHub says it outperforms comparable small models in early testing. It is available across all Copilot tiers (Free through Max), starting with a limited rollout that expands over the coming weeks. &lt;strong&gt;Builder angle:&lt;/strong&gt; VS Code Copilot users now have a Microsoft-optimized, low-latency coding model selectable from the model picker, offering a cheaper inference option for routine completions without leaving the IDE.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;JetBrains Mellum2 goes open source: 12B MoE model for coding agent routing and sub-agents&lt;/strong&gt; — &lt;a href=&quot;https://blog.jetbrains.com/ai/2026/06/mellum2-goes-open-source-a-fast-model-for-ai-workflows/&quot;&gt;JetBrains Blog&lt;/a&gt;
JetBrains released Mellum2, a 12B-parameter Mixture-of-Experts model (2.5B active per token) under Apache 2.0 on Hugging Face. Designed for routing, low-latency RAG, and sub-agent orchestration in coding pipelines—inference time reportedly less than half of similar-sized dense models. Supports private, local deployment. &lt;strong&gt;Builder angle:&lt;/strong&gt; Teams building coding agent pipelines or IDE integrations can self-host a JetBrains-tuned model optimized for context-gathering, planning, and validation steps at a fraction of frontier model inference cost.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;OpenCV 5.0 ships rewritten DNN engine with built-in LLM/VLM inference and 80%+ ONNX coverage&lt;/strong&gt; — &lt;a href=&quot;https://github.com/opencv/opencv/releases/tag/5.0.0&quot;&gt;source&lt;/a&gt; — CV pipelines can now run VLM inference (image→text) in-process via OpenCV without a separate LLM runtime, enabling tighter perception–language integration in agent and robotics deployments.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hugging Face redesigns &lt;code&gt;hf&lt;/code&gt; CLI as agent-first tool: dual-mode output, next-command hints, 94% task success in Claude Code&lt;/strong&gt; — &lt;a href=&quot;https://huggingface.co/blog/hf-cli-for-agents&quot;&gt;source&lt;/a&gt; — Agent harnesses can call Hub operations (model download, Space deploy, dataset push) via structured CLI output with measurably higher success rates and lower token cost than SDK or curl wrappers.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>builder-loop</category><category>github-copilot</category><category>ci-cd</category><category>cloud-agent</category><category>agentic-workflow</category><category>vs-code</category><category>model</category></item><item><title>Agent Commerce — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/agent-commerce/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/agent-commerce/2026-06-06/</guid><description>Stripe expands Shared Payment Tokens to Mastercard Agent Pay, Visa Intelligent Commerce, Affirm, and Klarna; Stripe adds SharedPaymentIssuedToken API fo…</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Agents that can pay change commerce architecture. Protocol moves (x402, SPT, checkout rails) are infrastructure — the moat is trust, scoped authorization, and human-defined intent boundaries.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stripe expands Shared Payment Tokens to Mastercard Agent Pay, Visa Intelligent Commerce, Affirm, and Klarna&lt;/strong&gt; — &lt;a href=&quot;https://stripe.com/blog/supporting-additional-payment-methods-for-agentic-commerce&quot;&gt;Stripe Blog&lt;/a&gt;
Stripe is rolling out broader Shared Payment Token (SPT) support for agentic commerce. When a customer authorizes an agent, Stripe provisions Mastercard Agent Pay or Visa Intelligent Commerce network tokens scoped to intent and shares them with the agent; the agent can reuse vaulted tokens across any seller accepting agentic payments. Sellers interact only with SPTs while Stripe provisions network and BNPL tokens behind the scenes. Affirm and Klarna BNPL flows surface confirmation in the agent UI and pass seller credentials to the BNPL provider. Etsy and URBN merchants already use SPTs in production. &lt;strong&gt;Builder angle:&lt;/strong&gt; Agents can authorize scoped card and BNPL spend once via SPTs and settle across multiple merchants without exposing PANs or re-running checkout per seller.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stripe adds SharedPaymentIssuedToken API for agent-issued scoped payment grants&lt;/strong&gt; — &lt;a href=&quot;https://docs.stripe.com/changelog/dahlia/2026-04-22/shared-payment-issued-token.md&quot;&gt;Stripe Documentation&lt;/a&gt;
API version &lt;code&gt;2026-04-22.preview&lt;/code&gt; adds &lt;code&gt;SharedPaymentIssuedToken&lt;/code&gt;. Agents create scoped grants via &lt;code&gt;POST /v1/shared_payment/issued_tokens&lt;/code&gt;, passing &lt;code&gt;payment_method&lt;/code&gt;, &lt;code&gt;seller_details.network_business_profile&lt;/code&gt;, and &lt;code&gt;usage_limits&lt;/code&gt; (currency, max_amount, expires_at). They retrieve status with &lt;code&gt;GET&lt;/code&gt; and revoke with &lt;code&gt;POST .../revoke&lt;/code&gt;. Tokens carry network_business_profile for seller context and are validated against Orchestrated Commerce Agreements for delegated payment relationships. &lt;strong&gt;Builder angle:&lt;/strong&gt; Agent platforms get explicit create/retrieve/revoke endpoints to issue time- and amount-bounded payment grants tied to a seller profile before handing tokens to merchants.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cloudflare Agents SDK documents x402 and MPP payment flows with MCP server hooks&lt;/strong&gt; — &lt;a href=&quot;https://developers.cloudflare.com/agents/agentic-payments/&quot;&gt;Cloudflare Developers&lt;/a&gt;
Cloudflare&apos;s Agents SDK supports HTTP 402 agentic payments via two protocols. The first is x402, Coinbase&apos;s stablecoin rail, which uses &lt;code&gt;PAYMENT-REQUIRED&lt;/code&gt;, &lt;code&gt;PAYMENT-SIGNATURE&lt;/code&gt;, and &lt;code&gt;PAYMENT-RESPONSE&lt;/code&gt; headers with facilitator settlement. The second is the Machine Payments Protocol (MPP), co-authored by Tempo and Stripe and on the IETF track, which uses &lt;code&gt;WWW-Authenticate: Payment&lt;/code&gt; / &lt;code&gt;Authorization: Payment&lt;/code&gt; headers. Server-side MCP integration includes &lt;code&gt;withX402&lt;/code&gt; and &lt;code&gt;paidTool&lt;/code&gt;; on the client side, &lt;code&gt;withX402Client&lt;/code&gt; handles 402 retries automatically. MPP accepts cards via Stripe SPTs, stablecoins, and Lightning, and MPP clients are backwards-compatible with existing x402 services. &lt;strong&gt;Builder angle:&lt;/strong&gt; MCP tool servers can gate paid tools behind 402 challenges and let agents settle via x402 stablecoins or MPP card/SPT rails without accounts or API keys.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>agent-commerce</category><category>spt</category><category>acp</category><category>network-tokens</category><category>bnpl</category><category>stripe</category><category>api</category></item><item><title>Agent Reliability — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/agent-reliability/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/agent-reliability/2026-06-06/</guid><description>Microsoft Foundry IQ knowledge bases lift evidence recall up to 54% with agentic retrieval tiers; Cohesity Gaia patents embedding-based RAG over backup …</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;You cannot run what you cannot see. Grounding is institutional context encoded in retrieval; observability is electricity-metering for agent loops; security moves from policy decks to runtime guardrails. Reliability is the stack between &quot;it works in demo&quot; and &quot;it runs in prod.&quot;&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Microsoft Foundry IQ knowledge bases lift evidence recall up to 54% with agentic retrieval tiers&lt;/strong&gt; — &lt;a href=&quot;https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/foundry-iq-improve-recall-by-up-to-54-with-knowledge-bases/4524852&quot;&gt;Microsoft Foundry Blog&lt;/a&gt;
Foundry IQ replaces static, single-shot RAG with a dynamic agentic retrieval loop that batches and customizes subqueries per knowledge source, plus a retrained semantic ranker and retrievalReasoningEffort tiers (minimal, low, medium). On BrowseComp-Plus, knowledge bases beat standalone hybrid search by up to 46% in evidence recall; pairing a smaller orchestrator model with agentic retrieval raises the recall gain to 54% while cutting tool calls and token cost by ~34%. The medium tier adds up to two iterative retrieval turns, and heterogeneous sources (MCP, Fabric ontology, SQL) combine structured and unstructured recall. &lt;strong&gt;Builder angle:&lt;/strong&gt; retrievalReasoningEffort gives one knob to trade latency and token cost against recall instead of hand-building multi-query RAG loops.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cohesity Gaia patents embedding-based RAG over backup data without copying secondary stores&lt;/strong&gt; — &lt;a href=&quot;https://www.cohesity.com/newsroom/press/cohesity-secures-patent-gen-ai-retrieval-augmented-generation-secondary-data/&quot;&gt;Cohesity Newsroom&lt;/a&gt;
The USPTO granted Patent 12,619,501 (May 5, 2026) for &quot;Data Retrieval Using Embeddings for Data in Backup Systems,&quot; covering Gaia&apos;s method of indexing embeddings on secondary/backup data in place. Gaia is available on Cohesity Data Cloud and lets GenAI search protected enterprise archives while preserving existing security, governance, and access controls, with no separate data copy for AI indexing. &lt;strong&gt;Builder angle:&lt;/strong&gt; Indexing cold backup tiers in place for RAG offers a pattern for teams blocked from exporting archives into a standalone vector DB.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Elastic Agent Builder GA ships five-line RAG grounding via GitHub Copilot SDK bridge&lt;/strong&gt; — &lt;a href=&quot;https://www.elastic.co/search-labs/blog/rag-agent-elasticsearch-github-copilot-sdk&quot;&gt;Elasticsearch Labs&lt;/a&gt;
Elastic Agent Builder is GA and connects to the GitHub Copilot SDK through Elastic.Extensions.AI, registering Elasticsearch hybrid retrieval as a native Copilot tool in roughly five lines of C#. Copilot handles planning and orchestration; Elasticsearch returns logs, docs, and proprietary records. Supports RAG/hybrid search grounding, MCP/A2A interoperability with prebuilt Elastic agents, and optional Elastic Inference Service models. &lt;strong&gt;Builder angle:&lt;/strong&gt; Minimal bridge code wires production hybrid search into an orchestrator instead of building a custom retrieval tool layer.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Microsoft Foundry extends tracing and evals to any agent framework at Build 2026&lt;/strong&gt; — &lt;a href=&quot;https://devblogs.microsoft.com/foundry/build-2026-from-observability-to-roi-for-ai-agents-on-any-framework/&quot;&gt;source&lt;/a&gt; — Point your existing OTel exporter at Foundry to get multi-turn evals, rubric scoring, and production trace sampling without swapping orchestration frameworks.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Amazon Bedrock AgentCore ships Lambda code-based evaluators for CI gates and online monitoring&lt;/strong&gt; — &lt;a href=&quot;https://aws.amazon.com/blogs/machine-learning/build-custom-code-based-evaluators-in-amazon-bedrock-agentcore/&quot;&gt;source&lt;/a&gt; — Encode deterministic agent contracts—tool schemas, workflow order, PII rules—as Lambda evaluators that block deploys in CI and alarm in production on the same evaluator ID.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>agent-reliability</category><category>agentic-retrieval</category><category>semantic-rerank</category><category>hybrid-search</category><category>microsoft</category><category>embeddings</category><category>knowledge-ingestion</category></item><item><title>Agent Security — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/agent-security/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/agent-security/2026-06-06/</guid><description>Microsoft MXC SDK enforces policy-driven agent containment on Windows and WSL; Microsoft documents Claude Code GitHub Action secret exfiltration via Rea…</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Cheaper agents mean more attack surface. Security moves from policy decks to runtime guardrails — humans still define what must never happen; the harness enforces it.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Microsoft MXC SDK enforces policy-driven agent containment on Windows and WSL&lt;/strong&gt; — &lt;a href=&quot;https://blogs.windows.com/windowsdeveloper/2026/06/02/windows-platform-security-for-ai-agents/&quot;&gt;Windows Developer Blog&lt;/a&gt;
At Build 2026, Microsoft previewed the Microsoft Execution Containers (MXC) SDK, a cross-platform policy layer that maps developer-defined constraints to isolation primitives at runtime. The early preview ships process isolation (GitHub Copilot CLI has adopted it for model-generated code) and session isolation with distinct Entra-backed agent identities; Agent 365 plus Intune/Entra apply per-agent policy. The roadmap adds micro-VM and WSL Linux containers, and Windows Defender scans for prompt injection on the endpoint. &lt;strong&gt;Builder angle:&lt;/strong&gt; Agent harnesses can delegate filesystem and network bounds to MXC instead of inheriting the full user session, with IT pushing the same policy model through Entra and Intune.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Microsoft documents Claude Code GitHub Action secret exfiltration via Read tool bypass&lt;/strong&gt; — &lt;a href=&quot;https://www.microsoft.com/en-us/security/blog/2026/06/05/securing-ci-cd-in-agentic-world-claude-code-github-action-case/&quot;&gt;Microsoft Security Blog&lt;/a&gt;
Microsoft Threat Intelligence found that prompt injection in GitHub issue or PR content could steer Claude Code Action into reading /proc/self/environ through the in-process Read tool. Because that tool bypassed the Bubblewrap environment scrubbing applied to Bash, the read leaked ANTHROPIC_API_KEY. Anthropic patched the issue in Claude Code 2.1.128 by blocking sensitive /proc paths. Microsoft recommends the Agents Rule of Two: no single workflow should combine all three of untrusted input, secret or tool access, and external write channels. &lt;strong&gt;Builder angle:&lt;/strong&gt; CI agents that ingest repo issues must route all file reads through the same scrubbed subprocess boundary as shell tools, and split triage workflows from token-bearing tag-mode runs.&lt;/p&gt;
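&lt;p&gt;The Rule of Two reduces to a check over three capability flags per workflow. A minimal sketch, assuming each workflow can be classified by hand; WorkflowProfile and violates_rule_of_two are hypothetical names, and the flag descriptions are examples rather than Microsoft&apos;s definitions.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowProfile:
    """Hypothetical capability flags for one CI agent workflow."""
    untrusted_input: bool        # reads attacker-controllable text (issues, PR bodies)
    secret_or_tool_access: bool  # holds API keys or privileged tools
    external_write: bool         # can push, comment, or call external endpoints


def violates_rule_of_two(w: WorkflowProfile) -> bool:
    """True when one workflow combines all three risky properties."""
    return w.untrusted_input and w.secret_or_tool_access and w.external_write


# Split design (hypothetical): triage reads issues but holds no secrets;
# tag mode holds secrets but only runs on trusted maintainer input.
triage = WorkflowProfile(untrusted_input=True, secret_or_tool_access=False, external_write=True)
tag_mode = WorkflowProfile(untrusted_input=False, secret_or_tool_access=True, external_write=True)
combined = WorkflowProfile(untrusted_input=True, secret_or_tool_access=True, external_write=True)

for name, w in [("triage", triage), ("tag_mode", tag_mode), ("combined", combined)]:
    print(name, violates_rule_of_two(w))
```

&lt;p&gt;Splitting triage from tag mode, as recommended above, keeps each half at two properties; only the combined workflow trips the check.&lt;/p&gt;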
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Microsoft AI Red Team publishes agentic failure-mode taxonomy v2.0 with seven new categories&lt;/strong&gt; — &lt;a href=&quot;https://www.microsoft.com/en-us/security/blog/2026/06/04/updating-taxonomy-failure-modes-agentic-ai-systems-year-red-teaming-taught-us/&quot;&gt;Microsoft Security Blog&lt;/a&gt;
After 12 months of red-team engagements, Microsoft updated its agentic AI failure-mode taxonomy with seven new categories: agentic supply chain compromise, goal hijacking, inter-agent trust escalation, computer-use visual attacks, session context contamination, MCP/plugin abuse, and capability disclosure. Operational data shows human-in-the-loop (HitL) bypass, and cross-prompt injection (XPIA) chained with memory poisoning, occurring at high frequency. New mitigations prescribe agent SBOMs that include MCP tool descriptions, cryptographic inter-agent identity, consent-architecture hardening, and adversarial session context tracking. &lt;strong&gt;Builder angle:&lt;/strong&gt; Use the v2.0 matrix as a red-team checklist before shipping production agents, especially for MCP tool-description poisoning, session contamination, and capability disclosure.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cisco AI Defense adds adaptive red teaming and Policy Studio natural-language guardrails&lt;/strong&gt; — &lt;a href=&quot;https://blogs.cisco.com/ai/cisco-ai-defense-gets-personal-agent-security&quot;&gt;source&lt;/a&gt; — Per-agent adaptive red-team objectives and NL Policy Studio guardrails; CI/CD CLI discovers agent dependency graphs including MCP servers and skills.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SafeMCP open-source plugin filters hazardous MCP tools via look-ahead world model&lt;/strong&gt; — &lt;a href=&quot;https://arxiv.org/html/2606.01991v1&quot;&gt;source&lt;/a&gt; — BAAI/PKU server-side MCP defense proactively prunes tool sets and fail-safe blocks unsafe calls; code at github.com/wlc2424762917/SafeMCP.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>agent-security</category><category>runtime-containment</category><category>policy-enforcement</category><category>windows</category><category>agent-identity</category><category>prompt-injection</category><category>ci-cd</category></item><item><title>Agent Stack — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/agent-stack/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/agent-stack/2026-06-06/</guid><description>Microsoft Agent Framework ships Agent Harness, CodeAct, and Handoff orchestration at BUILD 2026; NVIDIA releases NemoClaw orchestration blueprints and O…</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;The harness, tool surface, and delegation topology are commoditizing together. When every vendor ships MCP and orchestration, the moat is how humans wire judgment, guardrails, and institutional context into the agent loop — not whether agents can run.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Microsoft Agent Framework ships Agent Harness, CodeAct, and Handoff orchestration at BUILD 2026&lt;/strong&gt; — &lt;a href=&quot;https://devblogs.microsoft.com/agent-framework/microsoft-agent-framework-at-build-2026-announce/&quot;&gt;Microsoft Agent Framework Blog&lt;/a&gt;
MAF (1.0 GA April 2026) adds Agent Harness via AsHarnessAgent() with automatic context compaction, filesystem memory, todo tracking, plan/execute modes, AgentSkillsProvider, BackgroundAgentsProvider, shell execution (.NET), ToolApprovalAgent, and OpenTelemetry. Foundry Hosted Agents deploy local MAF agents as containers with scale-to-zero, per-session VM isolation, and a persistent filesystem. CodeAct (alpha) runs multi-tool Python in Hyperlight micro-VMs, cutting benchmark latency by 52% and token use by 64%. HandoffBuilder adds directed multi-agent routing with developer-defined topology and guardrails. &lt;strong&gt;Builder angle:&lt;/strong&gt; One method turns a chat client into a production harness with compaction, skills, sub-agents, and hosted deployment, collapsing what teams typically stitch from separate OSS pieces.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;NVIDIA releases NemoClaw orchestration blueprints and OpenShell secure agent runtime&lt;/strong&gt; — &lt;a href=&quot;https://nvidianews.nvidia.com/news/enterprise-software-leaders-build-ai-agents-with-nvidia&quot;&gt;NVIDIA Newsroom&lt;/a&gt;
NVIDIA Agent Toolkit ships NemoClaw blueprints (available now) connecting popular harnesses for long-running agents, plus OpenShell early preview for policy/privacy controls and routing queries to local vs cloud models. Nemotron 3 Ultra (550B MoE) targets agent harnesses including LangChain Deep Agents, OpenHands, and OpenCode. CUDA-X libraries (cuDF, cuOpt, NeMo, PhysicsNeMo, CUDA-Q) are exposed as domain-specific agent skills. Microsoft partners on Windows security primitives plus OpenShell; Canonical and Red Hat integrate OpenShell into Ubuntu and Red Hat AI. &lt;strong&gt;Builder angle:&lt;/strong&gt; NemoClaw plugs orchestration blueprints into existing harnesses while OpenShell adds a policy-controlled runtime layer for cross-platform agent deployment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Amazon Agent-Ops multi-agent framework automates SOPs with 85–97% accuracy in production&lt;/strong&gt; — &lt;a href=&quot;https://www.amazon.science/publications/agent-ops-a-multi-agent-orchestration-framework-for-end-to-end-sop-automation-in-e-commerce-operations&quot;&gt;Amazon Science&lt;/a&gt;
Agent-Ops orchestrates three agents for e-commerce SOP automation: SOP Groomer converts ambiguous docs into automation-ready specs, WebAgent reaches 91.3% task completion on dynamic web UIs via demonstration-based learning, and Document Verification Agent validates invoices and certificates across languages at 94.2% accuracy. Deployed across seven SOP categories in three regions and used by 100 account managers, the system cut case-resolution time by 83%. &lt;strong&gt;Builder angle:&lt;/strong&gt; Demonstrates a supervisor plus specialized-worker orchestration pattern with measurable production accuracy on incomplete SOPs and unpredictable UIs, not just lab demos.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;MCP 2026-07-28 release candidate makes Streamable HTTP stateless&lt;/strong&gt; — &lt;a href=&quot;https://blog.modelcontextprotocol.io/posts/2026-07-28-release-candidate/&quot;&gt;source&lt;/a&gt; — Agents behind gateways can drop sticky sessions and route MCP calls on HTTP headers instead of parsing JSON-RPC bodies.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AWS documents OAuth code flow for AgentCore Gateway MCP inbound auth&lt;/strong&gt; — &lt;a href=&quot;https://aws.amazon.com/blogs/machine-learning/building-a-secure-auth-code-flow-setup-using-agentcore-gateway-with-mcp-clients/&quot;&gt;source&lt;/a&gt; — Production MCP gateways can enforce per-user IdP tokens at the routing layer before any tool invocation reaches backend servers.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>agent-stack</category><category>agent-harness</category><category>orchestration</category><category>microsoft</category><category>codeact</category><category>harness</category><category>runtime</category></item><item><title>Agents &amp; Harness — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/agents-harness/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/agents-harness/2026-06-06/</guid><description>Microsoft Agent Framework ships Agent Harness, CodeAct, and Handoff orchestration at BUILD 2026; NVIDIA releases NemoClaw orchestration blueprints and O…</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;The harness is commoditizing — compaction, skills, hosted deploy, and orchestration primitives are the new electricity. When every vendor ships a harness, the moat is how humans wire judgment, guardrails, and institutional context into the loop.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Microsoft Agent Framework ships Agent Harness, CodeAct, and Handoff orchestration at BUILD 2026&lt;/strong&gt; — &lt;a href=&quot;https://devblogs.microsoft.com/agent-framework/microsoft-agent-framework-at-build-2026-announce/&quot;&gt;Microsoft Agent Framework Blog&lt;/a&gt;
MAF (1.0 GA April 2026) adds Agent Harness via AsHarnessAgent() with automatic context compaction, filesystem memory, todo tracking, plan/execute modes, AgentSkillsProvider, BackgroundAgentsProvider, shell execution (.NET), ToolApprovalAgent, and OpenTelemetry. Foundry Hosted Agents deploy local MAF agents as containers with scale-to-zero, per-session VM isolation, and a persistent filesystem. CodeAct (alpha) runs multi-tool Python in Hyperlight micro-VMs, cutting benchmark latency by 52% and token use by 64%. HandoffBuilder adds directed multi-agent routing with developer-defined topology and guardrails. &lt;strong&gt;Builder angle:&lt;/strong&gt; One method turns a chat client into a production harness with compaction, skills, sub-agents, and hosted deployment, collapsing what teams typically stitch from separate OSS pieces.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;NVIDIA releases NemoClaw orchestration blueprints and OpenShell secure agent runtime&lt;/strong&gt; — &lt;a href=&quot;https://nvidianews.nvidia.com/news/enterprise-software-leaders-build-ai-agents-with-nvidia&quot;&gt;NVIDIA Newsroom&lt;/a&gt;
NVIDIA Agent Toolkit ships NemoClaw blueprints (available now) connecting popular harnesses for long-running agents, plus OpenShell early preview for policy/privacy controls and routing queries to local vs cloud models. Nemotron 3 Ultra (550B MoE) targets agent harnesses including LangChain Deep Agents, OpenHands, and OpenCode. CUDA-X libraries (cuDF, cuOpt, NeMo, PhysicsNeMo, CUDA-Q) are exposed as domain-specific agent skills. Microsoft partners on Windows security primitives plus OpenShell; Canonical and Red Hat integrate OpenShell into Ubuntu and Red Hat AI. &lt;strong&gt;Builder angle:&lt;/strong&gt; NemoClaw plugs orchestration blueprints into existing harnesses while OpenShell adds a policy-controlled runtime layer for cross-platform agent deployment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Amazon Agent-Ops multi-agent framework automates SOPs with 85–97% accuracy in production&lt;/strong&gt; — &lt;a href=&quot;https://www.amazon.science/publications/agent-ops-a-multi-agent-orchestration-framework-for-end-to-end-sop-automation-in-e-commerce-operations&quot;&gt;Amazon Science&lt;/a&gt;
Agent-Ops orchestrates three agents for e-commerce SOP automation: SOP Groomer converts ambiguous docs into automation-ready specs, WebAgent reaches 91.3% task completion on dynamic web UIs via demonstration-based learning, and Document Verification Agent validates invoices and certificates across languages at 94.2% accuracy. Deployed across seven SOP categories in three regions and used by 100 account managers, the system cut case-resolution time by 83%. &lt;strong&gt;Builder angle:&lt;/strong&gt; Demonstrates a supervisor plus specialized-worker orchestration pattern with measurable production accuracy on incomplete SOPs and unpredictable UIs, not just lab demos.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>agents-harness</category><category>agent-harness</category><category>orchestration</category><category>microsoft</category><category>codeact</category><category>harness</category><category>runtime</category></item><item><title>AI Coding — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/ai-coding/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/ai-coding/2026-06-06/</guid><description>Intelligent Terminal 0.1 ships ACP-native agent pane in Windows shell; Cursor SDK adds custom tools, nested subagents, and headless auto-review; GitHub …</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Coding agents are bulldozers, not replacements — humans still frame the problem. When everyone has background agents and million-token context, differentiation shifts to taste, review, and what you ship, not whether you can generate code.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Intelligent Terminal 0.1 ships ACP-native agent pane in Windows shell&lt;/strong&gt; — &lt;a href=&quot;https://devblogs.microsoft.com/commandline/announcing-intelligent-terminal-version-0-1/&quot;&gt;Windows Command Line Blog&lt;/a&gt;
Microsoft released Intelligent Terminal 0.1, an experimental Windows Terminal fork with a docked agent pane connected via Agent Client Protocol (ACP). GitHub Copilot CLI is the default agent, but any ACP-compatible CLI can be configured instead. When a command fails, the error is auto-detected and flagged in the status bar; clicking it loads the shell output into the agent pane for explanation and fixes. Background agent tasks run in new tabs, and Command Palette prompt mode (?query) injects active-pane context without blocking the shell. &lt;strong&gt;Builder angle:&lt;/strong&gt; Terminal-first developers can delegate multi-step shell fixes to any ACP agent without leaving the command line or manually copying error output.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cursor SDK adds custom tools, nested subagents, and headless auto-review&lt;/strong&gt; — &lt;a href=&quot;https://cursor.com/changelog&quot;&gt;Cursor Changelog&lt;/a&gt;
Cursor&apos;s June 4, 2026 SDK release lets local agents register custom tools via function definitions exposed through a built-in custom-user-tools MCP server, visible to all nested subagents. Subagents can spawn subagents to arbitrary depth. Headless runs can set local.autoReview to route tool calls through a classifier steered by permissions.json allow/block instructions. Persistence options expand beyond SQLite to JSONL and custom LocalAgentStore implementations; each send() carries a platform requestId for CI correlation. &lt;strong&gt;Builder angle:&lt;/strong&gt; CI and internal scripts can embed coding agents with first-class custom tools and graded auto-approval instead of standing up separate MCP servers or interactive review loops.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GitHub Agent tasks REST API exposes programmatic Copilot cloud agent runs&lt;/strong&gt; — &lt;a href=&quot;https://github.blog/changelog/2026-06-04-agent-tasks-rest-api-now-available-for-copilot-pro-pro-and-max/&quot;&gt;GitHub Changelog&lt;/a&gt;
Copilot Pro, Pro+, and Max users can start and track Copilot cloud agent tasks via a public-preview REST API authenticated with PATs or OAuth tokens. Cloud agents run in an isolated development environment, make and validate code changes, and open pull requests. Documented use cases include fan-out refactors across repositories, one-click repo scaffolding from internal portals, and scheduled release preparation with release notes. &lt;strong&gt;Builder angle:&lt;/strong&gt; Background coding agents can be triggered from scripts, portals, or schedulers instead of only from IDE or Copilot app sessions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>ai-coding</category><category>acp</category><category>cli</category><category>windows-terminal</category><category>copilot-cli</category><category>cursor-sdk</category><category>subagents</category></item><item><title>AI Platform — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/ai-platform/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/ai-platform/2026-06-06/</guid><description>DigitalOcean Inference Gateway ships prefix-aware routing with 75%+ cache hit rates; GitHub Copilot switches all plans to usage-based AI Credits billing…</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Token price is the new kWh, and the platform you ship on determines how fast you reach production. Jevons says falling inference cost drives more loops and heavier agents — track pricing, routing, and ship infrastructure moves that change what builders can afford to run.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DigitalOcean Inference Gateway ships prefix-aware routing with 75%+ cache hit rates&lt;/strong&gt; — &lt;a href=&quot;https://www.digitalocean.com/blog/reduce-llm-inference-costs-prefix-caching&quot;&gt;DigitalOcean Blog&lt;/a&gt;
DigitalOcean&apos;s Inference Gateway (June 2, 2026) routes requests to the vLLM pods most likely to hold matching KV-cache prefix blocks, using sha256_cbor_64bit block hashes and a combination of prefix-cache and GPU-utilization scorers. On shared-prefix workloads, cache hit rates rise from roughly 25% under round-robin to 75%+, cutting effective compute cost by up to 4x on identical hardware; prefix caching with cached-token pricing is rolling out to Serverless Inference in the coming weeks. &lt;strong&gt;Builder angle:&lt;/strong&gt; Multi-replica inference fleets can cut redundant prefill spend by routing to cache-warm pods instead of adding GPUs, especially for agent loops with fixed system prompts.&lt;/p&gt;
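&lt;p&gt;Prefix-aware routing can be sketched as chained block hashing plus a weighted score per pod. This is an illustration under stated assumptions, not DigitalOcean&apos;s gateway: it hashes with plain SHA-256 truncated to 64 bits instead of vLLM&apos;s sha256_cbor_64bit, uses a tiny 4-token block size, and the scoring weights (w_cache, w_util) and pod records are hypothetical.&lt;/p&gt;

```python
import hashlib

BLOCK_SIZE = 4  # tokens per KV-cache block; tiny for illustration only


def block_hashes(tokens: list[int]) -> list[int]:
    """One chained 64-bit hash per full block, so each hash names its whole prefix."""
    hashes: list[int] = []
    parent = 0
    for start in range(0, len(tokens) - BLOCK_SIZE + 1, BLOCK_SIZE):
        block = tokens[start:start + BLOCK_SIZE]
        digest = hashlib.sha256(f"{parent}:{block}".encode()).digest()
        parent = int.from_bytes(digest[:8], "big")  # truncate to 64 bits
        hashes.append(parent)
    return hashes


def cached_prefix_blocks(request_hashes: list[int], pod_cache: set[int]) -> int:
    """Count leading blocks of the request already resident on a pod."""
    count = 0
    for h in request_hashes:
        if h not in pod_cache:
            break
        count += 1
    return count


def pick_pod(tokens: list[int], pods: dict, w_cache: float = 1.0, w_util: float = 0.5) -> str:
    """Prefer cache-warm pods, penalized by current GPU utilization."""
    hashes = block_hashes(tokens)

    def score(name: str) -> float:
        pod = pods[name]
        hit_ratio = cached_prefix_blocks(hashes, pod["cache"]) / max(len(hashes), 1)
        return w_cache * hit_ratio - w_util * pod["gpu_util"]

    return max(pods, key=score)


system_prompt = list(range(8))  # shared prefix spanning two full blocks
pods = {
    "pod-a": {"cache": set(), "gpu_util": 0.2},
    "pod-b": {"cache": set(block_hashes(system_prompt)), "gpu_util": 0.4},
}
print(pick_pod(system_prompt + [100, 101, 102, 103], pods))  # pod-b
```

&lt;p&gt;Chaining each block hash to its parent means a hash match implies the entire prefix matches, so counting leading hits measures how much prefill a pod can skip; the utilization penalty keeps every request from piling onto one warm pod.&lt;/p&gt;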
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GitHub Copilot switches all plans to usage-based AI Credits billing&lt;/strong&gt; — &lt;a href=&quot;https://github.blog/changelog/2026-06-01-updates-to-github-copilot-billing-and-plans/&quot;&gt;GitHub Changelog&lt;/a&gt;
As of June 1, 2026, all Copilot plans bill by GitHub AI Credits consumed (each credit equals $0.01 of value) instead of premium request units. Included monthly allowances are 1,500 credits on Pro ($10), 7,000 on Pro+ ($39), and 20,000 on Max ($100); overages require an additional spending budget. Copilot code review now also consumes GitHub Actions minutes alongside AI Credits. &lt;strong&gt;Builder angle:&lt;/strong&gt; Copilot cost is now token-metered like API inference—agentic and review-heavy workflows need credit budgets and plan-tier math before defaulting to premium models.&lt;/p&gt;
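&lt;p&gt;The plan-tier math can be worked through directly. Plan prices and included credits come from the changelog summary above; billing overage at the $0.01 face value per credit is an assumption, since the changelog summary only says overages require an additional spending budget.&lt;/p&gt;

```python
# Plan figures from the changelog summary above (1 credit = $0.01 of value).
PLANS = {
    "Pro": {"price_usd": 10, "included_credits": 1_500},
    "Pro+": {"price_usd": 39, "included_credits": 7_000},
    "Max": {"price_usd": 100, "included_credits": 20_000},
}


def included_value_usd(plan: str) -> float:
    """Face value of the monthly allowance: credits are worth one cent each."""
    return PLANS[plan]["included_credits"] / 100


def monthly_cost_usd(plan: str, credits_used: int) -> float:
    """Plan price plus overage.

    ASSUMPTION: overage credits bill at the $0.01 face value. Math is done in
    cents (1 credit = 1 cent) to avoid float rounding.
    """
    p = PLANS[plan]
    overage_credits = max(0, credits_used - p["included_credits"])
    return (p["price_usd"] * 100 + overage_credits) / 100


for plan in PLANS:
    print(plan, included_value_usd(plan), monthly_cost_usd(plan, 10_000))
```

&lt;p&gt;Under that assumption, 10,000 credits in a month cost $95 on Pro but $69 on Pro+, so heavy agentic use can make the higher tier cheaper.&lt;/p&gt;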
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DeepSeek makes V4 Pro 75% API price cut permanent at $0.87 per million output tokens&lt;/strong&gt; — &lt;a href=&quot;https://thenextweb.com/news/deepseek-v4-pro-75-percent-price-cut-permanent&quot;&gt;The Next Web&lt;/a&gt;
DeepSeek made permanent a promotional 75% discount on V4 Pro API pricing that had been due to expire May 31. Permanent rates now run from $0.003625 to $0.87 per million tokens, down from $0.0145 to $3.48. The model supports a 1M-token context window at the lower price, undercutting GPT-5, Claude Opus 4.7, and Gemini Flash tiers on per-token output cost. &lt;strong&gt;Builder angle:&lt;/strong&gt; Long-context and high-volume workloads now have a materially cheaper frontier-tier option; builders should model routing simple tasks to DeepSeek while weighing compliance and latency tradeoffs.&lt;/p&gt;
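&lt;p&gt;The quoted figures are consistent with a straight 75% cut at both ends of the range, which a few lines of exact decimal arithmetic confirm. The 500M-token monthly workload below is a hypothetical example, priced at the quoted top-of-range (output) rate.&lt;/p&gt;

```python
from decimal import Decimal


def discounted(price_per_million: str, discount_pct: int) -> Decimal:
    """Apply a percentage cut exactly; Decimal avoids binary float drift."""
    return Decimal(price_per_million) * (100 - discount_pct) / 100


def workload_cost(million_tokens: int, price_per_million: Decimal) -> Decimal:
    """Total cost for a volume expressed in millions of tokens."""
    return million_tokens * price_per_million


# Endpoints of the old price range, per million tokens.
for old in ("0.0145", "3.48"):
    print(old, "->", discounted(old, 75))

# Hypothetical workload: 500M output tokens per month, old vs. new rate.
print(workload_cost(500, Decimal("3.48")), workload_cost(500, discounted("3.48", 75)))
```

&lt;p&gt;At the new rate that workload costs $435 instead of $1,740.&lt;/p&gt;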
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Vercel Sandbox Drives add persistent attachable storage for agent workspaces&lt;/strong&gt; — &lt;a href=&quot;https://vercel.com/changelog&quot;&gt;source&lt;/a&gt; — Agent sandboxes can retain cloned repos, dependencies, and build artifacts across disposable runs instead of cold-starting every session.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;skills.sh API launches with Vercel OIDC auth for querying 600k+ open-source skills&lt;/strong&gt; — &lt;a href=&quot;https://vercel.com/changelog/the-skills-sh-api-is-now-available&quot;&gt;source&lt;/a&gt; — Deployed apps on Vercel can discover and audit agent skills programmatically without storing long-lived API secrets.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>ai-platform</category><category>prefix-caching</category><category>routing</category><category>vllm</category><category>cost-optimization</category><category>pricing</category><category>github-copilot</category></item><item><title>Builder Loop — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/builder-loop/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/builder-loop/2026-06-06/</guid><description>Intelligent Terminal 0.1 ships ACP-native agent pane in Windows shell; Cursor SDK adds custom tools, nested subagents, and headless auto-review; GitHub …</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Coding agents are bulldozers, not replacements — humans still frame the problem. OSS shifts default stack choices faster than any vendor roadmap. When everyone can generate and fork, differentiation is taste, review, and what you ship before it becomes the default.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Intelligent Terminal 0.1 ships ACP-native agent pane in Windows shell&lt;/strong&gt; — &lt;a href=&quot;https://devblogs.microsoft.com/commandline/announcing-intelligent-terminal-version-0-1/&quot;&gt;Windows Command Line Blog&lt;/a&gt;
Microsoft released Intelligent Terminal 0.1, an experimental Windows Terminal fork with a docked agent pane connected via Agent Client Protocol (ACP). GitHub Copilot CLI is the default agent, but any ACP-compatible CLI can be configured instead. When a command fails, the error is auto-detected and flagged in the status bar; clicking it loads the shell output into the agent pane for explanation and fixes. Background agent tasks run in new tabs, and Command Palette prompt mode (?query) injects active-pane context without blocking the shell. &lt;strong&gt;Builder angle:&lt;/strong&gt; Terminal-first developers can delegate multi-step shell fixes to any ACP agent without leaving the command line or manually copying error output.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cursor SDK adds custom tools, nested subagents, and headless auto-review&lt;/strong&gt; — &lt;a href=&quot;https://cursor.com/changelog&quot;&gt;Cursor Changelog&lt;/a&gt;
Cursor&apos;s June 4, 2026 SDK release lets local agents register custom tools via function definitions exposed through a built-in custom-user-tools MCP server, visible to all nested subagents. Subagents can spawn subagents to arbitrary depth. Headless runs can set local.autoReview to route tool calls through a classifier steered by permissions.json allow/block instructions. Persistence options expand beyond SQLite to JSONL and custom LocalAgentStore implementations; each send() carries a platform requestId for CI correlation. &lt;strong&gt;Builder angle:&lt;/strong&gt; CI and internal scripts can embed coding agents with first-class custom tools and graded auto-approval instead of standing up separate MCP servers or interactive review loops.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GitHub Agent tasks REST API exposes programmatic Copilot cloud agent runs&lt;/strong&gt; — &lt;a href=&quot;https://github.blog/changelog/2026-06-04-agent-tasks-rest-api-now-available-for-copilot-pro-pro-and-max/&quot;&gt;GitHub Changelog&lt;/a&gt;
Copilot Pro, Pro+, and Max users can start and track Copilot cloud agent tasks via a public-preview REST API authenticated with PATs or OAuth tokens. Cloud agents run in an isolated development environment, make and validate code changes, and open pull requests. Documented use cases include fan-out refactors across repositories, one-click repo scaffolding from internal portals, and scheduled release preparation with release notes. &lt;strong&gt;Builder angle:&lt;/strong&gt; Background coding agents can be triggered from scripts, portals, or schedulers instead of only from IDE or Copilot app sessions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;NVIDIA open-sources physical AI agent skills across Omniverse, Cosmos, Alpamayo, and Metropolis&lt;/strong&gt; — &lt;a href=&quot;https://nvidianews.nvidia.com/news/nvidia-releases-major-collection-of-open-source-agent-tools-and-skills-for-physical-ai&quot;&gt;source&lt;/a&gt; — Agent builders working on embodied or simulation-heavy workflows can pull verified NVIDIA skills into existing harnesses instead of wiring CUDA-X libraries by hand.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;vLLM Semantic Router adds Session-Aware Agentic Routing with prefix-cache switch pricing&lt;/strong&gt; — &lt;a href=&quot;https://vllm.ai/blog/2026-06-02-session-aware-agentic-routing&quot;&gt;source&lt;/a&gt; — Self-hosted agent stacks using vLLM auto routing can keep multi-turn tool sessions stable without silently breaking provider continuation state or wasting prefix-cache locality.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>builder-loop</category><category>acp</category><category>cli</category><category>windows-terminal</category><category>copilot-cli</category><category>cursor-sdk</category><category>subagents</category></item><item><title>Builder Tooling — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/builder-tooling/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/builder-tooling/2026-06-06/</guid><description>Vercel Sandbox Drives add persistent attachable storage for agent workspaces; skills.sh API launches with Vercel OIDC auth for querying 600k+ open-sourc…</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Ship infrastructure is the layer above commodity models. Sandboxes, deploy surfaces, and CI that agents can drive expand the pie — more builders ship more often when activation energy drops.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Vercel Sandbox Drives add persistent attachable storage for agent workspaces&lt;/strong&gt; — &lt;a href=&quot;https://vercel.com/changelog&quot;&gt;Vercel Changelog&lt;/a&gt;
Vercel Sandbox Drives entered private beta on June 5, 2026. Drives are persistent storage volumes whose lifecycle is independent of any sandbox: create a drive once, mount it at a configurable path (e.g. /workspace) when starting a sandbox, and reattach it after the sandbox stops. Drives require the @vercel/sandbox@beta SDK or the sandbox@beta CLI. During the beta, each drive can be mounted read-write by only one sandbox at a time, and Vercel advises against using drives for production data. &lt;strong&gt;Builder angle:&lt;/strong&gt; Agent sandboxes can retain cloned repos, dependencies, and build artifacts across disposable runs instead of cold-starting every session.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;skills.sh API launches with Vercel OIDC auth for querying 600k+ open-source skills&lt;/strong&gt; — &lt;a href=&quot;https://vercel.com/changelog/the-skills-sh-api-is-now-available&quot;&gt;Vercel Changelog&lt;/a&gt;
On June 5, 2026, Vercel published the skills.sh REST API. Clients authenticate with a short-lived Vercel OIDC token from getVercelOidcToken() in @vercel/oidc. The token is scoped to the team and project and rotates automatically. Endpoints support searching skills, fetching skill details, and checking security audits across 600,000+ skills. The rate limit is 600 requests per minute per team and project. &lt;strong&gt;Builder angle:&lt;/strong&gt; Apps deployed on Vercel can discover and audit agent skills programmatically without storing long-lived API secrets.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GitHub Actions forces JavaScript actions onto Node.js 24 starting June 2&lt;/strong&gt; — &lt;a href=&quot;https://github.com/orgs/community/discussions/189236&quot;&gt;GitHub Community&lt;/a&gt;
GitHub began running JavaScript-based actions (e.g. actions/checkout@v4) on Node.js 24 by default on June 2, 2026, deprecating Node.js 20. Workflows that show deprecation warnings should update to action versions that support Node.js 24. Before the switch, teams could opt in early with FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true. They can temporarily opt out with ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. Node.js 20 is scheduled for full removal from runners in fall 2026. &lt;strong&gt;Builder angle:&lt;/strong&gt; CI pipelines shipping AI apps must bump marketplace action pins to Node.js 24-compatible releases before Node.js 20 is removed. The opt-out flag only buys time until then.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>builder-tooling</category><category>vercel-sandbox</category><category>persistent-storage</category><category>agent-workspace</category><category>private-beta</category><category>vercel</category><category>skills-api</category></item><item><title>Inference Economics — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/inference-economics/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/inference-economics/2026-06-06/</guid><description>DigitalOcean Inference Gateway ships prefix-aware routing with 75%+ cache hit rates; GitHub Copilot switches all plans to usage-based AI Credits billing…</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Token price is the new kWh. Jevons says falling cost drives more loops, deeper reasoning, and heavier agents — not smaller bills. Track pricing, routing, and caching moves that change what builders can afford to run.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DigitalOcean Inference Gateway ships prefix-aware routing with 75%+ cache hit rates&lt;/strong&gt; — &lt;a href=&quot;https://www.digitalocean.com/blog/reduce-llm-inference-costs-prefix-caching&quot;&gt;DigitalOcean Blog&lt;/a&gt;
DigitalOcean&apos;s Inference Gateway (June 2, 2026) routes each request to the vLLM pod most likely to hold matching KV-cache prefix blocks. It uses sha256_cbor_64bit block hashes and combines a prefix-cache scorer with a GPU-utilization scorer. On shared-prefix workloads, cache hit rates rise from roughly 25% under round-robin to 75%+, cutting effective compute cost by up to 4x on identical hardware. Prefix caching with cached-token pricing is rolling out to Serverless Inference in the coming weeks. &lt;strong&gt;Builder angle:&lt;/strong&gt; Multi-replica inference fleets can cut redundant prefill spend by routing to cache-warm pods instead of adding GPUs, especially for agent loops with fixed system prompts.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GitHub Copilot switches all plans to usage-based AI Credits billing&lt;/strong&gt; — &lt;a href=&quot;https://github.blog/changelog/2026-06-01-updates-to-github-copilot-billing-and-plans/&quot;&gt;GitHub Changelog&lt;/a&gt;
As of June 1, 2026, all Copilot plans bill by GitHub AI Credits consumed (each credit equals $0.01 of value) instead of premium request units. Included monthly allowances are 1,500 credits on Pro ($10), 7,000 on Pro+ ($39), and 20,000 on Max ($100); overages require an additional spending budget. Copilot code review now also consumes GitHub Actions minutes alongside AI Credits. &lt;strong&gt;Builder angle:&lt;/strong&gt; Copilot cost is now token-metered like API inference—agentic and review-heavy workflows need credit budgets and plan-tier math before defaulting to premium models.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DeepSeek makes V4 Pro 75% API price cut permanent at $0.87 per million output tokens&lt;/strong&gt; — &lt;a href=&quot;https://thenextweb.com/news/deepseek-v4-pro-75-percent-price-cut-permanent&quot;&gt;The Next Web&lt;/a&gt;
DeepSeek made permanent a promotional 75% discount on V4 Pro API pricing that had been due to expire May 31. Permanent rates now range from $0.003625 to $0.87 per million tokens (previously $0.0145 to $3.48). The model supports a 1M-token context window at the lower price, undercutting GPT-5, Claude Opus 4.7, and Gemini Flash tiers on per-token output cost. &lt;strong&gt;Builder angle:&lt;/strong&gt; Long-context and high-volume workloads now have a materially cheaper frontier-tier option. Builders should evaluate routing simple tasks to DeepSeek while weighing compliance and latency tradeoffs.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cloudflare AI Gateway adds dollar-based spend limits&lt;/strong&gt; — &lt;a href=&quot;https://developers.cloudflare.com/changelog/post/2026-06-05-spend-limits/&quot;&gt;source&lt;/a&gt; — Cost budgets scoped by model, provider, or metadata can block requests once cumulative token spend exceeds the budget. This caps runaway agent loops without relying on per-request rate limits.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GitHub Copilot adds 1M-token context and configurable reasoning levels&lt;/strong&gt; — &lt;a href=&quot;https://github.blog/changelog/2026-06-04-larger-context-windows-and-configurable-reasoning-levels-for-github-copilot/&quot;&gt;source&lt;/a&gt; — Larger context windows and higher reasoning levels consume more AI Credits per interaction under the new billing model.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>inference-economics</category><category>prefix-caching</category><category>routing</category><category>vllm</category><category>cost-optimization</category><category>pricing</category><category>github-copilot</category></item><item><title>MCP Tooling — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/mcp-tooling/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/mcp-tooling/2026-06-06/</guid><description>MCP 2026-07-28 release candidate makes Streamable HTTP stateless; AWS documents OAuth code flow for AgentCore Gateway MCP inbound auth; mcp-auth-gateway…</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;MCP is the tool surface layer — table stakes, not a moat. What matters is which tools get wired, how auth and policy gate them, and whether your org&apos;s skills compound. Ubiquitous protocol; scarce execution context.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;MCP 2026-07-28 release candidate makes Streamable HTTP stateless&lt;/strong&gt; — &lt;a href=&quot;https://blog.modelcontextprotocol.io/posts/2026-07-28-release-candidate/&quot;&gt;Model Context Protocol Blog&lt;/a&gt;
The MCP team published the 2026-07-28 release candidate, the largest spec revision since launch. It removes the initialize handshake and Mcp-Session-Id (SEP-2575, SEP-2567), requires Mcp-Method and Mcp-Name headers for routing (SEP-2243), adds tools/list caching via ttlMs and cacheScope (SEP-2549), graduates MCP Apps and Tasks to official extensions, and hardens OAuth with iss validation (SEP-2468). The final spec is scheduled to ship July 28, 2026. &lt;strong&gt;Builder angle:&lt;/strong&gt; MCP deployments behind gateways can drop sticky sessions and route calls on HTTP headers instead of parsing JSON-RPC bodies.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;AWS documents OAuth code flow for AgentCore Gateway MCP inbound auth&lt;/strong&gt; — &lt;a href=&quot;https://aws.amazon.com/blogs/machine-learning/building-a-secure-auth-code-flow-setup-using-agentcore-gateway-with-mcp-clients/&quot;&gt;AWS Machine Learning Blog&lt;/a&gt;
AWS describes wiring the Kiro IDE to Amazon Bedrock AgentCore Gateway with JWT inbound auth. Unauthenticated POSTs to /mcp return HTTP 401 with a WWW-Authenticate header pointing to /.well-known/oauth-protected-resource. Clients then discover the IdP, run the PKCE authorization code flow, and send Bearer tokens. The Gateway validates each token (iss, exp, audience/custom claims) before proxying to MCP servers. Optionally, mcp-remote bridges stdio clients to the OAuth-protected HTTP endpoint. &lt;strong&gt;Builder angle:&lt;/strong&gt; Production MCP gateways can enforce per-user IdP tokens at the routing layer before any tool invocation reaches backend servers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;mcp-auth-gateway wraps stdio MCP servers with OAuth 2.1 and HTTP transport&lt;/strong&gt; — &lt;a href=&quot;https://github.com/nicknikolakakis/mcp-auth-gateway&quot;&gt;GitHub&lt;/a&gt;
Open-source Go gateway exposes any stdio-only MCP server over Streamable HTTP/SSE with OAuth 2.1/OIDC (PKCE, dynamic client registration, token refresh). YAML config selects the OIDC provider and upstream MCP command; each authenticated user gets an isolated MCP process with credentials injected via Unix domain socket rather than environment variables. &lt;strong&gt;Builder angle:&lt;/strong&gt; Teams can add MCP-spec OAuth and remote HTTP access to existing stdio servers without rewriting server code.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>mcp-tooling</category><category>mcp-spec</category><category>transport</category><category>stateless</category><category>oauth</category><category>mcp-gateway</category><category>jwt</category></item><item><title>Multi-Agent Interop — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/multi-agent-interop/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/multi-agent-interop/2026-06-06/</guid><description>Microsoft Foundry ships incoming A2A public preview with versioned agent cards and Entra-gated discovery; Hosted agents in Foundry add A2A as fourth pro…</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;A2A and registries wire agents together — protocol, not moat. Value sits in delegation topology humans design and the institutional rules that govern who can act on whose behalf.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Microsoft Foundry ships incoming A2A public preview with versioned agent cards and Entra-gated discovery&lt;/strong&gt; — &lt;a href=&quot;https://devblogs.microsoft.com/foundry/from-building-agents-to-working-with-them-enterprise-agent-distribution-in-microsoft-foundry/&quot;&gt;Microsoft Foundry Blog&lt;/a&gt;
At Build 2026, Foundry Agent Service adds incoming Agent2Agent (public preview) alongside existing outbound A2A tools. Any prompt or responses-protocol hosted agent can expose an A2A base path, publish v1.0 and v0.3 agent cards at agentCard/{version}, and accept JSON-RPC delegation from external agents. Agent card fetch and invocation require Microsoft Entra ID (OBO or service identity); anonymous discovery is not supported. Foundry documents A2APreviewTool connections with custom AgentCardPath for cross-agent calls within the same project. &lt;strong&gt;Builder angle:&lt;/strong&gt; Bidirectional A2A on Foundry lets you publish a governed agent card once and accept delegated tasks from LangGraph, ADK, or other A2A clients without rebuilding the agent per partner stack.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Hosted agents in Foundry add A2A as fourth protocol for cross-agent delegation on production runtimes&lt;/strong&gt; — &lt;a href=&quot;https://devblogs.microsoft.com/foundry/hosted-agents-build26/&quot;&gt;Microsoft Foundry Blog&lt;/a&gt;
Build 2026 updates to hosted agents add Agent2Agent (A2A) for agent delegation, extending the protocol surface beyond Responses, Invocations (HTTP), and Invocations (WebSocket). A single hosted agent can expose multiple protocols simultaneously, including Teams/M365 channel bridging and A2A handoffs. The update also adds direct source-code deployment (python_3_13/3_14, dotnet_10) without a container registry, built-in Content Safety guardrails, and Voice Live/WebSocket support for real-time voice agents. &lt;strong&gt;Builder angle:&lt;/strong&gt; Adding A2A beside Responses/Invocations means production hosted agents can both answer users and accept delegated sub-tasks from remote specialist agents on the same deployment.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;A2A v1.0 reaches production with Signed Agent Cards and 150+ enterprise adopters&lt;/strong&gt; — &lt;a href=&quot;https://www.linuxfoundation.org/press/a2a-protocol-surpasses-150-organizations-lands-in-major-cloud-platforms-and-sees-enterprise-production-use-in-first-year&quot;&gt;source&lt;/a&gt; — Linux Foundation A2A v1.0 (April 9) adds Signed Agent Cards, multi-tenancy, and SDKs in five languages; embedded in Azure Foundry, Copilot Studio, and Bedrock AgentCore Runtime.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Amazon Bedrock AgentCore Runtime adds native A2A protocol with Agent Card discovery&lt;/strong&gt; — &lt;a href=&quot;https://aws.amazon.com/blogs/machine-learning/introducing-agent-to-agent-protocol-support-in-amazon-bedrock-agentcore-runtime/&quot;&gt;source&lt;/a&gt; — AgentCore hosts stateless A2A servers on port 9000 with SigV4/OAuth inbound auth, session isolation headers, and /.well-known/agent-card.json discovery for cross-framework delegation.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>multi-agent-interop</category><category>a2a</category><category>agent-card</category><category>discovery</category><category>microsoft</category><category>delegation</category><category>hosted-agents</category></item><item><title>Observability &amp; Evals — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/observability-evals/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/observability-evals/2026-06-06/</guid><description>Microsoft Foundry extends tracing and evals to any agent framework at Build 2026; Amazon Bedrock AgentCore ships Lambda code-based evaluators for CI gat…</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;You cannot run what you cannot see. As agent loops get cheaper and longer, traces, evals, and cost attribution become electricity-metering for software — necessary, not optional, and increasingly a human judgment layer over raw metrics.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Microsoft Foundry extends tracing and evals to any agent framework at Build 2026&lt;/strong&gt; — &lt;a href=&quot;https://devblogs.microsoft.com/foundry/build-2026-from-observability-to-roi-for-ai-agents-on-any-framework/&quot;&gt;Microsoft Foundry Blog&lt;/a&gt;
Foundry observability (tracing and evals GA for hosted agents) now reaches LangChain, LangGraph, OpenAI SDK, Microsoft Agent Framework, and custom stacks via OpenTelemetry. Build 2026 adds multi-turn evaluation, context-specific rubric evaluators, intelligent trace sampling for production, trace replay and visualization, traces-to-dataset for offline regression, AZD inline dev observability, user simulation for edge-case pressure tests, agent optimizer (private preview), and ROI dashboards tying task completion and cost efficiency to trace-level drill-down. &lt;strong&gt;Builder angle:&lt;/strong&gt; Point your existing OTel exporter at Foundry to get multi-turn evals, rubric scoring, and production trace sampling without swapping orchestration frameworks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Amazon Bedrock AgentCore ships Lambda code-based evaluators for CI gates and online monitoring&lt;/strong&gt; — &lt;a href=&quot;https://aws.amazon.com/blogs/machine-learning/build-custom-code-based-evaluators-in-amazon-bedrock-agentcore/&quot;&gt;AWS Machine Learning Blog&lt;/a&gt;
AgentCore Evaluations now accepts custom Lambda evaluators registered at the TRACE, TOOL_CALL, or SESSION level. Evaluators receive OTel span payloads and return PASS/FAIL labels plus optional scores, which are written to CloudWatch Logs and Bedrock-AgentCore/Evaluations metrics. The same evaluator ID runs in three places:
&lt;ul&gt;
&lt;li&gt;on demand during dev iteration;&lt;/li&gt;
&lt;li&gt;in CI/CD deployment gates;&lt;/li&gt;
&lt;li&gt;in online evaluation with 0.01–100% session sampling.&lt;/li&gt;
&lt;/ul&gt;
The AWS sample covers schema validation, numerical drift checks, workflow-order enforcement, and Comprehend PII detection alongside LLM-as-a-Judge evaluators. &lt;strong&gt;Builder angle:&lt;/strong&gt; Encode deterministic agent contracts (tool schemas, workflow order, PII rules) as Lambda evaluators. The same evaluator ID then blocks deploys in CI and raises alarms in production.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Langfuse adds GitHub Actions experiment gates and deterministic code evaluators&lt;/strong&gt; — &lt;a href=&quot;https://langfuse.com/changelog/2026-05-25-experiment-ci-cd-gates&quot;&gt;Langfuse Changelog&lt;/a&gt;
langfuse/experiment-action@v1.0.0 runs versioned dataset experiments in GitHub Actions, posts pass/regress/fail status on pull requests, and fails workflows when scores miss thresholds. It was released May 28 alongside code evaluators: Python or TypeScript evaluate functions, written in the Langfuse UI, that score live observations or experiment runs. They check JSON parseability, schema validity, exact match, and required tool arguments without network egress, and return native scores for dashboards and Score Analytics. &lt;strong&gt;Builder angle:&lt;/strong&gt; Wire experiment-action to a regression dataset and code evaluators so agent PRs fail on deterministic contract breaks before semantic judge drift shows up in production.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Boomi May 2026 release streams Claude Code agent OTel traces into Agent Control Tower&lt;/strong&gt; — &lt;a href=&quot;https://boomi.com/blog/everything-you-want-to-know-about-the-may-2026-boomi-integration-and-automation-platform-release/&quot;&gt;source&lt;/a&gt; — Standardizes external Claude Code agents on OpenTelemetry so execution traces, performance metrics, and cost land alongside Agentstudio agents in one control plane.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Baidu unveils Agent Monitor observability suite at Create 2026&lt;/strong&gt; — &lt;a href=&quot;https://onai2.com/blog/baidu-agents-at-scale-create-2026/&quot;&gt;source&lt;/a&gt; — China&apos;s deployment-first agent stack now includes a named monitor for agent behavior and cost tracking alongside AgentBuilder 3.0 and Ernie Agent Runtime.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>observability-evals</category><category>tracing</category><category>evaluation</category><category>opentelemetry</category><category>multi-turn</category><category>microsoft</category><category>agentcore</category></item><item><title>Open Source — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/open-source/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/open-source/2026-06-06/</guid><description>NVIDIA open-sources physical AI agent skills across Omniverse, Cosmos, Alpamayo, and Metropolis; vLLM Semantic Router adds Session-Aware Agentic Routing…</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;OSS shifts default stack choices faster than any vendor roadmap. When everyone can fork and run, the moat is maintenance, integration, and the humans who decide what to adopt before it is safe.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;NVIDIA open-sources physical AI agent skills across Omniverse, Cosmos, Alpamayo, and Metropolis&lt;/strong&gt; — &lt;a href=&quot;https://nvidianews.nvidia.com/news/nvidia-releases-major-collection-of-open-source-agent-tools-and-skills-for-physical-ai&quot;&gt;NVIDIA Newsroom&lt;/a&gt;
NVIDIA released a major collection of open-source physical AI agent tools and skills on May 31, 2026, distributed through GitHub and skills.sh for use with any coding agent. Skills span Omniverse, Cosmos, Alpamayo, and Metropolis for robotics, AV, vision AI, and industrial digital twins, including synthetic-data workflows (Neural Reconstruction, Video Augmentation, Defect Image Generation) runnable as Physical AI Launchables on NVIDIA Brev. &lt;strong&gt;Builder angle:&lt;/strong&gt; Agent builders working on embodied or simulation-heavy workflows can pull verified NVIDIA skills into existing harnesses instead of wiring CUDA-X libraries by hand.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;vLLM Semantic Router adds Session-Aware Agentic Routing with prefix-cache switch pricing&lt;/strong&gt; — &lt;a href=&quot;https://vllm.ai/blog/2026-06-02-session-aware-agentic-routing&quot;&gt;vLLM Blog&lt;/a&gt;
vLLM published Session-Aware Agentic Routing (SAAR) on June 2, 2026: a stateful layer on Semantic Router that keeps per-session memory via x-session-id, hard-locks model switches during active tool loops or non-portable provider state, and prices handoffs using prefix-cache checkout cost so long warm sessions are not discarded lightly. Operators get replayable routing traces and YAML-tunable idle/drift reset boundaries. &lt;strong&gt;Builder angle:&lt;/strong&gt; Self-hosted agent stacks using vLLM auto routing can keep multi-turn tool sessions stable without silently breaking provider continuation state or wasting prefix-cache locality.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Ollama 0.30 ships with broader GGUF support and expanded MLX coverage&lt;/strong&gt; — &lt;a href=&quot;https://ollama.com/blog&quot;&gt;Ollama Blog&lt;/a&gt;
Ollama 0.30, released June 5, 2026, improves performance and GGUF model compatibility through llama.cpp. It also extends MLX engine coverage on Apple silicon to more models and hardware. The same week, Ollama added NVIDIA Nemotron 3 Ultra for high-throughput reasoning and long-running agent workflows. &lt;strong&gt;Builder angle:&lt;/strong&gt; Local-first builders gain one runtime path for more open-weight GGUF and MLX models without maintaining separate llama.cpp and MLX serving stacks.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;MiniMax M3 launches as open-weight coding model with 1M context and native image/video input&lt;/strong&gt; — &lt;a href=&quot;https://www.minimax.io/blog/minimax-m3&quot;&gt;source&lt;/a&gt; — Billed as the first open-weight model to combine frontier coding, 1M-token context, and native multimodal input. Weights and a technical report were promised on Hugging Face and GitHub within days of the June 1 launch.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Future AGI ships Apache 2.0 self-hostable agent eval platform on GitHub&lt;/strong&gt; — &lt;a href=&quot;https://github.com/future-agi/future-agi&quot;&gt;source&lt;/a&gt; — End-to-end OSS stack for tracing, 50+ eval metrics, multi-turn simulations, guardrails, and an OpenAI-compatible gateway across 100+ providers — forkable for production agent QA loops.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>open-source</category><category>physical-ai</category><category>agent-skills</category><category>nvidia</category><category>open-source</category><category>vllm</category><category>inference-routing</category></item><item><title>Power &amp; Energy — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/power-energy/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/power-energy/2026-06-06/</guid><description>Google pairs Texas Meitner AI campus with 1 GW of co-located generation; Infineon joins NVIDIA MGX ecosystem for 800 VDC grid-to-rack power; Skeleton la…</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;The grid is the new GPU shortage. Interconnect queues, PPAs, and on-site generation decide whether software abundance meets physical reality — expand-the-pie thinking applies to electrons, not just tokens.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Google pairs Texas Meitner AI campus with 1 GW of co-located generation&lt;/strong&gt; — &lt;a href=&quot;https://www.datacenterknowledge.com/energy-power-supply/google-bets-on-a-power-first-model-for-ai-data-centers&quot;&gt;Data Center Knowledge&lt;/a&gt;
Google and Intersect are building the Meitner Texas AI campus alongside more than 1 GW of wind, solar, battery storage, and on-site gas generation under a &quot;power-first&quot; model that secures dedicated supply before scaling compute. A prior Haskell County project pairs a Google data center with Intersect&apos;s 640 MW Quantum solar plus 1.3 GWh storage, scheduled to begin operations in June 2026. ERCOT approved its Batch Zero framework this month to streamline transmission capacity allocation for large-load projects with co-located generation. &lt;strong&gt;Builder angle:&lt;/strong&gt; Hyperscalers are co-locating gigawatt-scale generation with AI campuses to bypass multi-year grid interconnection queues that gate new rack deployments.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Infineon joins NVIDIA MGX ecosystem for 800 VDC grid-to-rack power&lt;/strong&gt; — &lt;a href=&quot;https://www.powerelectronicsnews.com/infineon-joins-nvidia-mgx-ecosystem-for-800-vdc-ai-power-architectures/&quot;&gt;Power Electronics News&lt;/a&gt;
Infineon joined NVIDIA&apos;s MGX modular AI server reference architecture to supply grid-to-core power conversion for 800 VDC distribution. The portfolio spans 800 VDC down to 50 V and 12 V intermediate buses and sub-10 V core rails, using silicon, SiC JFETs, and GaN devices for protection, hot-swap, and power management on native 800 V server boards. The design targets higher rack power density and reduced conversion-stage losses versus legacy 48–54 V distribution. &lt;strong&gt;Builder angle:&lt;/strong&gt; 800 VDC MGX reference designs with SiC/GaN protection ICs set the rack-level power chain builders must design around for next-gen GPU density.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Skeleton launches GrapheneUPS for AI data center grid compliance&lt;/strong&gt; — &lt;a href=&quot;https://www.skeletontech.com/news/skeleton-launches-grapheneups-a-high-density-ups-designed-for-ai-infrastructure&quot;&gt;Skeleton Technologies&lt;/a&gt;
Skeleton Technologies announced GrapheneUPS on June 4, 2026. It is a double-conversion UPS for AI data centers that stabilizes grid power during voltage dips, outages, and restoration. By smoothing rapid AI load fluctuations without additional stabilization equipment, the company claims the system enables a 40% increase in computing power and up to a 44% smaller grid connection. It can be deployed in white space, in gray space, or as a containerized unit, and operates as a load-proximate no-break layer aligned with evolving high-voltage DC roadmaps. &lt;strong&gt;Builder angle:&lt;/strong&gt; A load-proximate UPS that shrinks the required grid connection lets more compute fit under a fixed interconnection allocation, which matters most where interconnection capacity is the binding constraint.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Wolfspeed launches 3.3 kV SiC modules and dedicated data center solutions team&lt;/strong&gt; — Supply signal for rack-level power conversion.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>power-energy</category><category>grid-interconnection</category><category>on-site-generation</category><category>hyperscale</category><category>texas</category><category>800v-dc</category><category>sic-gan</category></item><item><title>RAG &amp; Knowledge — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/rag-knowledge/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/rag-knowledge/2026-06-06/</guid><description>Microsoft Foundry IQ knowledge bases lift evidence recall up to 54% with agentic retrieval tiers; Cohesity Gaia patents embedding-based RAG over backup …</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Grounding is not a model feature — it is institutional context encoded in retrieval pipelines. When models commoditize, proprietary knowledge and how humans curate it become the durable edge.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Microsoft Foundry IQ knowledge bases lift evidence recall up to 54% with agentic retrieval tiers&lt;/strong&gt; — &lt;a href=&quot;https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/foundry-iq-improve-recall-by-up-to-54-with-knowledge-bases/4524852&quot;&gt;Microsoft Foundry Blog&lt;/a&gt;
Foundry IQ replaces static single-shot RAG with a dynamic agentic retrieval loop that batches and customizes subqueries per knowledge source. The loop is backed by a retrained semantic ranker and retrievalReasoningEffort tiers (minimal, low, medium). On BrowseComp-Plus, knowledge bases improve evidence recall by up to 46% over standalone hybrid search. Pairing a smaller orchestrator model with agentic retrieval raises the gain to 54% while cutting tool calls and token cost by roughly 34%. The medium tier adds up to two iterative retrieval turns, and heterogeneous sources (MCP, Fabric ontology, SQL) combine structured and unstructured recall. &lt;strong&gt;Builder angle:&lt;/strong&gt; retrievalReasoningEffort gives one knob to trade latency and token cost against recall instead of hand-building multi-query RAG loops.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cohesity Gaia patents embedding-based RAG over backup data without copying secondary stores&lt;/strong&gt; — &lt;a href=&quot;https://www.cohesity.com/newsroom/press/cohesity-secures-patent-gen-ai-retrieval-augmented-generation-secondary-data/&quot;&gt;Cohesity Newsroom&lt;/a&gt;
USPTO granted Patent 12,619,501 (May 5, 2026) for &quot;Data Retrieval Using Embeddings for Data in Backup Systems,&quot; covering Gaia&apos;s method of indexing embeddings on secondary/backup data in place. Gaia is available on Cohesity Data Cloud and lets GenAI search protected enterprise archives while preserving existing security, governance, and access controls—no separate data copy for AI indexing. &lt;strong&gt;Builder angle:&lt;/strong&gt; Indexes cold backup tiers in situ for RAG, a pattern for teams blocked from exporting archives into a standalone vector DB.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Elastic Agent Builder GA ships five-line RAG grounding via GitHub Copilot SDK bridge&lt;/strong&gt; — &lt;a href=&quot;https://www.elastic.co/search-labs/blog/rag-agent-elasticsearch-github-copilot-sdk&quot;&gt;Elasticsearch Labs&lt;/a&gt;
Elastic Agent Builder is GA and connects to the GitHub Copilot SDK through Elastic.Extensions.AI, registering Elasticsearch hybrid retrieval as a native Copilot tool in roughly five lines of C#. Copilot handles planning and orchestration; Elasticsearch returns logs, docs, and proprietary records. The integration supports RAG/hybrid search grounding, MCP/A2A interoperability with prebuilt Elastic agents, and optional Elastic Inference Service models. &lt;strong&gt;Builder angle:&lt;/strong&gt; A few lines of bridge code wire production hybrid search into Copilot&apos;s orchestrator, sparing teams a custom retrieval tool layer.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Snowflake Cortex Sense assembles runtime context for agents from Horizon metadata at query time&lt;/strong&gt; — &lt;a href=&quot;https://atlan.com/know/snowflake/snowflake-cortex-sense/&quot;&gt;source&lt;/a&gt; — A Snowflake Summit 2026 runtime layer that pulls query history, object metadata, and semantic views into agent prompts; relevant if your RAG stack lives inside Snowflake.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Arango Contextual Data Platform 4.0 unifies GraphRAG and VectorRAG on one multimodel store&lt;/strong&gt; — &lt;a href=&quot;https://www.businesswire.com/news/home/20260602127934/en/Arango-Showcases-Live-Contextual-Data-Layer-for-Enterprise-AI-at-Snowflake-Summit-2026&quot;&gt;source&lt;/a&gt; — A single ACID platform combines graph, vector, document, and full-text search with AutoGraph entity discovery, so teams avoid stitching together separate graph and vector pipelines.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>rag-knowledge</category><category>agentic-retrieval</category><category>semantic-rerank</category><category>hybrid-search</category><category>microsoft</category><category>embeddings</category><category>knowledge-ingestion</category></item><item><title>Semiconductors — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/semiconductors/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/semiconductors/2026-06-06/</guid><description>SK Hynix to double wafer capacity as 2026 HBM output is sold out; Intel unveils Crescent Island inference GPU with 480 GB LPDDR5X, skipping HBM; TSMC CE…</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;Silicon supply sets the floor on inference economics. HBM allocation and lead times are not separate from the software story: they explain why capacity and cost move even as models commoditize.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;SK Hynix to double wafer capacity as 2026 HBM output is sold out&lt;/strong&gt; — &lt;a href=&quot;https://www.techzine.eu/news/devices/141775/sk-hynix-to-double-wafer-capacity-amid-ai-memory-shortage/&quot;&gt;Techzine&lt;/a&gt;
SK Hynix chairman Chey Tae-won said at Computex in Taipei that the company will double memory wafer production capacity over five years, citing AI-driven shortages that could persist through 2030. The company has already sold out its entire 2026 memory production and committed approximately $15 billion to HBM and advanced DRAM expansion in 2026, with Nvidia among its top customers. &lt;strong&gt;Builder angle:&lt;/strong&gt; HBM supply remains sold out through 2026, so a five-year wafer doubling will not relieve near-term allocation constraints on AI accelerator builds.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Intel unveils Crescent Island inference GPU with 480 GB LPDDR5X, skipping HBM&lt;/strong&gt; — &lt;a href=&quot;https://newsroom.intel.com/data-center/intel-puts-agentic-ai-xeon-6-networking-ai-systems&quot;&gt;Intel Newsroom&lt;/a&gt;
At Computex on June 1, 2026, Intel disclosed Crescent Island, an Xe3P data center inference GPU using up to 480 GB of LPDDR5X instead of HBM, in a 350 W air-cooled PCIe form factor. The design targets token-intensive agentic AI workloads and avoids the HBM supply bottleneck constraining Nvidia and AMD accelerators. &lt;strong&gt;Builder angle:&lt;/strong&gt; An HBM-free 350 W PCIe inference card offers a scale-out procurement path where HBM allocation is the binding constraint on GPU availability.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;TSMC CEO warns AI chip shortages will persist for years&lt;/strong&gt; — &lt;a href=&quot;https://www.techzine.eu/news/infrastructure/141838/tsmc-expects-ai-chip-shortage-to-persist-for-years/&quot;&gt;Techzine&lt;/a&gt;
At TSMC&apos;s June 4, 2026, annual shareholder meeting, CEO C.C. Wei said demand for advanced AI chips will outpace production capacity for years, even with six U.S. fabs under construction and capex at the upper end of a range that tops out at $56 billion. TSMC raised its 2026 revenue growth forecast above 30% while prioritizing measured price increases over memory-style spikes. &lt;strong&gt;Builder angle:&lt;/strong&gt; TSMC&apos;s multi-year shortage outlook signals extended lead times on 3nm/2nm accelerator wafers, pushing builders toward long-term foundry allocation contracts.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>semiconductors</category><category>hbm</category><category>sk-hynix</category><category>memory-capacity</category><category>lead-time</category><category>inference-gpu</category><category>intel</category></item><item><title>Silicon &amp; Systems — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/silicon-systems/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/silicon-systems/2026-06-06/</guid><description>SK Hynix to double wafer capacity as 2026 HBM output is sold out; Intel unveils Crescent Island inference GPU with 480 GB LPDDR5X, skipping HBM; TSMC CE…</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;AI is electricity, literally and figuratively. Silicon supply sets the floor on inference economics; power and data center capacity set the ceiling. Track the physical stack not as separate coverage but as the mechanism that moves inference economics and inference access.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;SK Hynix to double wafer capacity as 2026 HBM output is sold out&lt;/strong&gt; — &lt;a href=&quot;https://www.techzine.eu/news/devices/141775/sk-hynix-to-double-wafer-capacity-amid-ai-memory-shortage/&quot;&gt;Techzine&lt;/a&gt;
SK Hynix chairman Chey Tae-won said at Computex in Taipei that the company will double memory wafer production capacity over five years, citing AI-driven shortages that could persist through 2030. The company has already sold out its entire 2026 memory production and committed approximately $15 billion to HBM and advanced DRAM expansion in 2026, with Nvidia among its top customers. &lt;strong&gt;Builder angle:&lt;/strong&gt; HBM supply remains sold out through 2026, so a five-year wafer doubling will not relieve near-term allocation constraints on AI accelerator builds.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Intel unveils Crescent Island inference GPU with 480 GB LPDDR5X, skipping HBM&lt;/strong&gt; — &lt;a href=&quot;https://newsroom.intel.com/data-center/intel-puts-agentic-ai-xeon-6-networking-ai-systems&quot;&gt;Intel Newsroom&lt;/a&gt;
At Computex on June 1, 2026, Intel disclosed Crescent Island, an Xe3P data center inference GPU using up to 480 GB of LPDDR5X instead of HBM, in a 350 W air-cooled PCIe form factor. The design targets token-intensive agentic AI workloads and avoids the HBM supply bottleneck constraining Nvidia and AMD accelerators. &lt;strong&gt;Builder angle:&lt;/strong&gt; An HBM-free 350 W PCIe inference card offers a scale-out procurement path where HBM allocation is the binding constraint on GPU availability.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;TSMC CEO warns AI chip shortages will persist for years&lt;/strong&gt; — &lt;a href=&quot;https://www.techzine.eu/news/infrastructure/141838/tsmc-expects-ai-chip-shortage-to-persist-for-years/&quot;&gt;Techzine&lt;/a&gt;
At TSMC&apos;s June 4, 2026, annual shareholder meeting, CEO C.C. Wei said demand for advanced AI chips will outpace production capacity for years, even with six U.S. fabs under construction and capex at the upper end of a range that tops out at $56 billion. TSMC raised its 2026 revenue growth forecast above 30% while prioritizing measured price increases over memory-style spikes. &lt;strong&gt;Builder angle:&lt;/strong&gt; TSMC&apos;s multi-year shortage outlook signals extended lead times on 3nm/2nm accelerator wafers, pushing builders toward long-term foundry allocation contracts.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Google pairs Texas Meitner AI campus with 1 GW of co-located generation&lt;/strong&gt; — &lt;a href=&quot;https://www.datacenterknowledge.com/energy-power-supply/google-bets-on-a-power-first-model-for-ai-data-centers&quot;&gt;Data Center Knowledge&lt;/a&gt;
Google and Intersect are building the Meitner Texas AI campus alongside more than 1 GW of wind, solar, battery storage, and on-site gas generation under a &quot;power-first&quot; model that secures dedicated supply before scaling compute. A prior Haskell County project pairs a Google data center with Intersect&apos;s 640 MW Quantum solar plus 1.3 GWh storage, scheduled to begin operations in June 2026. ERCOT approved its Batch Zero framework this month to streamline transmission capacity allocation for large-load projects with co-located generation. &lt;strong&gt;Builder angle:&lt;/strong&gt; Hyperscalers are co-locating gigawatt-scale generation with AI campuses to bypass multi-year grid interconnection queues that gate new rack deployments.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Infineon joins NVIDIA MGX ecosystem for 800 VDC grid-to-rack power&lt;/strong&gt; — &lt;a href=&quot;https://www.powerelectronicsnews.com/infineon-joins-nvidia-mgx-ecosystem-for-800-vdc-ai-power-architectures/&quot;&gt;Power Electronics News&lt;/a&gt;
Infineon joined NVIDIA&apos;s MGX modular AI server reference architecture to supply grid-to-core power conversion for 800 VDC distribution. The portfolio spans 800 VDC down to 50 V and 12 V intermediate buses and sub-10 V core rails, using silicon, SiC JFETs, and GaN devices for protection, hot-swap, and power management on native 800 V server boards. The design targets higher rack power density and reduced conversion-stage losses versus legacy 48–54 V distribution. &lt;strong&gt;Builder angle:&lt;/strong&gt; 800 VDC MGX reference designs with SiC/GaN protection ICs set the rack-level power chain builders must design around for next-gen GPU density.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Skeleton launches GrapheneUPS for AI data center grid compliance&lt;/strong&gt; — &lt;a href=&quot;https://www.skeletontech.com/news/skeleton-launches-grapheneups-a-high-density-ups-designed-for-ai-infrastructure&quot;&gt;source&lt;/a&gt; — A load-proximate UPS that shrinks the required grid connection size, raising the compute MW a site can support where interconnection capacity is the binding constraint.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;NVIDIA and IREN partner on up to 5 GW DSX AI infrastructure pipeline&lt;/strong&gt; — &lt;a href=&quot;https://nvidianews.nvidia.com/news/nvidia-and-iren-announce-strategic-partnership-to-accelerate-deployment-of-up-to-5-gigawatts-of-ai-infrastructure&quot;&gt;source&lt;/a&gt; — Anchors a multi-gigawatt U.S. AI factory pipeline with Texas Sweetwater as the first DSX deployment target for rented GPU capacity.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CloudBurst Data Centers breaks ground on 1.2 GW Texas campus&lt;/strong&gt; — &lt;a href=&quot;https://www.datacenterdynamics.com/en/news/cloudburst-data-centers-breaks-ground-on-12-gw-texas-campus/&quot;&gt;source&lt;/a&gt; — Adds 1.2 GW of committed Texas shell capacity beyond Sweetwater, a second major Texas AI factory lane now under construction.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>silicon-systems</category><category>hbm</category><category>sk-hynix</category><category>memory-capacity</category><category>lead-time</category><category>inference-gpu</category><category>intel</category></item><item><title>Data Center Buildout — June 6, 2026</title><link>https://artificialcuriositylabs.ai/daily/data-center-buildout/2026-06-06/</link><guid isPermaLink="true">https://artificialcuriositylabs.ai/daily/data-center-buildout/2026-06-06/</guid><description>NVIDIA and IREN partner on up to 5 GW DSX AI infrastructure pipeline; CloudBurst Data Centers breaks ground on 1.2 GW Texas campus; SoftBank plans up to…</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;The read&lt;/h2&gt;
&lt;p&gt;AI is electricity, literally. Hyperscaler capacity, leases, and construction timelines are the physical layer of the Jevons paradox: cheaper intelligence drives more compute demand until power and shell become the binding constraints.&lt;/p&gt;
&lt;h2&gt;What moved&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;NVIDIA and IREN partner on up to 5 GW DSX AI infrastructure pipeline&lt;/strong&gt; — &lt;a href=&quot;https://nvidianews.nvidia.com/news/nvidia-and-iren-announce-strategic-partnership-to-accelerate-deployment-of-up-to-5-gigawatts-of-ai-infrastructure&quot;&gt;NVIDIA Newsroom&lt;/a&gt;
NVIDIA and IREN announced a strategic partnership on May 7, 2026, to deploy up to 5 GW of NVIDIA DSX-aligned AI infrastructure across IREN&apos;s global data center pipeline. Future deployments will focus on IREN&apos;s 2 GW Sweetwater campus in Texas as the flagship DSX site. NVIDIA received a five-year warrant to purchase up to 30 million IREN shares at $70 per share, representing up to $2.1 billion in potential investment. &lt;strong&gt;Builder angle:&lt;/strong&gt; Anchors a multi-gigawatt U.S. AI factory pipeline with Texas Sweetwater as the first DSX deployment target for rented GPU capacity.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;CloudBurst Data Centers breaks ground on 1.2 GW Texas campus&lt;/strong&gt; — &lt;a href=&quot;https://www.datacenterdynamics.com/en/news/cloudburst-data-centers-breaks-ground-on-12-gw-texas-campus/&quot;&gt;Data Center Dynamics&lt;/a&gt;
CloudBurst Data Centers broke ground on a 1.2 GW AI data center campus in Texas, one of the largest single-site capacity announcements in the U.S. buildout wave. The groundbreaking marks the start of construction, with multi-phase delivery planned for hyperscale and GPU-dense workloads. &lt;strong&gt;Builder angle:&lt;/strong&gt; Adds 1.2 GW of committed Texas shell capacity beyond Sweetwater, a second major Texas AI factory lane now under construction.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;SoftBank plans up to €75B for 5 GW of data center capacity in France&lt;/strong&gt; — &lt;a href=&quot;https://www.datacenterknowledge.com/data-center-construction/new-data-center-developments-june-2026&quot;&gt;Data Center Knowledge&lt;/a&gt;
SoftBank announced plans to invest up to €75 billion ($85 billion) to develop 5 GW of data center capacity across France, starting with three sites in Dunkirk, Bosquel, and Bouchain. The program targets 3.1 GW online by 2031 as part of a broader European AI infrastructure buildout reported in June 2026. &lt;strong&gt;Builder angle:&lt;/strong&gt; Commits multi-gigawatt European shell capacity with a 2031 delivery milestone, expanding where sovereign and enterprise AI workloads can land.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Kenya suspends $1B Microsoft data center over insufficient national power&lt;/strong&gt; — &lt;a href=&quot;https://www.datacenterdynamics.com/en/news/kenya-suspends-microsoft-data-center-project-over-power-concerns/&quot;&gt;Data Center Dynamics&lt;/a&gt;
Kenyan authorities suspended a planned $1 billion Microsoft data center project, citing insufficient national grid capacity to support the facility. The pause highlights power as a gating constraint for hyperscaler expansion in emerging markets. &lt;strong&gt;Builder angle:&lt;/strong&gt; Surfaces deployment-blocker risk: even major hyperscaler projects halt when grid MW is not contractually secured.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;CDC Data Centres secures 555 MW contract in Australia (FY28–FY29)&lt;/strong&gt; — &lt;a href=&quot;https://www.datacenterdynamics.com/en/news/cdc-data-centres-secures-555mw-contract-australia/&quot;&gt;Data Center Dynamics&lt;/a&gt;
CDC Data Centres secured a 555 MW data center contract in Australia with delivery targeted for FY28–FY29, adding substantial APAC capacity with a disclosed timeline. &lt;strong&gt;Builder angle:&lt;/strong&gt; Locks in roughly half a gigawatt of APAC shell capacity with FY28–FY29 delivery, a dated option for regional AI workload placement.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Also tracking&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Gorilla Technology plans 200 MW AI campus in Korat, Thailand&lt;/strong&gt; — &lt;a href=&quot;https://www.datacenterdynamics.com/en/news/gorilla-technology-eyes-200mw-data-center-project-in-thailand/&quot;&gt;source&lt;/a&gt; — 200 MW Southeast Asian campus with a Q1 2027 phase-one target.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Equinix invests $190M in KL2 Cyberjaya, Malaysia (2,200+ cabinets)&lt;/strong&gt; — &lt;a href=&quot;https://www.datacenterdynamics.com/en/news/equinix-invests-in-kl2-cyberjaya-malaysia/&quot;&gt;source&lt;/a&gt; — Colocation expansion sized by cabinet count; no MW figure disclosed.&lt;/li&gt;
&lt;/ul&gt;</content:encoded><category>data-center-buildout</category><category>capacity</category><category>ai-factory</category><category>texas</category><category>hyperscale-adjacent</category><category>construction-timeline</category><category>gpu-campus</category></item></channel></rss>