# Builder's Daily — Agent Reliability

> Rolling 14-day signal for beat `agent-reliability`. Ephemeral context, not evergreen corpus.
> Author: Amit Kumar Agrawal | https://artificialcuriositylabs.ai
> Generated: 2026-06-10
> Human index: https://artificialcuriositylabs.ai/daily/agent-reliability/
> RSS: https://artificialcuriositylabs.ai/daily/agent-reliability/rss.xml

---

# Agent Reliability — June 10, 2026

**URL:** https://artificialcuriositylabs.ai/daily/agent-reliability/2026-06-10/
**Beat:** agent-reliability
**Date:** 2026-06-10
**Topics:** retrieval-architecture, agentic-search, reranking, cost, ingestion, parsing
**Summary:** Perplexity launches 'Search as Code': agents write Python to compose retrieval, rerank, and dedup primitives directly; LlamaParse adds word/line/cell-le…

## The read

You cannot run what you cannot see. Grounding is institutional context encoded in retrieval; observability is electricity-metering for agent loops; security moves from policy decks to runtime guardrails. Reliability is the stack between "it works in demo" and "it runs in prod."

## What moved

- **Perplexity launches 'Search as Code': agents write Python to compose retrieval, rerank, and dedup primitives directly** — [Perplexity Research](https://research.perplexity.ai/articles/rethinking-search-as-code-generation)

  Perplexity replaced its sequential function-calling search loop with 'Search as Code' (SaC): models generate task-specific Python that runs in sandboxes and calls an Agentic Search SDK exposing atomic primitives (retrieval, ranking, filtering, deduplication). On a CVE-advisory task, this cut token usage by 85% (288.7K to 42.9K tokens). SaC also scored +29% on DSQA and +45% on a new WANDR benchmark, with medium-reasoning SaC beating all non-SaC systems at under $1/task. It is rolling out now in Perplexity Computer and the Agent API.

  **Builder angle:** Builders get composable, code-level retrieval/rerank/dedup primitives instead of fixed search endpoints, enabling per-task retrieval strategies at a fraction of the token cost of loop-based agentic search.

- **LlamaParse adds word/line/cell-level bounding boxes for audit-grade citation grounding** — [LlamaIndex Blog](https://www.llamaindex.ai/blog/announcing-granular-bounding-boxes-in-llamaparse)

  LlamaParse now supports an opt-in `output_options.granular_bboxes` parameter that returns word-, line-, or cell-level coordinates instead of coarse layout-level boxes. Coordinates are applied only to text explicitly present on the page (not to inferred values or AI summaries), enabling exact-location citations for dense documents such as financial filings and tables. The feature is available across paid tiers, with Agentic Plus adding extra verification passes.

  **Builder angle:** RAG pipelines can now ground citations to a specific word or table cell rather than highlighting a whole page or paragraph, closing a gap for compliance and financial-document agents that need audit-grade provenance.

- **Arize: Microsoft's open trust stack makes OpenInference the shared trace contract linking ASSERT evals, ACS runtime controls, and Phoenix/Arize AX** — [Arize Blog](https://arize.com/blog/microsoft-open-trust-stack-openinference/)

  At Build 2026, Microsoft introduced ASSERT (an MIT-licensed, spec-driven agent evaluation and regression-testing framework that turns behavior specs into test cases and graded traces) and Agent Control Specification (ACS), a portable runtime-guardrail standard with checkpoints at input, LLM call, state, tool execution, and output. Both standardize on OpenInference, the OpenTelemetry-for-AI standard Arize created (33+ framework integrations, two-line instrumentation): ASSERT reads OpenInference spans as judge evidence, ACS emits its control decisions as spans, and the same trace stream feeds Phoenix or Arize AX for production monitoring.

  **Builder angle:** One OpenInference instrumentation pass now feeds CI eval gates (ASSERT), runtime guardrails (ACS), and production observability (Phoenix/Arize AX) without separate re-instrumentation per tool.

## Also tracking

- **Sedai launches autonomous AI Agent Optimization platform with real-time per-team/per-model cost attribution and AI-judge-based routing** — [source](https://cioinfluence.com/machine-learning/sedai-launches-the-first-autonomous-platform-for-ai-agent-optimization/) — Drop-in layer for per-team/per-model token-cost attribution and automated cost-aware model routing across providers without re-instrumenting agent code.
- **Zscaler launches AI Broker, AI Access Graph, and Endpoint AI Security to govern agent identity and MCP/A2A traffic** — [source](https://www.zscaler.com/press/zscaler-unveils-new-product-innovations-secure-agentic-ai) — Gives a concrete pattern for scoping which MCP/A2A tools an agent can reach per identity and tracking data lineage in real time: a deployable access-control and audit layer for agent fleets.

---

# Agent Reliability — June 8, 2026

**URL:** https://artificialcuriositylabs.ai/daily/agent-reliability/2026-06-08/
**Beat:** agent-reliability
**Date:** 2026-06-08
**Topics:** embeddings, ingestion-pipeline, vector-sync, azure, open-source, context-database
**Summary:** Microsoft open-sources OmniVec, an embedding pipeline platform that auto-syncs vectors with changing operational data on Azure; Volcengine open-sources …

## The read

You cannot run what you cannot see. Grounding is institutional context encoded in retrieval; observability is electricity-metering for agent loops; security moves from policy decks to runtime guardrails. Reliability is the stack between "it works in demo" and "it runs in prod."

## What moved

- **Microsoft open-sources OmniVec, an embedding pipeline platform that auto-syncs vectors with changing operational data on Azure** — [Microsoft Azure Cosmos DB Blog](https://devblogs.microsoft.com/cosmosdb/introducing-omnivec-an-open-source-embedding-platform-for-ai-apps-on-azure/)

  Microsoft open-sourced OmniVec (public preview), a platform that automates embedding pipelines end to end: register a source, an embedding model, and a destination vector store, and it handles the initial backfill, change tracking, model invocation, and writeback. Change detection uses each source's native mechanism (Cosmos DB change feed, Blob Storage events, CDC for SQL Server/PostgreSQL), and the platform deploys into the user's own Azure subscription on AKS with a REST API, CLI, and web UI. The initial release supports Cosmos DB, PostgreSQL, SQL Server, and Blob Storage as both sources and destinations.

  **Builder angle:** Removes the custom glue work of keeping vector indexes in sync with live operational data: CDC-based change tracking means re-embedding happens automatically as source rows change, addressing a core RAG freshness problem.

- **Volcengine open-sources OpenViking, a filesystem-style 'context database' that replaces flat vector search with tiered, recursive retrieval** — [GitHub (volcengine/OpenViking)](https://github.com/volcengine/OpenViking)

  ByteDance's Volcengine shipped OpenViking v0.3.24 (June 5), an open-source context-management system for agents built on a virtual filesystem (viking:// URIs) instead of a traditional vector store. It auto-processes content into three tiers (L0 one-line abstracts, L1 ~2k-token overviews, L2 full detail on demand) and retrieves via a 'directory recursive' strategy that combines intent analysis, initial vector positioning, and recursive drill-down through subdirectories rather than flat top-k similarity search.
  It plugs into multiple embedding/VLM providers (Volcengine Doubao, OpenAI, Gemini, local Ollama models) and includes session-based memory extraction that updates agent/user memory after each run.

  **Builder angle:** Offers a concrete, reproducible alternative to flat vector top-k retrieval (tiered context loading plus recursive directory drill-down), packaged as an open-source system with swappable embedding providers for agent memory and knowledge bases.

- **Snowflake launches Cortex Sense, a shared context layer that auto-assembles business knowledge for production agents** — [Snowflake Newsroom](https://www.snowflake.com/en/news/press-releases/snowflake-cowork-powers-the-agentic-enterprise-as-the-personal-agent-for-knowledge-workers-to-work-smarter/)

  At Summit 26, Snowflake announced Cortex Sense, a capability that 'automatically brings together the data, business definitions, and operational knowledge that AI agents need to be trustworthy and useful' into a shared context layer usable by both its CoWork agent and its CoCo coding agent. It ships with prebuilt role plugins (e.g., finance, sales) that 'combine skills, business logic, and MCP connectors,' plus a Deep Research feature for multi-step, cited reasoning over structured and unstructured data. Snowflake says this cuts manual context setup and moves agents 'from concept to production in days instead of weeks.'

  **Builder angle:** Packages the unglamorous RAG groundwork (business-definition mapping, connector wiring, context assembly) into reusable role-based plugins, aimed at cutting the setup time that normally gates enterprise agent deployment.
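
The CDC-driven sync described in the OmniVec item can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not OmniVec's API: the `ChangeEvent` shape, `toy_embed` (a letter-count stand-in for a real embedding model), and the in-memory dict standing in for a vector store are all assumptions. It shows the core loop: backfill treats every existing row as an insert, inserts and updates trigger a re-embed of just that row, and deletes drop the vector so the index never serves stale rows.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

@dataclass
class ChangeEvent:
    """Hypothetical normalized change event (real CDC/change-feed payloads differ)."""
    op: str                     # "insert", "update", or "delete"
    key: str                    # primary key of the source row or document
    text: Optional[str] = None  # new content; None for deletes

Embedder = Callable[[str], List[float]]

def toy_embed(text: str) -> List[float]:
    """Stand-in embedding: letter counts over a-z. Illustration only."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def apply_changes(events: Iterable[ChangeEvent],
                  store: Dict[str, List[float]],
                  embed: Embedder) -> int:
    """Apply change events to the vector store; return the number of embedding calls."""
    calls = 0
    for ev in events:
        if ev.op == "delete":
            # Drop the vector so retrieval never serves a row that no longer exists.
            store.pop(ev.key, None)
        elif ev.op in ("insert", "update"):
            # Re-embed only the changed row and write it back under the same key.
            store[ev.key] = embed(ev.text or "")
            calls += 1
        else:
            raise ValueError(f"unknown op: {ev.op}")
    return calls

def backfill(rows: Dict[str, str],
             store: Dict[str, List[float]],
             embed: Embedder) -> int:
    """Initial backfill: treat every existing row as an insert."""
    return apply_changes((ChangeEvent("insert", k, v) for k, v in rows.items()),
                         store, embed)
```

For example, backfilling `{"a": "hello", "b": "world"}` and then applying an update to `a` plus a delete of `b` costs one extra embedding call and leaves only `a` in the store, re-embedded from its new text.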
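
The 'directory recursive' retrieval in the OpenViking item can be illustrated with a toy tree walk. This is a hedged sketch under stated assumptions, not OpenViking's implementation: the `Node` class, the Jaccard word-overlap `score` (standing in for vector similarity), and the `beam` and `min_score` parameters are hypothetical, and the intent-analysis step is omitted. Routing reads only L0 abstracts, descends into the best-scoring children, and returns L1 overviews for the leaves it reaches; L2 detail would then be loaded only for those.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    """Hypothetical context node addressed by a viking://-style URI."""
    uri: str
    abstract: str            # L0: one-line abstract used for routing
    overview: str = ""       # L1: overview returned for reached leaves
    detail: str = ""         # L2: full content, fetched only on demand
    children: List["Node"] = field(default_factory=list)

def score(query: str, text: str) -> float:
    """Toy relevance: Jaccard overlap of lowercase word sets (stand-in for vector similarity)."""
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0

def recursive_retrieve(node: Node, query: str, beam: int = 1,
                       min_score: float = 0.0) -> List[Tuple[str, str]]:
    """Drill down through the best-scoring children; return (uri, L1 overview) per leaf reached."""
    if not node.children:
        return [(node.uri, node.overview)]
    # Rank children by their L0 abstracts only; deeper tiers are not read here.
    scored = sorted(((score(query, c.abstract), c) for c in node.children),
                    key=lambda pair: pair[0], reverse=True)
    results: List[Tuple[str, str]] = []
    for s, child in scored[:beam]:
        if s > min_score:  # prune branches with no signal instead of taking flat top-k
            results.extend(recursive_retrieve(child, query, beam, min_score))
    return results
```

Raising `beam` above 1 explores several branches per level, trading extra scoring work for recall; a query that matches no abstract at the top level returns nothing rather than the nearest-but-irrelevant leaves a flat top-k search would surface.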

## Also tracking

- **Splunk ships AI Agent Monitoring in Observability Cloud built on OpenTelemetry and Cisco AGNTCY** — [source](https://www.splunk.com/en_us/blog/observability/monitor-llm-and-agent-performance-with-ai-agent-monitoring-in-splunk-observability-cloud.html) — Gives teams already on Splunk APM a drop-in path to per-trace cost, latency, and quality (hallucination/toxicity) scoring without adopting a separate LLM-observability vendor.
- **Arize AX adds agent fleet observability with token-cost tracking and harness-as-a-judge evals** — [source](https://arize.com/blog/building-ai-factory-self-improving-agents-arize-ax/) — Targets teams running many agents at once: it surfaces which agent in the fleet is burning the most tokens or drifting in behavior, and lets you spin up an eval run straight from a monitoring alert.

---

# Agent Reliability — June 6, 2026

**URL:** https://artificialcuriositylabs.ai/daily/agent-reliability/2026-06-06/
**Beat:** agent-reliability
**Date:** 2026-06-06
**Topics:** agentic-retrieval, semantic-rerank, hybrid-search, microsoft, embeddings, knowledge-ingestion
**Summary:** Microsoft Foundry IQ knowledge bases lift evidence recall up to 54% with agentic retrieval tiers; Cohesity Gaia patents embedding-based RAG over backup …

## The read

You cannot run what you cannot see. Grounding is institutional context encoded in retrieval; observability is electricity-metering for agent loops; security moves from policy decks to runtime guardrails. Reliability is the stack between "it works in demo" and "it runs in prod."

## What moved

- **Microsoft Foundry IQ knowledge bases lift evidence recall up to 54% with agentic retrieval tiers** — [Microsoft Foundry Blog](https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/foundry-iq-improve-recall-by-up-to-54-with-knowledge-bases/4524852)

  Foundry IQ replaces static single-shot RAG with a dynamic agentic retrieval loop that batches and customizes subqueries per knowledge source, backed by a retrained semantic ranker and `retrievalReasoningEffort` tiers (minimal, low, medium). On BrowseComp-Plus, knowledge bases beat standalone hybrid search by up to 46% in evidence recall; pairing a smaller orchestrator model with agentic retrieval reaches 54% while cutting tool calls and token cost by ~34%. The medium tier adds up to two iterative retrieval turns, and heterogeneous sources (MCP, Fabric ontology, SQL) combine structured and unstructured recall.

  **Builder angle:** `retrievalReasoningEffort` gives one knob to trade latency and token cost against recall instead of hand-building multi-query RAG loops.

- **Cohesity Gaia patents embedding-based RAG over backup data without copying secondary stores** — [Cohesity Newsroom](https://www.cohesity.com/newsroom/press/cohesity-secures-patent-gen-ai-retrieval-augmented-generation-secondary-data/)

  The USPTO granted Patent 12,619,501 (May 5, 2026) for "Data Retrieval Using Embeddings for Data in Backup Systems," covering Gaia's method of indexing embeddings on secondary/backup data in place. Gaia is available on Cohesity Data Cloud and lets GenAI search protected enterprise archives while preserving existing security, governance, and access controls, with no separate data copy needed for AI indexing.

  **Builder angle:** Indexes cold backup tiers in situ for RAG, a pattern for teams blocked from exporting archives into a standalone vector DB.

- **Elastic Agent Builder GA ships five-line RAG grounding via GitHub Copilot SDK bridge** — [Elasticsearch Labs](https://www.elastic.co/search-labs/blog/rag-agent-elasticsearch-github-copilot-sdk)

  Elastic Agent Builder is GA and connects to the GitHub Copilot SDK through Elastic.Extensions.AI, registering Elasticsearch hybrid retrieval as a native Copilot tool in roughly five lines of C#. Copilot handles planning and orchestration; Elasticsearch returns logs, docs, and proprietary records. It supports RAG/hybrid search grounding, MCP/A2A interoperability with prebuilt Elastic agents, and optional Elastic Inference Service models.

  **Builder angle:** Minimal bridge code wires production hybrid search into an orchestrator instead of building a custom retrieval tool layer.

## Also tracking

- **Microsoft Foundry extends tracing and evals to any agent framework at Build 2026** — [source](https://devblogs.microsoft.com/foundry/build-2026-from-observability-to-roi-for-ai-agents-on-any-framework/) — Point your existing OTel exporter at Foundry to get multi-turn evals, rubric scoring, and production trace sampling without swapping orchestration frameworks.
- **Amazon Bedrock AgentCore ships Lambda code-based evaluators for CI gates and online monitoring** — [source](https://aws.amazon.com/blogs/machine-learning/build-custom-code-based-evaluators-in-amazon-bedrock-agentcore/) — Encode deterministic agent contracts (tool schemas, workflow order, PII rules) as Lambda evaluators that block deploys in CI and alarm in production on the same evaluator ID.
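
A code-based evaluator of the kind described in the Bedrock AgentCore item can be sketched as a plain Python Lambda handler. The `lambda_handler(event, context)` signature is the standard Python Lambda entry point, but the event fields (`tool_calls`, `output`), the result shape, the `REQUIRED_ORDER` tool names, and the SSN-style PII pattern are hypothetical; AgentCore's actual evaluator contract may differ. The sketch checks two deterministic rules, workflow order and PII leakage, and skips tool-schema validation for brevity.

```python
import re
from typing import Any, Dict, List

# Hypothetical contract for a refund agent; tool names are illustrative.
REQUIRED_ORDER = ["lookup_customer", "check_eligibility", "issue_refund"]
# Hypothetical PII rule: flag US SSN-shaped strings in the final output.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def follows_order(calls: List[str], required: List[str]) -> bool:
    """True if `required` appears as a subsequence of `calls` (other calls may interleave)."""
    remaining = iter(calls)
    # `step in remaining` consumes the iterator up to the match, enforcing order.
    return all(step in remaining for step in required)

def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Deterministic evaluator; the event and result shapes are assumptions."""
    names = [call.get("name", "") for call in event.get("tool_calls", [])]
    output = event.get("output", "")
    failures: List[str] = []
    if not follows_order(names, REQUIRED_ORDER):
        failures.append("workflow_order")
    if SSN_PATTERN.search(output):
        failures.append("pii_in_output")
    # Same verdict whether invoked from a CI gate or on sampled production traces.
    return {"passed": not failures, "failures": failures}
```

Because the checks are pure functions of the trace, one evaluator gives identical verdicts in a CI gate and on production traffic, which is what lets a single evaluator ID serve both roles.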