Open Source — June 6, 2026

What OSS should I fork, study, or adopt this quarter?

5 Jun, 2026

physical-ai
agent-skills
nvidia
open-source
vllm
inference-routing

The read

OSS shifts default stack choices faster than any vendor roadmap. When anyone can fork and run the code, the moat is maintenance, integration, and the people who decide what to adopt before it is proven safe.

What moved

NVIDIA open-sources physical AI agent skills across Omniverse, Cosmos, Alpamayo, and Metropolis (NVIDIA Newsroom). NVIDIA released a major collection of open-source physical AI agent tools and skills on May 31, 2026, distributed through GitHub and skills.sh for use with any coding agent. Skills span Omniverse, Cosmos, Alpamayo, and Metropolis for robotics, AV, vision AI, and industrial digital twins, including synthetic-data workflows (Neural Reconstruction, Video Augmentation, Defect Image Generation) that run as Physical AI Launchables on NVIDIA Brev. Builder angle: agent builders working on embodied or simulation-heavy workflows can pull verified NVIDIA skills into existing harnesses instead of wiring up CUDA-X libraries by hand.
vLLM Semantic Router adds Session-Aware Agentic Routing with prefix-cache switch pricing (vLLM Blog). On June 2, 2026, vLLM published Session-Aware Agentic Routing (SAAR), a stateful layer on top of Semantic Router that keeps per-session memory via x-session-id, hard-locks model switches while a tool loop is active or non-portable provider state is in play, and prices each handoff by its prefix-cache checkout cost, so long, warm sessions are not discarded lightly. Operators get replayable routing traces and idle/drift reset boundaries that are tunable in YAML. Builder angle: self-hosted agent stacks using vLLM auto routing can keep multi-turn tool sessions stable without silently breaking provider continuation state or wasting prefix-cache locality.
Ollama 0.30 ships with broader GGUF support and expanded MLX coverage (Ollama Blog). Ollama 0.30, released June 5, 2026, brings improved performance and broader GGUF model compatibility through llama.cpp, and extends MLX engine coverage on Apple silicon to more models and hardware. The same week, Ollama added NVIDIA Nemotron 3 Ultra, aimed at high-throughput reasoning and long-running agent workflows. Builder angle: local-first builders get a single runtime for more open-weight GGUF and MLX models instead of maintaining separate llama.cpp and MLX serving stacks.
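The SAAR item describes a stay-or-switch decision that is easier to reason about as code. Below is a minimal, hypothetical sketch in Python, not the vLLM Semantic Router implementation or its YAML schema: the class names, the `observe`/`route` methods, the flat per-token switch price, and the default thresholds are all assumptions. The session id stands in for x-session-id, and `benefit` is assumed to come from whatever stateless classifier proposes a model for the turn.

```python
# Hypothetical sketch of session-aware routing with switch pricing.
# Not vLLM's code: every name, field, and threshold here is an assumption.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Session:
    model: str
    cached_prefix_tokens: int = 0    # tokens assumed warm in this model's prefix cache
    in_tool_loop: bool = False       # a tool call/result cycle is still in flight
    nonportable_state: bool = False  # provider state that cannot move to another model
    last_seen: float = 0.0


@dataclass
class SessionRouter:
    idle_reset_s: float = 600.0           # hypothetical idle boundary (seconds)
    drift_reset: float = 0.8              # hypothetical topic-drift threshold
    cost_per_cached_token: float = 0.001  # hypothetical price of re-prefilling one token
    sessions: Dict[str, Session] = field(default_factory=dict)
    # Replayable trace: (time, session id, proposed model, chosen model).
    trace: List[Tuple[float, str, str, str]] = field(default_factory=list)

    def observe(self, session_id: str, prefix_tokens: int,
                in_tool_loop: bool = False, nonportable_state: bool = False) -> None:
        """Record what the last turn left behind for this session."""
        s = self.sessions[session_id]
        s.cached_prefix_tokens = prefix_tokens
        s.in_tool_loop = in_tool_loop
        s.nonportable_state = nonportable_state

    def route(self, session_id: str, proposed: str, benefit: float,
              drift: float, now: float) -> str:
        """Return the model for this turn, given the stateless router's proposal."""
        s = self.sessions.get(session_id)
        if s is not None and now - s.last_seen > self.idle_reset_s:
            s = None  # idle boundary: stale sessions start fresh
        if s is None:
            s = Session(model=proposed)
            self.sessions[session_id] = s
        elif proposed != s.model and not (s.in_tool_loop or s.nonportable_state):
            # Hard lock is the guard above: no switch mid tool loop or with
            # non-portable state. Otherwise switch on a drift reset, or when the
            # expected benefit outweighs abandoning the warm prefix cache.
            switch_cost = s.cached_prefix_tokens * self.cost_per_cached_token
            if drift >= self.drift_reset or benefit > switch_cost:
                s.model = proposed
                s.cached_prefix_tokens = 0
        s.last_seen = now
        self.trace.append((now, session_id, proposed, s.model))
        return s.model
```

With these assumed prices, a warm 20,000-token prefix costs 20 units to abandon, so a proposed switch promising a benefit of 5 is refused while one promising 50 goes through, and an in-flight tool loop blocks either. Pricing the switch by cached tokens is what makes long sessions "sticky" without forbidding switches outright.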

Also tracking

MiniMax M3 launches as an open-weight coding model with 1M context and native image/video input. Billed as the first open-weight model to combine frontier coding, a 1M-token context, and native multimodal input; weights and a technical report were promised on Hugging Face and GitHub within days of the June 1 launch.
Future AGI ships an Apache 2.0, self-hostable agent eval platform on GitHub. An end-to-end OSS stack for tracing, 50+ eval metrics, multi-turn simulations, guardrails, and an OpenAI-compatible gateway across 100+ providers, forkable for production agent QA loops.