AI Platform
What does inference cost and what platform do I build on?
-
AI Platform — June 10, 2026
DeepSeek V4 pricing triggers China-wide AI API price war — Tencent Cloud cuts DeepSeek-V4 hosting 97.5%, Xiaomi cuts MiMo-V2.5 99%; Google's GKE Inferen…
pricing · china · deepseek · tencent-cloud · xiaomi · routing
-
AI Platform — June 9, 2026
Cerebras positions Kimi K2.6 at 981 tok/s output — 5.4× faster than Gemini 3.5 Flash with half the TTFT; Google Gemini 2.0 Flash permanently shut down J…
routing · latency · throughput · cerebras · kimi · benchmarks
-
AI Platform — June 8, 2026
DigitalOcean ships prefix-aware routing and incoming cached-token pricing, claims up to 4× lower effective compute cost; Anthropic moves Claude Agent SD…
prefix-caching · kv-cache · routing · pricing · billing · agent-sdk
-
AI Platform — June 7, 2026
vLLM Semantic Router v0.3 Themis ships SAAR stateful routing with RouterArena #1 ranking at $0.11/1K queries; DigitalOcean Inference Gateway ships prefi…
routing · vllm · agentic · saar · open-source · latency
-
AI Platform — June 6, 2026
DigitalOcean Inference Gateway ships prefix-aware routing with 75%+ cache hit rates; GitHub Copilot switches all plans to usage-based AI Credits billing…
prefix-caching · routing · vllm · cost-optimization · pricing · github-copilot
-
Builder Tooling — June 6, 2026
Vercel Sandbox Drives add persistent attachable storage for agent workspaces; skills.sh API launches with Vercel OIDC auth for querying 600k+ open-sourc…
vercel-sandbox · persistent-storage · agent-workspace · private-beta · vercel · skills-api
-
Inference Economics — June 6, 2026
DigitalOcean Inference Gateway ships prefix-aware routing with 75%+ cache hit rates; GitHub Copilot switches all plans to usage-based AI Credits billing…
prefix-caching · routing · vllm · cost-optimization · pricing · github-copilot