Most people running Claude Code have no idea what it’s actually doing at the API level. How many tokens did that session burn? What did it cost? How much of your context was served from cache versus freshly processed? Which sessions are expensive and which are cheap for the same type of work?
Claude Code has been quietly emitting all of this data via OpenTelemetry since at least v2.1.75. It’s not enabled by default, and it hasn’t been widely documented. Here’s how to turn it on, collect it locally with a Docker-based stack, and build a dashboard that actually shows you what’s happening.
The three metrics that matter
The Claude Code binary ships a full OpenTelemetry SDK. One environment variable turns it on; the rest tell the SDK where to send the data:
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=cumulative
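These exports need to be present in every shell that launches Claude Code. One low-friction option (the env-file path and alias name here are arbitrary, nothing Claude Code itself looks for) is to park them in a small file and source it from a wrapper alias:

# ~/.config/claude-otel.env holds the settings above as plain KEY=value lines (hypothetical path).
# set -a exports everything the file assigns, so the claude process inherits it.
alias claude-otel='set -a; source ~/.config/claude-otel.env; set +a; claude'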
Three metrics come out (confirmed from live traffic, v2.1.75, meter name com.anthropic.claude_code):
| Metric | Labels | What it measures |
|---|---|---|
| claude_code.session.count | session_id, user_id, terminal_type | One increment per session started |
| claude_code.token.usage | model, type (input / output / cacheRead / cacheCreation) | Token counts by type, per API call |
| claude_code.cost.usage | model, session_id | Estimated session cost in USD |
The type label on the token metric is where the insight lives. cacheRead and cacheCreation are tracked separately from input, which means you can see exactly how much of your context is being served from the prompt cache versus being processed fresh on every call. On a mature codebase, that number is often 80–90% — which means the same 200k-token context window costs a fraction of what it looks like on paper.
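Once these flow through the Prometheus exporter set up below, the dotted OTel names become flat Prometheus series: claude_code.token.usage turns into claude_code_token_usage_tokens_total, and claude_code.cost.usage into claude_code_cost_usage_USD_total. An illustrative scrape from one cache-heavy session (model label and numbers invented, the shape is the point):

claude_code_token_usage_tokens_total{model="<model-id>",type="input"}          18431
claude_code_token_usage_tokens_total{model="<model-id>",type="cacheRead"}     131627
claude_code_token_usage_tokens_total{model="<model-id>",type="cacheCreation"}   9412
claude_code_token_usage_tokens_total{model="<model-id>",type="output"}          2210

Here cacheRead dwarfs input, which is exactly the 80–90% regime described above.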
The Delta vs Cumulative gotcha
Before setting up the stack, one critical detail: Claude Code’s OTel SDK defaults to Delta temporality — each metric export contains only the change since the last export. Prometheus expects Cumulative counters that keep incrementing.
With Delta metrics, short sessions (like claude --print "...") complete and flush before Prometheus’s scrape window catches them. You get no data. Collector restarts also wipe accumulated state.
The fix is already in the env vars above:
export OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=cumulative
With this set, Claude Code emits running totals. Prometheus retains the last seen value after a session ends and builds a proper historical record across sessions. Don’t skip this variable — it’s the one that makes the whole thing work reliably.
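If the distinction feels abstract, the same counter looks like this under each temporality (numbers invented):

# Three export cycles for one counter:
#   tokens consumed between exports:    500     300       0
#   delta export values:                500     300       0
#   cumulative export values:           500     800     800
#
# Prometheus counters assume the cumulative shape; rate() and increase()
# only behave sensibly against a monotonically increasing series.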
The local stack
The simplest setup that gives you persistent history and a real dashboard:
Claude Code ──OTLP gRPC──→ OTel Collector
                                │
                         Prometheus scrape
                                │
                            Prometheus
                                │
                             Grafana ──→ (optional: cloud-side datasource)
Three Docker containers: OTel Collector (receives and re-exports), Prometheus (stores time-series, 30-day retention), Grafana (dashboards).
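Everything below assumes a single directory holding the three config files plus a Grafana provisioning tree; the file names match the volume mounts in the compose file, and the top-level directory name is up to you:

otel/
├── docker-compose.yml
├── otel-collector-config.yaml
├── prometheus.yml
└── grafana/
    └── provisioning/
        └── datasources/      # datasource YAML goes here (Prometheus, optional cloud)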
The OTel Collector is the bridge. It receives OTLP from Claude Code on port 4317, exposes a Prometheus scrape endpoint on port 8889, and Prometheus scrapes it every 15 seconds. Claude Code’s default metric export interval is 60 seconds (OTEL_METRIC_EXPORT_INTERVAL, default 60000ms), so most scrapes will return the same values — but 15s gives tighter resolution if you lower the export interval later.
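If you do want finer resolution, the export interval is just another environment variable on the Claude Code side, with the value in milliseconds:

# Optional: export every 10 seconds instead of the default 60.
export OTEL_METRIC_EXPORT_INTERVAL=10000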
The docker-compose.yml:
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    container_name: cc-otel-collector
    command: ["--config=/etc/otel/config.yaml"]
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8889:8889"   # Prometheus scrape endpoint
    volumes:
      - ./otel-collector-config.yaml:/etc/otel/config.yaml:ro
    restart: unless-stopped

  prometheus:
    image: prom/prometheus:v2.55.0
    container_name: cc-prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=30d"
      - "--web.enable-lifecycle"
    restart: unless-stopped

  grafana:
    image: grafana/grafana:11.4.0
    container_name: cc-grafana
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=claude
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_INSTALL_PLUGINS=grafana-clock-panel
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
      - ~/.aws/credentials:/usr/share/grafana/.aws/credentials:ro
    depends_on:
      - prometheus
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:
The prometheus.yml scrape config:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: claude-code
    honor_labels: true
    static_configs:
      - targets: ["otel-collector:8889"]
honor_labels: true tells Prometheus to keep the labels that arrive from the collector rather than overwriting conflicting ones with its own job and instance labels. The otel-collector target uses the Docker service name — within the Compose network, containers reach each other by service name.
The OTel Collector config is minimal — receive OTLP, export to Prometheus:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 10s

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
    enable_open_metrics: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
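With all three files in place and the stack up, two quick checks confirm the pipeline end to end (run a Claude Code session first so there is something to export):

# The collector's scrape endpoint should list the claude_code series.
curl -s http://localhost:8889/metrics | grep '^claude_code'

# Prometheus should report its scrape target as healthy.
curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"[^"]*"'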
A note on Docker on macOS
What is Colima? Colima is a lightweight, free alternative to Docker Desktop on macOS. It runs a Linux VM under the hood (using Lima) that hosts the Docker daemon — same docker and docker compose CLI, no GUI, no licensing restrictions. The catch: it places the Docker socket at ~/.colima/default/docker.sock instead of the standard /var/run/docker.sock, so you need to tell Docker clients where to find it. If you're using Docker Desktop, skip this section entirely.
If you’re running Docker via Colima, the default Docker socket path doesn’t exist and compose commands will fail silently. Set DOCKER_HOST before running:
export DOCKER_HOST="unix://${HOME}/.colima/default/docker.sock"
docker compose up -d
You can wrap this in aliases so you never have to think about it:
alias otel-start='DOCKER_HOST="unix://${HOME}/.colima/default/docker.sock" docker compose -f /path/to/your/otel/docker-compose.yml up -d'
alias otel-stop='DOCKER_HOST="unix://${HOME}/.colima/default/docker.sock" docker compose -f /path/to/your/otel/docker-compose.yml down'
If you’re using Docker Desktop instead, omit the DOCKER_HOST export entirely.
Adding cloud-side metrics
If you’re running Claude Code through a cloud provider that exposes API-level metrics — token counts, invocation latency, cache metrics, throttles — you can add that provider as a second Grafana datasource and correlate it with your client-side OTel data.
The interesting comparison is between claude_code.token.usage{type="input"} (what Claude Code’s SDK estimated as input tokens) and whatever your provider reports as actual processed tokens. The gap between those two numbers is your cache savings — input tokens counted as cacheRead were served at a fraction of the standard price, so they show up on the client side but don’t register the same way server-side.
Watching both metrics on the same time axis makes the cache discount visible as an actual number rather than an estimate. Configure the cloud datasource with your provider’s credentials and point it at whatever namespace surfaces those metrics.
For example, if you’re running on AWS Bedrock:
# Example: AWS Bedrock (CloudWatch namespace: AWS/Bedrock)
# Metrics: Invocations, InvocationLatency, InputTokenCount, OutputTokenCount, Throttles
# Compare InputTokenCount (server) vs claude_code.token.usage{type="input"} (client)
# Gap = cacheRead tokens = context served from prompt cache at reduced pricing
If your credentials rotate, automate refreshing the profile inside the mounted credentials file so the Grafana container picks up new keys without a restart — Grafana reads the file at query time, not at startup.
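As a concrete sketch, a file-provisioned CloudWatch datasource for the Bedrock case above could look like the following; the file path and region are assumptions, and authType: credentials points Grafana at the shared AWS credentials file mounted in the compose file:

# grafana/provisioning/datasources/cloudwatch.yaml (hypothetical path)
apiVersion: 1
datasources:
  - name: Bedrock (CloudWatch)
    type: cloudwatch
    access: proxy
    jsonData:
      authType: credentials      # shared credentials file, read at query time
      defaultRegion: us-east-1   # assumption: use your Bedrock region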
PromQL queries worth bookmarking
Once data is flowing, these are the queries I have pinned:
# Cache hit rate (how much context came from cache vs fresh)
sum(rate(claude_code_token_usage_tokens_total{type="cacheRead"}[5m])) /
(sum(rate(claude_code_token_usage_tokens_total{type="input"}[5m])) +
sum(rate(claude_code_token_usage_tokens_total{type="cacheRead"}[5m])))
# Estimated cost per hour
sum(rate(claude_code_cost_usage_USD_total[1h])) * 3600
# Cost over the last 5 minutes, broken down by model
sum by (model) (rate(claude_code_cost_usage_USD_total[5m])) * 300
# Output tokens per session (which sessions were doing real work)
sum by (session_id) (claude_code_token_usage_tokens_total{type="output"})
The cache hit rate is the most immediately useful. On a well-established codebase where Claude Code has been building up context across sessions, it routinely runs at 80–90%. That means only 10–20% of your input tokens are being billed at full price. Watching this metric tells you whether your project context files are doing their job — a cache hit rate that drops over time is usually a signal that context has drifted and needs to be refreshed.
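If you'd rather be told about that drift than notice it on a dashboard, the same expression works as a Prometheus alerting rule. A sketch, with an assumed 70% threshold and 30-minute window (the minimal prometheus.yml above doesn't load rule files, so you'd also add a rule_files entry):

groups:
  - name: claude-code
    rules:
      - alert: ClaudeCodeCacheHitRateLow
        expr: |
          sum(rate(claude_code_token_usage_tokens_total{type="cacheRead"}[30m]))
            /
          (sum(rate(claude_code_token_usage_tokens_total{type="input"}[30m]))
            + sum(rate(claude_code_token_usage_tokens_total{type="cacheRead"}[30m])))
          < 0.70
        for: 30m
        labels:
          severity: info
        annotations:
          summary: "Claude Code cache hit rate below 70% for 30 minutes"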
What this makes visible
Claude Code operates at machine speed. Without instrumentation, the only feedback you get is the cost summary at the end of a session — and even that's an estimate. With OTel flowing to a persistent store, you can start asking questions that simply couldn't be answered before:
- Which sessions are expensive versus cheap for the same type of work?
- Does running a more capable model for sub-tasks actually change output quality, relative to the cost delta?
- Is my cache hit rate dropping over time as context drifts?
- What does a high-output-token session look like compared to a high-input-token session — is the model generating or mostly reading?
None of this requires any modifications to Claude Code. The instrumentation is built in. You’re flying blind if you haven’t turned it on.