[FEATURE] Add thinking tokens to OTEL token_usage metric
### Preflight Checklist
- [x] I have searched existing requests and this feature hasn't been requested yet
- [x] This is a single feature request (not multiple features)
### Problem Statement

The `claude_code.token.usage` metric currently tracks four token types:

- `input`
- `output`
- `cacheRead`
- `cacheCreation`

Missing: **thinking tokens**.
Thinking tokens (extended thinking / internal reasoning) can be 3-10x the visible output tokens and count fully against usage limits. Without tracking them, OTEL metrics significantly undercount actual token consumption.
From Anthropic's own documentation:
"Both the invisible thinking tokens AND the visible response are billed as output tokens. Even worse, they count fully against your usage limits."
### Impact
I built a correlation analysis between OTEL metrics and usage limit burn rate. All correlations were weak (|r| < 0.3) because OTEL is missing a major factor.
| What OTEL Tracks | What Counts Toward Limits |
|---|---|
| `input` | `input` |
| `output` (visible only) | `output` (visible) |
| `cacheRead` | `cacheRead` |
| `cacheCreation` | `cacheCreation` |
| **NOT TRACKED** | **thinking tokens** |
This makes it impossible to:
- Understand why usage limits are being consumed
- Build accurate cost models
- Optimize prompts based on actual consumption
- Create meaningful burn rate dashboards
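To make the undercount concrete, here is a minimal sketch using the example token counts from the proposal below (hypothetical values; real ratios vary by workload):

```python
# Hypothetical per-request token counts, matching the example metric
# values in this request. "thinking" is what OTEL does not export today.
otel_tracked = {
    "input": 5000,
    "output": 2000,        # visible output only
    "cacheRead": 3000,
    "cacheCreation": 1000,
}
thinking = 15000           # counts against limits, but invisible to OTEL

tracked_total = sum(otel_tracked.values())
actual_total = tracked_total + thinking
undercount = thinking / actual_total

print(f"OTEL reports {tracked_total} tokens; actual consumption is "
      f"{actual_total} ({undercount:.0%} invisible to dashboards)")
```

With these numbers, more than half of the tokens counted against usage limits never appear in the metric, which is consistent with the weak correlations above.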
### Proposed Solution

Add `type="thinking"` to the existing token usage metric:

```
claude_code.token.usage{type="thinking", model="claude-opus-4-5-20251101"} 15000
```

This would appear alongside the existing types:

```
claude_code.token.usage{type="input", model="..."} 5000
claude_code.token.usage{type="output", model="..."} 2000
claude_code.token.usage{type="cacheRead", model="..."} 3000
claude_code.token.usage{type="cacheCreation", model="..."} 1000
claude_code.token.usage{type="thinking", model="..."} 15000   # NEW
```
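One possible shape for the change, sketched in Python: map the usage fields from the API response to metric data points, with `thinking` as just another `type` label. The usage field names (especially `thinking_tokens`) are assumptions about the payload, not the actual implementation:

```python
# Sketch only: map an API usage payload to token-usage metric points.
# Field names are assumptions; the real exporter may use different keys.
def usage_to_metric_points(usage: dict, model: str) -> list[dict]:
    type_map = {
        "input_tokens": "input",
        "output_tokens": "output",
        "cache_read_input_tokens": "cacheRead",
        "cache_creation_input_tokens": "cacheCreation",
        "thinking_tokens": "thinking",   # NEW: proposed type label
    }
    return [
        {
            "name": "claude_code.token.usage",
            "value": usage[field],
            "attributes": {"type": label, "model": model},
        }
        for field, label in type_map.items()
        if field in usage
    ]
```

Because `thinking` rides on the existing metric with a new label value, existing dashboards that sum over `type` would automatically start seeing accurate totals.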
### Alternative Solutions

1. **Include thinking in output** - Less transparent, but simpler. Users would at least see accurate totals even if thinking isn't broken out.
2. **Add `thinking_tokens` to log events** - The `api_request` log events could include `thinking_tokens` as an attribute (similar to the existing `input_tokens`, `output_tokens`).
3. **Document that thinking is excluded** - At minimum, update the monitoring docs to warn users that OTEL metrics don't include thinking tokens and therefore undercount consumption.
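For the second alternative, a minimal sketch of what adding the attribute to `api_request` events might look like. Everything beyond `input_tokens`/`output_tokens` (which this request already mentions) is a hypothetical name, not the actual event schema:

```python
# Sketch of Alternative 2: attach thinking_tokens to api_request log
# events. Attribute and field names are assumptions for illustration.
def api_request_attributes(usage: dict) -> dict:
    attrs = {
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
    }
    if "thinking_tokens" in usage:
        attrs["thinking_tokens"] = usage["thinking_tokens"]  # NEW attribute
    return attrs
```

This keeps the metric unchanged while still exposing the data for log-based analysis, at the cost of making burn-rate dashboards harder to build than with the metric-label approach.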
### Priority
High - Significant impact on productivity
### Feature Category
Monitoring and telemetry
### Additional Context
Related issues that mention thinking tokens / token visibility:
- #10388 - "Manual token counting doesn't account for thinking tokens"
- #5257 - MAX_THINKING_TOKENS behavior
- #777 - Agent token awareness
The API response from Claude already includes thinking token counts - this is about exposing that data through the OTEL pipeline.