
[FEATURE] Add thinking tokens to OTEL token_usage metric

Open JuanjoFuchs opened this issue 1 month ago • 0 comments

Preflight Checklist

  • [x] I have searched existing requests and this feature hasn't been requested yet
  • [x] This is a single feature request (not multiple features)

Problem Statement

The claude_code.token.usage metric currently tracks four token types:

  • input
  • output
  • cacheRead
  • cacheCreation

Missing: thinking tokens.

Thinking tokens (extended thinking / internal reasoning) can be 3-10x the visible output tokens and count fully against usage limits. Without tracking them, OTEL metrics significantly undercount actual token consumption.

From Anthropic's own documentation:

"Both the invisible thinking tokens AND the visible response are billed as output tokens. Even worse, they count fully against your usage limits."

Impact

I ran a correlation analysis between OTEL token metrics and usage-limit burn rate. All correlations were weak (|r| < 0.3), consistent with OTEL missing a major factor.
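To make the effect concrete, here is a minimal, self-contained sketch with synthetic numbers (all token values are illustrative, not from my actual analysis): when thinking tokens swing between a fraction of and many multiples of the visible output, the OTEL-visible totals barely correlate with what limits actually see.

```python
# Illustrative sketch: visible-token totals correlate weakly with actual
# burn when a large, variable thinking component is missing from the data.

def pearson(xs, ys):
    """Plain Pearson correlation coefficient (no external deps)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Per-request token counts (synthetic): what OTEL sees vs. hidden thinking.
visible = [5100, 5300, 4900, 5100, 4800, 4800]
thinking = [2000, 30000, 500, 45000, 1000, 60000]  # swings from <1x to >10x
burn = [v + t for v, t in zip(visible, thinking)]  # what limits actually see

print(f"r (OTEL-visible vs. burn):  {pearson(visible, burn):.2f}")  # weak
print(f"r (visible+thinking vs. burn): {pearson(burn, burn):.2f}")  # 1.00
```

With thinking included, the correlation is trivially perfect in this toy setup; the point is that excluding a component this large and this variable destroys the signal.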

| What OTEL Tracks | What Counts Toward Limits |
|---|---|
| input | input |
| output (visible only) | output (visible) |
| cacheRead | cacheRead |
| cacheCreation | cacheCreation |
| *not tracked* | thinking tokens |

This makes it impossible to:

  • Understand why usage limits are being consumed
  • Build accurate cost models
  • Optimize prompts based on actual consumption
  • Create meaningful burn rate dashboards

Proposed Solution

Add type="thinking" to the existing token usage metric:

claude_code.token.usage{type="thinking", model="claude-opus-4-5-20251101"} 15000

This would appear alongside the existing types:

claude_code.token.usage{type="input", model="..."} 5000
claude_code.token.usage{type="output", model="..."} 2000
claude_code.token.usage{type="cacheRead", model="..."} 3000
claude_code.token.usage{type="cacheCreation", model="..."} 1000
claude_code.token.usage{type="thinking", model="..."} 15000  # NEW
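Downstream, the new type would slot into existing per-attribute aggregation with no schema change. A small sketch of what a consumer of the exported data points could then compute (values mirror the example above; which token types count toward limits, and at what rate, is an assumption here, not something the metric itself defines):

```python
# Hypothetical exported data points for claude_code.token.usage, keyed by
# the `type` attribute. Values mirror the example above.
usage = {
    "input": 5000,
    "output": 2000,
    "cacheRead": 3000,
    "cacheCreation": 1000,
    "thinking": 15000,  # NEW: proposed data point
}

# Output-side tokens that count toward limits once thinking is exported:
output_side = usage["output"] + usage["thinking"]

# Share of output-side consumption that today's metrics never see:
hidden_share = usage["thinking"] / output_side

print(f"output-side tokens: {output_side}")        # 17000
print(f"hidden share: {hidden_share:.0%}")         # 88%
```

In this example, dashboards built on the current metric would report only 2,000 of the 17,000 output-side tokens actually consumed.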

Alternative Solutions

  1. Include thinking in output - Less transparent, but simpler. Users would at least see accurate totals even if thinking isn't broken out.

  2. Add thinking_tokens to log events - The api_request log events could include thinking_tokens as an attribute (similar to existing input_tokens, output_tokens).

  3. Document that thinking is excluded - At minimum, update the monitoring docs to warn users that OTEL metrics don't include thinking tokens and therefore undercount consumption.
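For Alternative 2, the shape of the change could look like the following sketch of an `api_request` log event. Note this is an assumption about the event structure for illustration: only `input_tokens` and `output_tokens` are named in the issue above; the other field names are hypothetical.

```python
# Hedged sketch of Alternative 2: an api_request log event extended with a
# thinking_tokens attribute. Field names beyond input_tokens/output_tokens
# are assumptions for illustration, not the actual event schema.
api_request_event = {
    "event.name": "api_request",
    "model": "claude-opus-4-5-20251101",
    "input_tokens": 5000,
    "output_tokens": 2000,
    "cache_read_tokens": 3000,
    "cache_creation_tokens": 1000,
    "thinking_tokens": 15000,  # proposed new attribute
}

# A consumer could then reconstruct limit-relevant totals per request:
total_output_side = (
    api_request_event["output_tokens"] + api_request_event["thinking_tokens"]
)
print(f"output-side tokens for this request: {total_output_side}")
```

This option composes with the metric change rather than replacing it: events give per-request detail, while the metric gives cheap aggregates.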

Priority

High - Significant impact on productivity

Feature Category

Monitoring and telemetry

Additional Context

Related issues that mention thinking tokens / token visibility:

  • #10388 - "Manual token counting doesn't account for thinking tokens"
  • #5257 - MAX_THINKING_TOKENS behavior
  • #777 - Agent token awareness

The API response from Claude already includes thinking token counts - this is about exposing that data through the OTEL pipeline.

JuanjoFuchs · Jan 08 '26 21:01