
[FEATURE] Add thinking tokens to OTEL token_usage metric

Open JuanjoFuchs opened this issue 1 month ago • 0 comments

Preflight Checklist

  • [x] I have searched existing requests and this feature hasn't been requested yet
  • [x] This is a single feature request (not multiple features)

Problem Statement

The claude_code.token.usage metric currently tracks four token types:

  • input
  • output
  • cacheRead
  • cacheCreation

Missing: thinking tokens.

Thinking tokens (extended thinking / internal reasoning) can be 3-10x the visible output tokens and count fully against usage limits. Without tracking them, OTEL metrics significantly undercount actual token consumption.

From Anthropic's own documentation:

"Both the invisible thinking tokens AND the visible response are billed as output tokens. Even worse, they count fully against your usage limits."

Impact

I ran a correlation analysis between OTEL token metrics and usage-limit burn rate. All correlations were weak (|r| < 0.3), consistent with OTEL missing a major factor.
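To make the effect concrete, here is a minimal, self-contained sketch with synthetic numbers (all token values are illustrative, not from my actual analysis): when thinking tokens swing between a fraction of and many multiples of the visible output, the OTEL-visible totals barely correlate with what limits actually see.

```python
# Illustrative sketch: visible-token totals correlate weakly with actual
# burn when a large, variable thinking component is missing from the data.

def pearson(xs, ys):
    """Plain Pearson correlation coefficient (no external deps)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Per-request token counts (synthetic): what OTEL sees vs. hidden thinking.
visible = [5100, 5300, 4900, 5100, 4800, 4800]
thinking = [2000, 30000, 500, 45000, 1000, 60000]  # swings from <1x to >10x
burn = [v + t for v, t in zip(visible, thinking)]  # what limits actually see

print(f"r (OTEL-visible vs. burn):  {pearson(visible, burn):.2f}")  # weak
print(f"r (visible+thinking vs. burn): {pearson(burn, burn):.2f}")  # 1.00
```

With thinking included, the correlation is trivially perfect in this toy setup; the point is that excluding a component this large and this variable destroys the signal.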

| What OTEL Tracks | What Counts Toward Limits |
|---|---|
| input | input |
| output (visible only) | output (visible) |
| cacheRead | cacheRead |
| cacheCreation | cacheCreation |
| *not tracked* | thinking tokens |

This makes it impossible to:

  • Understand why usage limits are being consumed
  • Build accurate cost models
  • Optimize prompts based on actual consumption
  • Create meaningful burn rate dashboards

Proposed Solution

Add type="thinking" to the existing token usage metric:

claude_code.token.usage{type="thinking", model="claude-opus-4-5-20251101"} 15000

This would appear alongside the existing types:

claude_code.token.usage{type="input", model="..."} 5000
claude_code.token.usage{type="output", model="..."} 2000
claude_code.token.usage{type="cacheRead", model="..."} 3000
claude_code.token.usage{type="cacheCreation", model="..."} 1000
claude_code.token.usage{type="thinking", model="..."} 15000  # NEW
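Downstream, the new type would slot into existing per-attribute aggregation with no schema change. A small sketch of what a consumer of the exported data points could then compute (values mirror the example above; which token types count toward limits, and at what rate, is an assumption here, not something the metric itself defines):

```python
# Hypothetical exported data points for claude_code.token.usage, keyed by
# the `type` attribute. Values mirror the example above.
usage = {
    "input": 5000,
    "output": 2000,
    "cacheRead": 3000,
    "cacheCreation": 1000,
    "thinking": 15000,  # NEW: proposed data point
}

# Output-side tokens that count toward limits once thinking is exported:
output_side = usage["output"] + usage["thinking"]

# Share of output-side consumption that today's metrics never see:
hidden_share = usage["thinking"] / output_side

print(f"output-side tokens: {output_side}")        # 17000
print(f"hidden share: {hidden_share:.0%}")         # 88%
```

In this example, dashboards built on the current metric would report only 2,000 of the 17,000 output-side tokens actually consumed.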

Alternative Solutions

  1. Include thinking in output - Less transparent, but simpler. Users would at least see accurate totals even if thinking isn't broken out.

  2. Add thinking_tokens to log events - The api_request log events could include thinking_tokens as an attribute (similar to existing input_tokens, output_tokens).

  3. Document that thinking is excluded - At minimum, update the monitoring docs to warn users that OTEL metrics don't include thinking tokens and therefore undercount consumption.
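For Alternative 2, the shape of the change could look like the following sketch of an `api_request` log event. Note this is an assumption about the event structure for illustration: only `input_tokens` and `output_tokens` are named in the issue above; the other field names are hypothetical.

```python
# Hedged sketch of Alternative 2: an api_request log event extended with a
# thinking_tokens attribute. Field names beyond input_tokens/output_tokens
# are assumptions for illustration, not the actual event schema.
api_request_event = {
    "event.name": "api_request",
    "model": "claude-opus-4-5-20251101",
    "input_tokens": 5000,
    "output_tokens": 2000,
    "cache_read_tokens": 3000,
    "cache_creation_tokens": 1000,
    "thinking_tokens": 15000,  # proposed new attribute
}

# A consumer could then reconstruct limit-relevant totals per request:
total_output_side = (
    api_request_event["output_tokens"] + api_request_event["thinking_tokens"]
)
print(f"output-side tokens for this request: {total_output_side}")
```

This option composes with the metric change rather than replacing it: events give per-request detail, while the metric gives cheap aggregates.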

Priority

High - Significant impact on productivity

Feature Category

Monitoring and telemetry

Additional Context

Related issues that mention thinking tokens / token visibility:

  • #10388 - "Manual token counting doesn't account for thinking tokens"
  • #5257 - MAX_THINKING_TOKENS behavior
  • #777 - Agent token awareness

The API response from Claude already includes thinking token counts - this is about exposing that data through the OTEL pipeline.

JuanjoFuchs · Jan 08 '26 21:01