
feat: add token counts, timestamps, and model to rollouts

[Open] bl-ue opened this issue 6 months ago · 10 comments

This PR adds token count and selected-model information to the thinking, text, and tool-call blocks in rollout files. Tools need this to analyze Codex CLI usage programmatically. Because Codex CLI works with any OpenAI-compatible provider, and different models use different tokenizers, counting the tokens of the user/AI messages in rollout files externally is impractical. In addition, the number of cached tokens, which affects the reported cost, cannot be computed from message content alone. Since most OpenAI-compatible providers return token usage directly with each generation request, including that usage information in the rollout files is trivial.
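
For illustration, here's a minimal Rust sketch of the kind of per-entry metadata this enables; the field names and types are illustrative, not the PR's actual schema:

```rust
use serde::Serialize;

// Hypothetical metadata attached to each rollout entry. The provider reports
// `cached_input_tokens` with each response; it cannot be derived from text.
#[derive(Serialize)]
struct TokenUsage {
    input_tokens: u64,
    cached_input_tokens: u64,
    output_tokens: u64,
}

#[derive(Serialize)]
struct RolloutEntryMeta {
    timestamp: String, // e.g. RFC 3339
    model: String,     // e.g. "gpt-4.1" or a custom provider's model id
    usage: TokenUsage,
}
```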

Codex support in Splitrail is now implemented and waiting for this PR.

Closes #1572

bl-ue avatar Jul 15 '25 15:07 bl-ue

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

github-actions[bot] avatar Jul 15 '25 15:07 github-actions[bot]

I have read the CLA Document and I hereby sign the CLA

bl-ue avatar Jul 15 '25 15:07 bl-ue

Ready for review.

bl-ue avatar Jul 17 '25 20:07 bl-ue

Hi @bolinfest! This PR adds token, timestamp, and model information to rollout files. This enables our new tool, Splitrail, to track token usage, cost, and throughput for Codex users. It also makes it easier for other tools to do the same. Do you mind taking a look? Thank you!

bl-ue avatar Jul 21 '25 16:07 bl-ue

cc @aibrahim-oai

bl-ue avatar Jul 26 '25 15:07 bl-ue

Tools can already calculate tokens used from the rollout info. I don't think there is much benefit in adding them to response items. What is the use?

aibrahim-oai avatar Jul 26 '25 19:07 aibrahim-oai

Hi @aibrahim-oai! Thank you for reviewing so quickly. Yes, that makes sense, but automatic input caching makes accurate calculations impossible. When input exceeds 1024 tokens, inputs are automatically cached (see here), so there's no way to determine which part of the input was cached and which wasn't; therefore, we can't calculate cost accurately.
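
To make the ambiguity concrete, here's a sketch of a cost calculation with placeholder rates (not real pricing); the cached/uncached split below is exactly what can't be recovered from rollout text alone:

```rust
// Placeholder $/1M-token rates for illustration only.
fn cost_usd(input_tokens: u64, cached_tokens: u64, output_tokens: u64) -> f64 {
    const INPUT_PER_M: f64 = 2.00;  // uncached input (placeholder)
    const CACHED_PER_M: f64 = 0.50; // cached input (placeholder)
    const OUTPUT_PER_M: f64 = 8.00; // output (placeholder)
    let uncached = input_tokens.saturating_sub(cached_tokens);
    (uncached as f64 * INPUT_PER_M
        + cached_tokens as f64 * CACHED_PER_M
        + output_tokens as f64 * OUTPUT_PER_M)
        / 1_000_000.0
}

// Same message text, very different cost depending on the cache hit:
// cost_usd(10_000, 0, 500)     ≈ $0.0240
// cost_usd(10_000, 8_192, 500) ≈ $0.0117
```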

In addition, rollouts don't currently store model information, so it's not possible to determine which tokenizer to use, nor can we determine model/token cost for accurate cost calculation. This PR stores model information to address this.
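
For example, both pricing and tokenizer choice are keyed by model name, so a tool can't look up either without the model being recorded (model names and rates below are placeholders):

```rust
// Illustrative lookup: (input $/1M tokens, output $/1M tokens).
fn rates_per_million(model: &str) -> Option<(f64, f64)> {
    match model {
        "gpt-4.1" => Some((2.00, 8.00)),      // placeholder rate
        "gpt-4.1-mini" => Some((0.40, 1.60)), // placeholder rate
        _ => None, // unknown/custom model: cost can't be derived at all
    }
}
```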

Last but not least, Codex can be used with custom providers, and it would be difficult to perform tokenization on rollouts that use custom models/providers; with open-source models in Ollama, you'd have to download and use a tokenizer, and with providers that don't distribute their tokenizers, you'd have to use an API.
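
With usage recorded per entry, a consumer like Splitrail could simply sum the reported counts instead of re-tokenizing. A hypothetical sketch, assuming the illustrative schema above:

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    // Sum provider-reported usage from a rollout JSONL file (schema is illustrative).
    let reader = BufReader::new(File::open("rollout.jsonl")?);
    let (mut input, mut cached, mut output) = (0u64, 0u64, 0u64);
    for line in reader.lines() {
        let v: serde_json::Value = serde_json::from_str(&line?).expect("valid JSON");
        if let Some(usage) = v.get("usage") {
            input += usage["input_tokens"].as_u64().unwrap_or(0);
            cached += usage["cached_input_tokens"].as_u64().unwrap_or(0);
            output += usage["output_tokens"].as_u64().unwrap_or(0);
        }
    }
    println!("input={input} (cached={cached}) output={output}");
    Ok(())
}
```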

bl-ue avatar Jul 26 '25 20:07 bl-ue

Hi @bolinfest, @gpeal, and @pakrym-oai! This PR enhances rollouts with token counts, timestamps, and model info. We're using this data to calculate usage along with Claude Code and Gemini CLI in Splitrail. With GPT-5, we expect Codex CLI usage to increase. Is there any chance you guys could take a look this week?

bl-ue avatar Aug 10 '25 22:08 bl-ue

Can we get this back?

guywilsonjr avatar Aug 17 '25 00:08 guywilsonjr

Hi @dylan-hurd-oai @easong-openai @jif-oai @pap-openai @aibrahim-oai @bolinfest! Is there any chance this PR can be merged? Thank you!

bl-ue avatar Sep 07 '25 23:09 bl-ue