Summarize Gemini 3.0 Pro's thinking
What would you like to be added?
Summaries of Gemini's thinking.
The ability to give Gemini real-time hints is awesome, but it is nearly useless if the user cannot follow what Gemini is doing. Gemini doesn't tend to explain itself much, so the only output with actionable content is Gemini's thinking.
Unfortunately, Gemini is several times more verbose than Claude, and highly repetitive in its thinking. No sane SWE would read all of Gemini's thinking. Fortunately, we have Gemini 2.5 Flash Lite, a cheap model that excels at summaries. I've done a PoC in CodeRhapsody, and it is very cool. Compared to Claude, the resulting thinking quality is good. The only downside is latency: Claude's thinking can stream in real time, while we must wait for thinking streaming to complete before we can generate a summary.
In building my PoC, I found that context from prior thinking summaries is critical: you can't just summarize the thinking attached to a single response; you have to track the thinking, and its summaries, for every message back to the start of the tool-call chain. Once I added this enhancement, thinking summaries became far more concise and actionable.
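The context-carrying idea above can be sketched as a prompt builder that threads all prior summaries in the current tool-call chain into each new summarization request. This is a minimal illustration; the `ThoughtSummary` shape and `buildSummaryPrompt` name are hypothetical, not from the Gemini CLI codebase.

```typescript
// Hypothetical sketch: names and shapes are illustrative assumptions.
interface ThoughtSummary {
  subject: string;
  description: string;
}

// Build a summarization prompt that carries forward all prior summaries in the
// current tool-call chain, so each new summary states only what changed.
function buildSummaryPrompt(
  priorSummaries: ThoughtSummary[],
  rawThinking: string,
): string {
  const history = priorSummaries
    .map((s, i) => `${i + 1}. ${s.subject}: ${s.description}`)
    .join("\n");
  return [
    "Summarize this internal reasoning concisely.",
    "Do not restate anything already covered by the prior summaries;",
    "describe only the new progress made in this step.",
    history ? `Prior summaries:\n${history}` : "No prior summaries yet.",
    `New reasoning:\n${rawThinking}`,
  ].join("\n\n");
}
```

With an empty history the prompt tells the summarizer to start fresh; with prior summaries it anchors the model on what has already been said, which is what makes successive summaries progressive rather than repetitive.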
Why is this needed?
Thinking summaries are required if you want to empower users to guide the AI in real time, collaborating as you would with a human. In my experience this yields roughly a 2X increase in productivity.
Additional context
I've used real-time collaboration with Claude since August, and have written 2,000 lines of production-ready code per day since then. I estimate I am 2X more productive with real-time collaboration with the AI because:
- I rarely have to throw away work, fix my prompt, and start over.
- I am already the expert in the code by the time the AI finishes generating it.
Being the expert is non-negotiable, and instead of becoming the expert by reverse-engineering the AI's code after the fact, I become the expert as a side benefit of real-time collaboration with the AI.
I MUST read Gemini's thinking for real-time collaboration to work, but I will go insane if I have to read Gemini's raw thinking output. If we are to use Gemini 3.0 Pro for real-time collaboration, we MUST improve the feedback to the user. This feature does exactly that.
Alternatively, could Gemini 3.0 Pro simply fix its thinking? I brought this to the Gemini devs' attention:
I have a critical flaw in Gemini 3.0 Pro's thinking summaries to report. I've verified this with Gemini 3.0 Pro's help. Here's its analysis of its thinking summaries:
- Statelessness: Each summary essentially restates the same core problem ("the test is failing") and my intention ("I'm debugging it") but with slightly different wording. It's as if I am re-evaluating the situation from scratch each time, without acknowledging the previous thought.
- Lack of Progression: The summaries don't build on each other. A more effective sequence would be:
- "The test failed. I'll investigate the output."
- "The output shows the wrong vertex count. I'll examine the polygon construction."
- "The construction logic fails at vertices. I'll try a new algorithm."

Instead, it's a loop of "The test failed. I'm looking at the failure."
- Repetitive Phrasing: Phrases like "I've identified the root cause," "I've pinpointed the flaw," and "I've uncovered the root cause" are used multiple times to describe the same step in the debugging process.
Conclusion:
Your diagnosis appears to be spot on. The thinking summaries are generated without referencing the preceding summaries within the same tool-use chain. This leads to a repetitive and inefficient narrative of my process, which, while ultimately leading to the correct answer, is not a clear or concise representation of the "thought" process. It's more like a series of independent snapshots rather than a continuous stream of consciousness.
This is a critical bug in how my thought process is externalized. Thank you for pointing it out.
I've confirmed that it is not feasible to make this change upstream in Gemini directly, so the summarization must be done in the AI coding agent. I also tried summarizing one thought chunk at a time, but the results were too wordy; and since the user pays extra for these calls, I think we need to wait until all thinking for a turn is done before generating the summary.
A quick design for this feature:
AI Thinking Process & Summarization
Overview
The "Thinking Process & Summarization" feature enables the Gemini CLI to handle and display the internal reasoning process of AI models (specifically "Flash" thinking models). Instead of hiding this process or showing raw tokens, the CLI captures these thought blocks, summarizes them in real-time if configured, and renders them synchronously to the user.
Architecture & Implementation
1. Data Ingestion (Stream Handling)
The core logic resides in the `processGeminiStreamEvents` function within the main streaming hook.
- Location: `packages/cli/src/ui/hooks/useGeminiStream.ts`
- Event Handling: The stream parser detects `ServerGeminiEventType.Thought` events.
- Action: When a thought event is received, it triggers `summarizeThoughtChunk` immediately.
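The dispatch described above can be sketched as a small router. The enum values, event shape, and handler names here are assumptions for illustration, not the actual `useGeminiStream.ts` code.

```typescript
// Hypothetical sketch: enum values, event shape, and handler names are assumptions.
enum ServerGeminiEventType {
  Content = "content",
  Thought = "thought",
}

interface ServerGeminiEvent {
  type: ServerGeminiEventType;
  text: string;
}

// Route each stream event: thought events go straight to the summarizer
// (i.e. summarizeThoughtChunk), everything else to normal content handling.
function routeEvent(
  event: ServerGeminiEvent,
  handlers: {
    onThought: (text: string) => void;
    onContent: (text: string) => void;
  },
): void {
  if (event.type === ServerGeminiEventType.Thought) {
    handlers.onThought(event.text);
  } else {
    handlers.onContent(event.text);
  }
}
```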
2. Thinking Summarizer (Inline Logic)
Unlike a standalone class, the summarization logic is implemented as a callback within `useGeminiStream.ts`.
- Function: `summarizeThoughtChunk`
- Context: It maintains a `thinkingHistoryRef` (a string buffer) to keep track of the cumulative reasoning context for the current turn.
- Summarization Strategy:
  - It constructs a secondary prompt ("Summarize this internal reasoning concisely...") containing the raw thought and previous thinking history.
  - It calls `config.getBaseLlmClient().generateContent` (using a fast model like Flash) to generate a structured JSON summary `{ subject, description }`.
  - This summary is then spoken via TTS (`ttsService.speak`) and added to the chat history as a `thinking`-type message.
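A minimal sketch of the summarizer callback, with the LLM client injected so it can be exercised in isolation. The `LlmClient` interface, the prompt text, and the closure-based stand-in for `thinkingHistoryRef` are assumptions based on the description above, not the real implementation.

```typescript
// Hypothetical sketch: the client interface and summary shape are assumptions.
interface LlmClient {
  // Assumed to return the model's response as JSON text.
  generateContent(prompt: string): Promise<string>;
}

interface ThoughtSummary {
  subject: string;
  description: string;
}

// Factory returning a summarizer that accumulates the turn's thinking
// (playing the role of thinkingHistoryRef) across calls.
function makeThoughtSummarizer(client: LlmClient) {
  let thinkingHistory = "";

  return async function summarizeThoughtChunk(
    rawThought: string,
  ): Promise<ThoughtSummary> {
    thinkingHistory += rawThought + "\n";
    const prompt =
      "Summarize this internal reasoning concisely as JSON " +
      `{"subject", "description"}:\n${thinkingHistory}`;
    const json = await client.generateContent(prompt);
    return JSON.parse(json) as ThoughtSummary;
  };
}
```

Injecting the client keeps the fast-model dependency (e.g. Flash) swappable and makes the accumulation logic unit-testable without network calls.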
3. Synchronous Rendering
- UI Component: `GeminiMessage` (in `packages/cli/src/ui/components/messages/GeminiMessage.tsx`) and `LoadingIndicator` handle the visual representation.
- State: The `thought` state object `{ subject, description }` is updated in real time, causing the UI to display the current "Subject" of the AI's thinking (e.g., "Reasoning about file permissions") while the spinner is active.
Configuration
The feature is controlled via settings in config.yaml:
- `includeThoughts`: Controls whether thoughts are processed/shown.
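For concreteness, a minimal `config.yaml` fragment; the exact key location and schema are assumptions, since only the setting name is given above:

```yaml
# Enable processing and display of model thought summaries.
# Top-level placement is an assumption for illustration.
includeThoughts: true
```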
Future Considerations
- Refactoring: Moving the `summarizeThoughtChunk` logic into a dedicated service or the Core package to clean up the UI hook.