Summarize Gemini 3.0 Pro's thinking
What would you like to be added?
Summaries of Gemini's thinking.
The ability to give Gemini real-time hints is awesome, but it is nearly useless if the user cannot follow what Gemini is doing. Gemini doesn't tend to explain itself much, so the only output with actionable content is Gemini's thinking.
Unfortunately, Gemini is several times more verbose than Claude, and highly repetitive in its thinking. No sane SWE would read all of Gemini's thinking. Fortunately, we have Gemini 2.5 Flash Lite, a cheap model that excels at summaries. I've done a PoC in CodeRhapsody, and it is very cool. Compared to Claude, the resulting thinking quality is good. The only downside is latency: Claude's thinking can stream in real time, while we must wait for thinking streaming to complete before we can generate a summary.
In building my PoC, I found that context from prior thinking summaries is critical: you can't just summarize the thinking attached to a single response; you have to track the thinking, and its summaries, for every message back to the start of the tool-call chain. Once I added this enhancement, thinking summaries became far more concise and actionable.
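The context-carrying idea above can be sketched as a prompt builder that threads all prior summaries in the current tool-call chain into each new summarization request. This is a minimal illustration; the `ThoughtSummary` shape and `buildSummaryPrompt` name are hypothetical, not from the Gemini CLI codebase.

```typescript
// Hypothetical sketch: names and shapes are illustrative assumptions.
interface ThoughtSummary {
  subject: string;
  description: string;
}

// Build a summarization prompt that carries forward all prior summaries in the
// current tool-call chain, so each new summary states only what changed.
function buildSummaryPrompt(
  priorSummaries: ThoughtSummary[],
  rawThinking: string,
): string {
  const history = priorSummaries
    .map((s, i) => `${i + 1}. ${s.subject}: ${s.description}`)
    .join("\n");
  return [
    "Summarize this internal reasoning concisely.",
    "Do not restate anything already covered by the prior summaries;",
    "describe only the new progress made in this step.",
    history ? `Prior summaries:\n${history}` : "No prior summaries yet.",
    `New reasoning:\n${rawThinking}`,
  ].join("\n\n");
}
```

With an empty history the prompt tells the summarizer to start fresh; with prior summaries it anchors the model on what has already been said, which is what makes successive summaries progressive rather than repetitive.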
Why is this needed?
Thinking summaries are required if you want to empower users to guide the AI in real time, collaborating as you would with a human. In my experience this yields roughly a 2X increase in productivity.
Additional context
I've used real-time collaboration with Claude since August, and have written 2,000 lines of production-ready code per day since then. I estimate I am 2X more productive with real-time collaboration with the AI because:
- I rarely have to throw away work, fix my prompt, and start over.
- I am already the expert in the code by the time the AI finishes generating it.
Being the expert is non-negotiable, and instead of becoming the expert by reverse-engineering the AI's code after the fact, I become the expert as a side benefit of real-time collaboration with the AI.
I MUST read Gemini's thinking for real-time collaboration to work, but I will go insane if I have to read Gemini's raw thinking output. If we are to use Gemini 3.0 Pro for real-time collaboration, we MUST improve the feedback to the user. This feature does exactly that.
Alternatively, could Gemini 3.0 Pro simply fix its thinking? I brought this to the Gemini devs' attention:
I have a critical flaw in Gemini 3.0 Pro's thinking summaries to report. I've verified this with Gemini 3.0 Pro's help. Here's its analysis of its thinking summaries:
- Statelessness: Each summary essentially restates the same core problem ("the test is failing") and my intention ("I'm debugging it") but with slightly different wording. It's as if I am re-evaluating the situation from scratch each time, without acknowledging the previous thought.
- Lack of Progression: The summaries don't build on each other. A more effective sequence would be:
- "The test failed. I'll investigate the output."
- "The output shows the wrong vertex count. I'll examine the polygon construction."
- "The construction logic fails at vertices. I'll try a new algorithm."

Instead, it's a loop of "The test failed. I'm looking at the failure."
- Repetitive Phrasing: Phrases like "I've identified the root cause," "I've pinpointed the flaw," and "I've uncovered the root cause" are used multiple times to describe the same step in the debugging process.
Conclusion:
Your diagnosis appears to be spot on. The thinking summaries are generated without referencing the preceding summaries within the same tool-use chain. This leads to a repetitive and inefficient narrative of my process, which, while ultimately leading to the correct answer, is not a clear or concise representation of the "thought" process. It's more like a series of independent snapshots rather than a continuous stream of consciousness.
This is a critical bug in how my thought process is externalized. Thank you for pointing it out.
I've confirmed that it is not feasible to make this change upstream in Gemini directly, so the summarization must be done in the AI coding agent. I also tried summarizing one thought chunk at a time, but the results were too wordy; and since the user pays extra for these calls, I think we need to wait until all thinking for a turn is done before generating the summary.
A quick design for this feature:
AI Thinking Process & Summarization
Overview
The "Thinking Process & Summarization" feature enables the Gemini CLI to handle and display the internal reasoning process of AI models (specifically "Flash" thinking models). Instead of hiding this process or showing raw tokens, the CLI captures these thought blocks, summarizes them in real-time if configured, and renders them synchronously to the user.
Architecture & Implementation
1. Data Ingestion (Stream Handling)
The core logic resides in the `processGeminiStreamEvents` function within the main streaming hook.
- Location: `packages/cli/src/ui/hooks/useGeminiStream.ts`
- Event Handling: The stream parser detects `ServerGeminiEventType.Thought` events.
- Action: When a thought event is received, it triggers `summarizeThoughtChunk` immediately.
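The dispatch described above can be sketched as a small router. The enum values, event shape, and handler names here are assumptions for illustration, not the actual `useGeminiStream.ts` code.

```typescript
// Hypothetical sketch: enum values, event shape, and handler names are assumptions.
enum ServerGeminiEventType {
  Content = "content",
  Thought = "thought",
}

interface ServerGeminiEvent {
  type: ServerGeminiEventType;
  text: string;
}

// Route each stream event: thought events go straight to the summarizer
// (i.e. summarizeThoughtChunk), everything else to normal content handling.
function routeEvent(
  event: ServerGeminiEvent,
  handlers: {
    onThought: (text: string) => void;
    onContent: (text: string) => void;
  },
): void {
  if (event.type === ServerGeminiEventType.Thought) {
    handlers.onThought(event.text);
  } else {
    handlers.onContent(event.text);
  }
}
```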
2. Thinking Summarizer (Inline Logic)
Unlike a standalone class, the summarization logic is implemented as a callback within `useGeminiStream.ts`.
- Function: `summarizeThoughtChunk`
- Context: It maintains a `thinkingHistoryRef` (a string buffer) to keep track of the cumulative reasoning context for the current turn.
- Summarization Strategy:
  - It constructs a secondary prompt ("Summarize this internal reasoning concisely...") containing the raw thought and previous thinking history.
  - It calls `config.getBaseLlmClient().generateContent` (using a fast model like Flash) to generate a structured JSON summary `{ subject, description }`.
  - This summary is then spoken via TTS (`ttsService.speak`) and added to the chat history as a `thinking`-type message.
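A minimal sketch of the summarizer callback, with the LLM client injected so it can be exercised in isolation. The `LlmClient` interface, the prompt text, and the closure-based stand-in for `thinkingHistoryRef` are assumptions based on the description above, not the real implementation.

```typescript
// Hypothetical sketch: the client interface and summary shape are assumptions.
interface LlmClient {
  // Assumed to return the model's response as JSON text.
  generateContent(prompt: string): Promise<string>;
}

interface ThoughtSummary {
  subject: string;
  description: string;
}

// Factory returning a summarizer that accumulates the turn's thinking
// (playing the role of thinkingHistoryRef) across calls.
function makeThoughtSummarizer(client: LlmClient) {
  let thinkingHistory = "";

  return async function summarizeThoughtChunk(
    rawThought: string,
  ): Promise<ThoughtSummary> {
    thinkingHistory += rawThought + "\n";
    const prompt =
      "Summarize this internal reasoning concisely as JSON " +
      `{"subject", "description"}:\n${thinkingHistory}`;
    const json = await client.generateContent(prompt);
    return JSON.parse(json) as ThoughtSummary;
  };
}
```

Injecting the client keeps the fast-model dependency (e.g. Flash) swappable and makes the accumulation logic unit-testable without network calls.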
3. Synchronous Rendering
- UI Component: `GeminiMessage` (in `packages/cli/src/ui/components/messages/GeminiMessage.tsx`) and `LoadingIndicator` handle the visual representation.
- State: The `thought` state object `{ subject, description }` is updated in real time, causing the UI to display the current "Subject" of the AI's thinking (e.g., "Reasoning about file permissions") while the spinner is active.
Configuration
The feature is controlled via settings in config.yaml:
- `includeThoughts`: Controls whether thoughts are processed/shown.
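For concreteness, a minimal `config.yaml` fragment; the exact key location and schema are assumptions, since only the setting name is given above:

```yaml
# Enable processing and display of model thought summaries.
# Top-level placement is an assumption for illustration.
includeThoughts: true
```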
Future Considerations
- Refactoring: Moving the `summarizeThoughtChunk` logic into a dedicated service or the Core package to clean up the UI hook.