
πŸ› [firebase_vertexai] Feature Request: Support thinkingBudget configuration for Gemini 2.5 Flash in Flutter

Open nmarafo opened this issue 8 months ago • 2 comments

Hi team,

First of all, thank you for your amazing work on this library. I'd like to request support for a new feature introduced in the Gemini 2.5 Flash model: the ability to configure the internal "thinking" process via the thinkingBudget parameter.

As described in the official documentation, Gemini 2.5 Flash models support an internal reasoning process that can be tuned by setting the thinkingBudget (an integer between 0 and 24,576). This parameter tells the model how many tokens it may use internally to "think" before generating the final response.

Why is this important?

This feature is crucial for advanced tasks such as:

Complex code generation

Multistep problem solving in math or logic

Structured data analysis and reasoning

Use cases where we want to trade off latency vs. reasoning depth

For example, setting thinkingBudget: 0 disables the internal reasoning (faster responses), while higher values allow more in-depth reasoning (better quality on complex tasks).

What we need

Please consider adding support for configuring the thinkingBudget parameter in the Flutter wrapper/library for Gemini. Ideally, it could be passed through a GenerateContentConfig (or similar config object), as is done in the Python API:

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="Explain the Occam's Razor concept and provide everyday examples of it",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=1024)
    ),
)

Final thoughts

Having access to this parameter in Flutter will let developers tune Gemini's performance to their specific app needs. It's especially valuable in educational, scientific, and reasoning-intensive applications.

Thanks again for your hard work! Looking forward to your feedback.

Best regards, Norberto

nmarafo avatar May 19 '25 19:05 nmarafo

I have implemented this in a fork so we could test some of the new models with thinking turned off. I didn't mirror the config options 1:1 (a separate enable/disable flag plus a thinking budget), but setting thinkingBudget to 0 effectively disables thinking.

I mimicked the config nesting already used in the package, and the API turned out like this:

FirebaseAI.vertexAI(...).generativeModel(
  model: "gemini-2.5-flash-preview-04-17",
  generationConfig: GenerationConfig(
    ...,
    thinkingConfig: ThinkingConfig(
      thinkingBudget: 0,
    ),
  ),
);

I also added the thinking token count to the UsageMetadata model:

UsageMetadata._({
  this.promptTokenCount,
  this.candidatesTokenCount,
  this.totalTokenCount,
  this.promptTokensDetails,
  this.candidatesTokensDetails,
  this.thoughtsTokenCount,
});

I am also waiting for implicit caching to be enabled for Vertex AI so I can add that to the UsageMetadata model as well. I'm happy to refine this and put up a PR if that would be useful.

davidpryor avatar May 20 '25 18:05 davidpryor

link https://github.com/firebase/flutterfire/issues/17406

cynthiajoan avatar Jun 03 '25 17:06 cynthiajoan

Looks like this was added to the firebase_ai package two days ago: https://github.com/firebase/flutterfire/commit/18f5614263750e350f549c077040335883fab0b3

I guess I will be able to retire my fork soon!
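For reference, here is a minimal Dart sketch of what using the released API might look like, assuming the GenerationConfig/ThinkingConfig surface from the linked commit (the exact model name and API surface may differ; check the current firebase_ai docs):

```dart
import 'package:firebase_ai/firebase_ai.dart';

Future<void> run() async {
  // Assumes Firebase.initializeApp() has already been called.
  final model = FirebaseAI.googleAI().generativeModel(
    model: 'gemini-2.5-flash',
    generationConfig: GenerationConfig(
      // 0 disables thinking; larger budgets allow deeper internal reasoning.
      thinkingConfig: ThinkingConfig(thinkingBudget: 1024),
    ),
  );

  final response = await model.generateContent([
    Content.text("Explain Occam's Razor with everyday examples."),
  ]);

  print(response.text);
  // Tokens the model spent on internal reasoning, if reported.
  print(response.usageMetadata?.thoughtsTokenCount);
}
```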

davidpryor avatar Jul 10 '25 19:07 davidpryor