Auto-completion keeps requesting gemini-2.5-flash-lite even when using gemini-3.0, causing infinite "usage limit reached" warnings
What happened?
I'm using gemini-3.0 as my main model.
However, when auto-completion is enabled, I repeatedly get this warning:
"Usage limit reached for gemini-2.5-flash-lite"
Even though:
- I am not using `gemini-2.5-flash-lite` anywhere in my code.
- My `gemini-3.0` quota is still available.
- The warning keeps looping even when I'm not making any manual requests.
My Guess (Possible Cause)
It seems that the auto-completion feature internally uses gemini-2.5-flash-lite to generate inline suggestions.
When the quota for gemini-2.5-flash-lite is exhausted:
- auto-completion keeps trying to request that model,
- each request fails,
- the app continues retrying automatically,
- resulting in an endless loop of warnings.
What did you expect to happen?
- Auto-completion should stop making requests once the quota for its internal model is exceeded.
- Allow us to configure which model auto-completion uses.
- OR automatically disable auto-completion when its model quota is exceeded.
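The first and third suggestions above amount to a simple circuit breaker around the completion request. A minimal sketch of the idea, assuming a hypothetical `CompletionGuard` wrapper and `QuotaExceededError` type (these names are illustrative, not the actual Gemini CLI implementation):

```typescript
// Illustrative sketch: stop requesting suggestions once the quota for
// the auto-completion model is known to be exhausted, instead of
// retrying in a loop. Names here are hypothetical.
class QuotaExceededError extends Error {}

class CompletionGuard {
  private disabled = false;

  async request(fetchSuggestion: () => Promise<string>): Promise<string | null> {
    if (this.disabled) return null; // quota already exhausted: make no further requests
    try {
      return await fetchSuggestion();
    } catch (err) {
      if (err instanceof QuotaExceededError) {
        this.disabled = true; // disable auto-completion for the rest of the session
        console.warn("Auto-completion disabled: model quota exhausted.");
      }
      return null;
    }
  }
}
```

With a guard like this, the first quota error disables further auto-completion requests, so the warning is shown at most once instead of looping.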
Client information
About Gemini CLI
CLI Version 0.19.1
Git Commit 6169ef04b
Model auto
Sandbox no sandbox
OS linux
Auth Method OAuth
User Email [email protected]
GCP Project key-prism-479308-n7
Login information
Google account
Anything else we need to know?
- Use `gemini-3.0` as the main model.
- Enable auto-completion.
- Exhaust quota for `gemini-2.5-flash-lite`.
- Observe continuous warnings even though you are not using that model directly.
Yup, confirmed and very embarrassing 👍
Please fix/merge the PR, as it also ignores the setting that disables prompt completion.
Please increase the priority of this issue, as it severely impacts the user experience!