Auto-completion keeps requesting gemini-2.5-flash-lite even when using gemini-3.0, causing infinite "usage limit reached" warnings
What happened?
I'm using gemini-3.0 as my main model.
However, when auto-completion is enabled, I repeatedly get this warning:
"Usage limit reached for gemini-2.5-flash-lite"
Even though:
- I am not using `gemini-2.5-flash-lite` anywhere in my code.
- My `gemini-3.0` quota is still available.
- The warning keeps looping even when I'm not making any manual requests.
My Guess (Possible Cause)
It seems that the auto-completion feature internally uses gemini-2.5-flash-lite to generate inline suggestions.
When the quota for gemini-2.5-flash-lite is exhausted:
- auto-completion keeps trying to request that model,
- each request fails,
- the app continues retrying automatically,
- resulting in an endless loop of warnings.
What did you expect to happen?
- Auto-completion should stop making requests once the quota for its internal model is exceeded.
- Allow us to configure which model auto-completion uses.
- OR automatically disable auto-completion when its model quota is exceeded.
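The first and third suggestions above amount to a simple circuit breaker around the completion request. A minimal sketch of the idea, assuming a hypothetical `CompletionGuard` wrapper and `QuotaExceededError` type (these names are illustrative, not the actual Gemini CLI implementation):

```typescript
// Illustrative sketch: stop requesting suggestions once the quota for
// the auto-completion model is known to be exhausted, instead of
// retrying in a loop. Names here are hypothetical.
class QuotaExceededError extends Error {}

class CompletionGuard {
  private disabled = false;

  async request(fetchSuggestion: () => Promise<string>): Promise<string | null> {
    if (this.disabled) return null; // quota already exhausted: make no further requests
    try {
      return await fetchSuggestion();
    } catch (err) {
      if (err instanceof QuotaExceededError) {
        this.disabled = true; // disable auto-completion for the rest of the session
        console.warn("Auto-completion disabled: model quota exhausted.");
      }
      return null;
    }
  }
}
```

With a guard like this, the first quota error disables further auto-completion requests, so the warning is shown at most once instead of looping.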
Client information
About Gemini CLI
CLI Version 0.19.1
Git Commit 6169ef04b
Model auto
Sandbox no sandbox
OS linux
Auth Method OAuth
User Email [email protected]
GCP Project key-prism-479308-n7
Login information
Google account
Anything else we need to know?
- Use `gemini-3.0` as the main model.
- Enable auto-completion.
- Exhaust quota for `gemini-2.5-flash-lite`.
- Observe continuous warnings even though you are not using that model directly.
Yup, confirmed and very embarrassing 👍
Please fix/merge the PR, as it also ignores the setting that disables prompt completion.
Please increase the priority of this issue, as it severely impacts the user experience!