Handle Gemini empty responses
Feature Type
Would make my life easier
Feature Description
We have observed that the Gemini API (via google-generativeai) occasionally returns a response with finish_reason=STOP but with empty text content and no function calls. This occurs even when the request appears valid.
In the current implementation of livekit.plugins.google.llm, such a response is processed as a valid chunk with empty content. This results in the LLMStream yielding an empty ChatChunk, which the agent considers a successful turn. Consequently, the agent "speaks" silence and the turn ends, preventing any FallbackLLM from triggering or any other error handling from occurring. In our experience, the agent simply stayed silent until some other mechanism noticed the silence and recovered.
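For reference, this is roughly the shape of the candidate we see in that case. The snippet below reconstructs it by hand purely for illustration; the field names follow google.genai types, but the exact construction is our assumption, not output captured from the SDK.

from google.genai import types

# Hypothetical reconstruction of the problematic chunk: finish_reason is STOP,
# but the single part carries no text and no function call.
empty_candidate = types.Candidate(
    finish_reason=types.FinishReason.STOP,
    content=types.Content(role="model", parts=[types.Part()]),
)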
Web Context
This appears to be a known behavior or intermittent issue with Gemini models, where the model returns a "completed" status without generating content.
- Related discussions indicate this can happen due to internal model behaviors or safety filters that don't correctly populate the safety ratings but still stop generation.
- Examples of users encountering empty responses with finish_reason: STOP: GitHub Issues Examples (general search for "empty response")
Workarounds / Alternatives
We detect this specific condition (finish_reason is STOP, the content is empty, and no prior content has been yielded) and treat it as an error (e.g., APIStatusError). This allows the FallbackLLM to catch the error and switch to a backup model, or allows the application to handle the failure gracefully, as sketched below.
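For context, this is how the fallback is wired on our side. A minimal sketch assuming the FallbackAdapter from livekit-agents (this is what we loosely call FallbackLLM above) and the openai plugin as a backup; the model names are placeholders, so adapt them to your setup:

from livekit.agents import llm
from livekit.plugins import google, openai

# Once the empty-STOP case raises APIStatusError, a fallback chain like this
# can switch to the backup model instead of letting the turn end in silence.
fallback_llm = llm.FallbackAdapter(
    [
        google.LLM(model="gemini-2.5-flash"),
        openai.LLM(model="gpt-4o-mini"),
    ]
)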
Reference Implementation
Below is the logic we are currently using as a monkey patch to resolve this issue.
1. Helper Function
This function checks if the response should be considered "blocked" or empty.
from google.genai import types


def _is_conversation_blocked(
    parts: list[types.Part],
    finish_reason: types.FinishReason,
    request_id: str,
    has_yielded_content: bool,
) -> bool:
    """Return True when the chunk implies a blocked/empty response that should be treated as an error."""
    # If we've already yielded content, this might just be the final stream signal.
    if has_yielded_content:
        return False

    # We are looking for a specific case: single part, STOP reason, but no text/function.
    if len(parts) != 1:
        return False
    if finish_reason != types.FinishReason.STOP:
        return False

    part = parts[0]
    if part.function_call or part.function_response:
        return False

    # If text is missing or empty, it's a blocked/empty response.
    if not part.text:
        return True
    return False
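As a quick sanity check (not part of the patch itself), the helper behaves as expected on a hand-built empty part; this assumes google.genai allows constructing a types.Part() with no fields set:

from google.genai import types

empty_part = types.Part()  # no text, no function_call, no function_response

# Empty STOP before anything was yielded -> treated as blocked/empty
assert _is_conversation_blocked([empty_part], types.FinishReason.STOP, "req-1", has_yielded_content=False)

# Same chunk after content was already yielded -> just the end-of-stream signal
assert not _is_conversation_blocked([empty_part], types.FinishReason.STOP, "req-1", has_yielded_content=True)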
2. Modified _run Loop
In LLMStream._run, we track has_yielded_content and apply the check.
# ... inside LLMStream._run ...
has_yielded_content = False  # Track if we have sent any chunks

async for response in stream:
    # ... existing error checks ...

    # --- START CHANGE ---
    # Check for an empty STOP response
    if response.candidates:
        candidate = response.candidates[0]
        finish_reason = candidate.finish_reason
        parts = candidate.content.parts if candidate.content else []
        if _is_conversation_blocked(parts, finish_reason, request_id, has_yielded_content):
            raise APIStatusError(
                "google llm: empty response without content",
                retryable=False,  # Or True, depending on desired behavior
                request_id=request_id,
            )
    # --- END CHANGE ---

    for part in response.candidates[0].content.parts:
        chat_chunk = self._parse_part(request_id, part)
        if chat_chunk is not None:
            retryable = False
            has_yielded_content = True  # Mark that we have content
            self._event_ch.send_nowait(chat_chunk)
    # ... rest of loop ...
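For completeness, this is roughly how we install the patch at startup. A minimal sketch assuming LLMStream is importable from livekit.plugins.google.llm and that _run takes no arguments besides self in your installed version; _patched_run stands for a full copy of the original coroutine with the change above applied:

from livekit.plugins.google import llm as google_llm


async def _patched_run(self) -> None:
    # Full copy of the original LLMStream._run body,
    # with the empty-STOP check from the snippet above added.
    ...


# Replace the stream's _run coroutine before any sessions are created.
google_llm.LLMStream._run = _patched_run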
Key Differences from Current Implementation
- State Tracking: The patch introduces has_yielded_content to distinguish between a truly empty response and the end of a successful stream.
- Validation: It explicitly validates that a STOP response contains actual text or function calls before processing it.
- Error Raising: Instead of silently succeeding with an empty chunk, it raises APIStatusError, enabling fallback mechanisms.
Additional Context
I'm sharing this workaround because I've received questions about it. But this is, of course, not an official solution, just something that works for us and might help somebody else.
Hi, thanks for the detailed report! When does this empty response typically occur, and do you have a reproducible code snippet? I'm wondering if this empty bit comes when you call generate_reply() with any particular configs in the LLM.
@tinalenguyen I've seen it happen with the regular and pro models across different conversations, but I don't have a repeatable setup that predictably triggers it, so it's hard to say. I've seen it happen with Vertex AI and Google AI Studio too, and even on third-party providers like t3.chat. It seems a bit too random for now.
This is happening near constantly for us and causing lots of user-facing issues. It's increased in severity on the Google AI Studio APIs since the release of Gemini 3.0 (we're using 2.5 Flash).
Any kind of retry on this error would be a massive improvement over what we're dealing with now @tinalenguyen
I've noticed it happens particularly often when the model attempts tool calls.
@alexlooney @macastro9714 if you are able to repro this, could you give #4249 a try?