Handle Gemini empty responses
Feature Type
Would make my life easier
Feature Description
We have observed that the Gemini API (via google-generativeai) occasionally returns a response with finish_reason=STOP but with empty text content and no function calls. This occurs even when the request appears valid.
In the current implementation of livekit.plugins.google.llm, such a response is processed as a valid chunk with empty content. This results in the LLMStream yielding an empty ChatChunk, which the agent considers a successful turn. Consequently, the agent "speaks" silence and the turn ends, preventing any FallbackLLM from triggering or any other error handling from occurring. In our experience, the agent simply stayed silent until some other mechanism noticed the silence and recovered.
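For reference, this is roughly the shape of the candidate we see in that case. The snippet below reconstructs it by hand purely for illustration; the field names follow google.genai types, but the exact construction is our assumption, not output captured from the SDK.

from google.genai import types

# Hypothetical reconstruction of the problematic chunk: finish_reason is STOP,
# but the single part carries no text and no function call.
empty_candidate = types.Candidate(
    finish_reason=types.FinishReason.STOP,
    content=types.Content(role="model", parts=[types.Part()]),
)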
Web Context
This appears to be a known behavior or intermittent issue with Gemini models, where the model returns a "completed" status without generating content.
- Related discussions indicate this can happen due to internal model behaviors or safety filters that don't correctly populate the safety ratings but still stop generation.
- Examples of users encountering empty responses with finish_reason: STOP: GitHub Issues Examples (general search for "empty response")
Workarounds / Alternatives
We detect this specific condition (finish_reason is STOP, the content is empty, and no prior content has been yielded) and treat it as an error (e.g., APIStatusError). This allows the FallbackLLM to catch the error and switch to a backup model, or allows the application to handle the failure gracefully, as sketched below.
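For context, this is how the fallback is wired on our side. A minimal sketch assuming the FallbackAdapter from livekit-agents (this is what we loosely call FallbackLLM above) and the openai plugin as a backup; the model names are placeholders, so adapt them to your setup:

from livekit.agents import llm
from livekit.plugins import google, openai

# Once the empty-STOP case raises APIStatusError, a fallback chain like this
# can switch to the backup model instead of letting the turn end in silence.
fallback_llm = llm.FallbackAdapter(
    [
        google.LLM(model="gemini-2.5-flash"),
        openai.LLM(model="gpt-4o-mini"),
    ]
)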
Reference Implementation
Below is the logic we are currently using as a monkey patch to resolve this issue.
1. Helper Function
This function checks if the response should be considered "blocked" or empty.
from google.genai import types


def _is_conversation_blocked(
    parts: list[types.Part],
    finish_reason: types.FinishReason,
    request_id: str,
    has_yielded_content: bool,
) -> bool:
    """Return True when the chunk implies a blocked/empty response that should be treated as an error."""
    # If we've already yielded content, this might just be the final stream signal.
    if has_yielded_content:
        return False

    # We are looking for a specific case: single part, STOP reason, but no text/function.
    if len(parts) != 1:
        return False
    if finish_reason != types.FinishReason.STOP:
        return False

    part = parts[0]
    if part.function_call or part.function_response:
        return False

    # If text is missing or empty, it's a blocked/empty response.
    if not part.text:
        return True
    return False
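As a quick sanity check (not part of the patch itself), the helper behaves as expected on a hand-built empty part; this assumes google.genai allows constructing a types.Part() with no fields set:

from google.genai import types

empty_part = types.Part()  # no text, no function_call, no function_response

# Empty STOP before anything was yielded -> treated as blocked/empty
assert _is_conversation_blocked([empty_part], types.FinishReason.STOP, "req-1", has_yielded_content=False)

# Same chunk after content was already yielded -> just the end-of-stream signal
assert not _is_conversation_blocked([empty_part], types.FinishReason.STOP, "req-1", has_yielded_content=True)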
2. Modified _run Loop
In LLMStream._run, we track has_yielded_content and apply the check.
# ... inside LLMStream._run ...
has_yielded_content = False  # Track if we have sent any chunks

async for response in stream:
    # ... existing error checks ...

    # --- START CHANGE ---
    # Check for an empty STOP response
    if response.candidates:
        candidate = response.candidates[0]
        finish_reason = candidate.finish_reason
        parts = candidate.content.parts if candidate.content else []
        if _is_conversation_blocked(parts, finish_reason, request_id, has_yielded_content):
            raise APIStatusError(
                "google llm: empty response without content",
                retryable=False,  # Or True, depending on desired behavior
                request_id=request_id,
            )
    # --- END CHANGE ---

    for part in response.candidates[0].content.parts:
        chat_chunk = self._parse_part(request_id, part)
        if chat_chunk is not None:
            retryable = False
            has_yielded_content = True  # Mark that we have content
            self._event_ch.send_nowait(chat_chunk)
    # ... rest of loop ...
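For completeness, this is roughly how we install the patch at startup. A minimal sketch assuming LLMStream is importable from livekit.plugins.google.llm and that _run takes no arguments besides self in your installed version; _patched_run stands for a full copy of the original coroutine with the change above applied:

from livekit.plugins.google import llm as google_llm


async def _patched_run(self) -> None:
    # Full copy of the original LLMStream._run body,
    # with the empty-STOP check from the snippet above added.
    ...


# Replace the stream's _run coroutine before any sessions are created.
google_llm.LLMStream._run = _patched_run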
Key Differences from Current Implementation
- State Tracking: The patch introduces has_yielded_content to distinguish between a truly empty response and the end of a successful stream.
- Validation: It explicitly validates that a STOP response contains actual text or function calls before processing it.
- Error Raising: Instead of silently succeeding with an empty chunk, it raises APIStatusError, enabling fallback mechanisms.
Additional Context
I'm sharing this workaround because I've received questions about it. But this is, of course, not an official solution, just something that works for us and might help somebody else.
Hi, thanks for the detailed report! When does this empty response typically occur, and do you have a reproducible code snippet? I'm wondering if this empty bit comes when you call generate_reply() with any particular configs in the LLM.
@tinalenguyen I've seen it happen with the regular and pro models across different conversations, but I don't have a repeatable setup that predictably triggers it, so it's hard to say. I've seen it happen with Vertex AI and Google AI Studio too, and even on third-party providers like t3.chat. It seems a bit too random for now.
This is happening near constantly for us and causing lots of user-facing issues. It's increased in severity on the Google AI Studio APIs since the release of Gemini 3.0 (we're using 2.5 Flash).
Any kind of retry on this error would be a massive improvement over what we're dealing with now @tinalenguyen
I've noticed it happens particularly often when the model attempts tool calls.
@alexlooney @macastro9714 if you are able to repro this, could you give #4249 a try?