
Prevent loss of final Whisper transcript when switching agents

Open MajorTal opened this issue 1 year ago • 1 comment

Problem: During an AgentSession hand-off the outgoing AgentActivity (and its RealtimeSession) is closed before the OpenAI Realtime server sends conversation.item.input_audio_transcription.completed for the user’s last utterance. If that packet arrives after the WebSocket has been closed, the user_input_transcribed event is never emitted, so applications lose the user’s final transcript (typical case: user says “I’m ready” just as IntroAgent hands off to MainAgent).

Solution:

  1. Start the next activity first: AgentSession._update_activity_task now awaits next_activity.start() before it swaps _activity.
  2. Grace period before closing the old session: keep the previous activity alive for 1 second after the swap. This window is long enough for the Whisper pipeline to emit the final transcription event under normal network conditions.
  3. Existing behavior for the very first agent (prev_activity is None) remains unchanged.
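
The ordering described in the steps above can be sketched with plain asyncio. This is a minimal illustration under stated assumptions, not the actual livekit-agents code: `FakeActivity`, `hand_off`, `expect_late_transcript`, and `demo` are hypothetical names, the late transcript is simulated with a timer, and the grace period is awaited inline for simplicity (the real patch may schedule it differently).

```python
import asyncio


class FakeActivity:
    """Hypothetical stand-in for an AgentActivity (not the livekit-agents class)."""

    def __init__(self, name, events):
        self.name = name
        self.events = events  # shared log of what happened, in order
        self.closed = False

    async def start(self):
        self.events.append(f"{self.name}:start")

    def expect_late_transcript(self, delay, text):
        # Simulates the server sending the final transcription after `delay` seconds.
        asyncio.create_task(self._deliver(delay, text))

    async def _deliver(self, delay, text):
        await asyncio.sleep(delay)
        # Once the activity is closed, the event has nowhere to go and is dropped.
        if not self.closed:
            self.events.append(f"{self.name}:transcript:{text}")

    async def aclose(self):
        self.closed = True
        self.events.append(f"{self.name}:close")


async def hand_off(prev, nxt, grace):
    """Start the next activity first, then close the previous one after `grace` seconds."""
    await nxt.start()           # step 1: next activity is running before the swap
    if prev is None:            # step 3: very first agent, nothing to close
        return nxt
    await asyncio.sleep(grace)  # step 2: keep the old activity alive briefly
    await prev.aclose()
    return nxt


async def demo(grace):
    events = []
    intro = FakeActivity("intro", events)
    main = FakeActivity("main", events)
    await intro.start()
    # The final transcript arrives 50 ms after the hand-off begins.
    intro.expect_late_transcript(0.05, "I'm ready")
    await hand_off(intro, main, grace)
    await asyncio.sleep(0.1)    # let any pending delivery finish
    return events


if __name__ == "__main__":
    print(asyncio.run(demo(grace=0.0)))  # old activity closed immediately
    print(asyncio.run(demo(grace=0.2)))  # old activity kept alive past the transcript
```

With a zero grace period the simulated transcript arrives after the old activity is closed and is dropped; with a grace period longer than the delivery delay it is recorded before the close.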

Backwards compatibility: The change only alters the timing of internal cleanup; the public API remains unchanged. The grace period is short (1 s) and applies only during hand-off, so the impact on resource usage is negligible.

MajorTal avatar Apr 25 '25 09:04 MajorTal

:warning: Changeset Required

We detected changes in the following package(s) but no changeset file was found. Please add one for proper versioning:

  • livekit-agents

👉 Create a changeset file by clicking here.

github-actions[bot] avatar Apr 25 '25 09:04 github-actions[bot]