Prevent loss of final Whisper transcript when switching agents
Problem: During an AgentSession hand-off, the outgoing AgentActivity (and its RealtimeSession) is closed before the OpenAI Realtime server sends conversation.item.input_audio_transcription.completed for the user’s last utterance. If the WebSocket is closed before that event arrives, user_input_transcribed is never emitted and the application loses the user’s final transcript (typical case: the user says “I’m ready” just as IntroAgent hands off to MainAgent).
Solution:
- Start the next activity first: AgentSession._update_activity_task now awaits next_activity.start() before it swaps _activity.
- Grace period before closing the old session: the previous activity is kept alive for 1 second after the swap. This window is long enough for the Whisper pipeline to emit the final transcription event under normal network conditions.
- Existing behavior for the very first agent (prev_activity is None) remains unchanged.
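The ordering described above can be sketched with plain asyncio. This is a minimal illustration, not the code in this PR: FakeActivity, FakeSession, on_transcription_completed, and wait_closed are hypothetical stand-ins, and scheduling the delayed close as a background task is an assumption about how the grace period might be implemented.

```python
import asyncio

GRACE_PERIOD_S = 1.0  # grace period stated in the description above


class FakeActivity:
    """Hypothetical stand-in for AgentActivity (not the livekit-agents class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False
        self.transcripts: list[str] = []

    async def start(self) -> None:
        pass  # a real activity would open its realtime session here

    def on_transcription_completed(self, text: str) -> None:
        # Once the activity is closed there is no connection left to
        # deliver the event, so it is dropped.
        if not self.closed:
            self.transcripts.append(text)

    async def aclose(self) -> None:
        self.closed = True


class FakeSession:
    """Hypothetical stand-in for AgentSession, showing only the hand-off order."""

    def __init__(self) -> None:
        self._activity: FakeActivity | None = None
        self._close_tasks: set[asyncio.Task] = set()

    async def update_activity(
        self, next_activity: FakeActivity, grace: float = GRACE_PERIOD_S
    ) -> None:
        prev = self._activity
        await next_activity.start()    # 1. start the next activity first
        self._activity = next_activity  # 2. then swap
        if prev is None:
            return                      # very first agent: nothing to close
        # 3. close the previous activity only after the grace period
        task = asyncio.create_task(self._close_later(prev, grace))
        self._close_tasks.add(task)
        task.add_done_callback(self._close_tasks.discard)

    async def _close_later(self, activity: FakeActivity, grace: float) -> None:
        await asyncio.sleep(grace)
        await activity.aclose()

    async def wait_closed(self) -> None:
        # Wait for all pending delayed closes to finish.
        await asyncio.gather(*self._close_tasks)


async def demo(grace: float = 0.05) -> list[str]:
    session = FakeSession()
    intro, main = FakeActivity("intro"), FakeActivity("main")
    await session.update_activity(intro)
    await session.update_activity(main, grace=grace)
    # The late event arrives after the swap but inside the grace window.
    intro.on_transcription_completed("I'm ready")
    await session.wait_closed()
    # An event arriving after the grace window is still lost.
    intro.on_transcription_completed("too late")
    return intro.transcripts


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The demo uses a shortened grace period so it finishes quickly; an event delivered inside the window is kept, while one delivered after the old activity is closed is dropped.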
Backwards compatibility: The change only alters the timing of internal cleanup; the public API remains unchanged. The grace period is short (1 s) and applies only during hand-off, so the impact on resource usage is negligible.