min_endpointing_delay parameter doesn't work as expected
Issue Summary
When initializing the VoiceAssistant with min_endpointing_delay=5, the assistant does not wait 5 seconds before sending the transcription to the LLM. This results in multiple requests being sent when the user makes brief pauses during speech, causing slower inference and incomplete responses.
According to the documentation, this parameter is described as:
“Delay to wait before considering the user finished speaking.”
Expected Behavior
The assistant should wait for the duration specified by min_endpointing_delay before sending the transcription to the LLM, assuming the user has finished speaking.
The intended flow should be:
- User speaks
- min_endpointing_delay seconds of silence
- User is considered to have stopped speaking
- Transcription is sent to the LLM
- The LLM response is synthesized and returned via TTS
If this is not the intended behavior of the min_endpointing_delay parameter, the description should be updated for clarity. Additionally, is there a way to implement this behavior if it isn't currently supported?
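To make the expected flow concrete, here is a minimal, self-contained sketch of silence-based endpointing. It is not LiveKit code: SilenceEndpointer, on_speech, on_silence, and on_commit are hypothetical names used only for illustration, and the delay is shortened so the demo runs quickly.

```python
import asyncio


class SilenceEndpointer:
    """Hypothetical helper: commit the transcript only after `delay` seconds of silence."""

    def __init__(self, delay, on_commit):
        self._delay = delay          # stands in for min_endpointing_delay
        self._on_commit = on_commit  # stands in for "send transcription to the LLM"
        self._parts = []
        self._timer = None

    def on_speech(self, text):
        # New speech cancels any pending commit and extends the buffered transcript.
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        self._parts.append(text)

    def on_silence(self):
        # Silence (re)starts the countdown toward a commit.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = asyncio.get_running_loop().create_task(self._commit_after_delay())

    async def _commit_after_delay(self):
        await asyncio.sleep(self._delay)
        text = " ".join(self._parts)
        self._parts.clear()
        self._timer = None
        self._on_commit(text)


async def demo():
    committed = []
    endpointer = SilenceEndpointer(delay=0.2, on_commit=committed.append)
    endpointer.on_speech("I would like")
    endpointer.on_silence()
    await asyncio.sleep(0.05)  # brief pause, shorter than the delay: no commit
    endpointer.on_speech("a coffee")
    endpointer.on_silence()
    await asyncio.sleep(0.4)   # long pause, exceeds the delay: one commit
    return committed
```

Running asyncio.run(demo()) returns a list with a single entry, "I would like a coffee": the brief pause does not trigger a separate request.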
Current Behavior
The assistant sends the transcription to the LLM immediately upon the user's first silence, without waiting for the duration specified by min_endpointing_delay.
Steps to Reproduce
- Initialize the VoiceAssistant with min_endpointing_delay set to a large value (e.g., 10 seconds).
- Start the assistant.
- Speak into the assistant.
- Observe from the logs that the LLM receives the input immediately, without waiting the specified delay.
- [Optional] Use the before_llm_cb parameter to log information before sending input to the LLM. You will see that this callback is triggered immediately, without waiting the full 10 seconds.
Environment
- Python 3.10.14
- Packages:
  - livekit==0.17.0
  - livekit-agents==0.9.1
  - livekit-api==0.7.0
  - livekit-plugins-azure==0.3.2
  - livekit-plugins-deepgram==0.6.7
  - livekit-plugins-nltk==0.7.1
  - livekit-plugins-openai==0.8.5
  - livekit-plugins-silero==0.6.4
  - livekit-protocol==0.6.0
+1. We are seeing this problem too: the agent starts speaking before the human has finished talking (they are thinking, and therefore speaking slowly). It is very annoying, as the agent interrupts and starts speaking at incomplete breaks.
+1
@hari01584 have you tried increasing min_endpointing_delay ?
TypeError: VoiceAssistant.__init__() got an unexpected keyword argument 'min_endpointing_delay'
@davidzhao up
Hey, this was a behavior of preemptive_synthesis, and it is now disabled by default as of livekit-agents==0.10.1. You shouldn't see this behavior when preemptive_synthesis=False.
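For anyone on an older release, a sketch of the two settings named in this thread, collected as keyword arguments. The variable name assistant_kwargs is only illustrative, the remaining constructor arguments (VAD, STT, LLM, TTS) are omitted, and whether your installed version accepts both keywords is an assumption to verify (see the TypeError reported above).

```python
# Keyword arguments discussed in this thread. They would be passed to the
# VoiceAssistant constructor alongside the usual vad/stt/llm/tts arguments,
# assuming the installed livekit-agents version accepts both.
assistant_kwargs = {
    "min_endpointing_delay": 5.0,   # seconds of silence before the turn is considered finished
    "preemptive_synthesis": False,  # do not start the LLM/TTS pipeline before the delay elapses
}
```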