
min_endpointing_delay parameter doesn't work as expected

Open · samirsalman opened this issue 1 year ago · 5 comments

Issue Summary

When initializing the VoiceAssistant with min_endpointing_delay=5, the assistant does not wait 5 seconds before sending the transcription to the LLM. This results in multiple requests being sent when the user makes brief pauses during speech, causing slower inference and incomplete responses.

According to the documentation, this parameter is described as:
“Delay to wait before considering the user finished speaking.”

Expected Behavior

The assistant should wait for the duration specified by min_endpointing_delay before sending the transcription to the LLM, assuming the user has finished speaking.
The intended flow should be:

  • User speaks
  • min_endpointing_delay seconds of silence
  • User is considered to have stopped speaking
  • Transcription is sent to the LLM
  • The LLM response is synthesized and returned via TTS
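The expected flow above amounts to debouncing the transcript. Each new speech segment restarts a silence timer, and the turn is committed to the LLM only once min_endpointing_delay seconds pass with no new speech. The minimal asyncio sketch below shows that idea. EndpointingDebouncer, push_segment and on_user_turn are hypothetical names for illustration, not part of the LiveKit API.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class EndpointingDebouncer:
    """Hypothetical helper (not LiveKit API): commit the user's turn only
    after `min_endpointing_delay` seconds with no new transcript segments."""

    def __init__(
        self,
        min_endpointing_delay: float,
        on_user_turn: Callable[[str], Awaitable[None]],
    ) -> None:
        self._delay = min_endpointing_delay
        self._on_user_turn = on_user_turn
        self._segments: List[str] = []
        self._timer: Optional[asyncio.Task] = None

    def push_segment(self, text: str) -> None:
        # New speech arrived: buffer it and restart the silence timer.
        # Must be called while the event loop is running.
        self._segments.append(text)
        if self._timer is not None:
            self._timer.cancel()
        self._timer = asyncio.create_task(self._wait_then_commit())

    async def _wait_then_commit(self) -> None:
        try:
            await asyncio.sleep(self._delay)
        except asyncio.CancelledError:
            return  # the user spoke again before the delay elapsed
        text = " ".join(self._segments)
        self._segments.clear()
        self._timer = None
        await self._on_user_turn(text)  # e.g. send the full turn to the LLM


async def main() -> List[str]:
    committed: List[str] = []

    async def send_to_llm(text: str) -> None:
        committed.append(text)

    debouncer = EndpointingDebouncer(0.3, send_to_llm)
    debouncer.push_segment("I would like to")
    await asyncio.sleep(0.1)  # brief thinking pause, shorter than the delay
    debouncer.push_segment("book a table")
    await asyncio.sleep(0.5)  # real end of turn, longer than the delay
    return committed


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['I would like to book a table']
```

Because the pause between the two segments is shorter than the delay, the LLM receives a single, complete request rather than one request per fragment.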

If this is not the intended behavior of the min_endpointing_delay parameter, the description should be updated for clarity. Additionally, is there a way to implement this behavior if it isn't currently supported?

Current Behavior

The assistant sends the transcription to the LLM immediately upon the user's first silence, without waiting for the duration specified by min_endpointing_delay.

Steps to Reproduce

  1. Initialize the VoiceAssistant with min_endpointing_delay set to a large value (e.g., 10 seconds).
  2. Start the assistant.
  3. Speak into the assistant.
  4. Observe from the logs that the LLM receives the input immediately, without waiting for the specified delay.
  5. [Optional] Use the before_llm_cb parameter to log information before sending input to the LLM. You will see that this callback is triggered immediately, without waiting for the full 10 seconds.

Environment

  • Python 3.10.14
  • Packages:
    • livekit==0.17.0
    • livekit-agents==0.9.1
    • livekit-api==0.7.0
    • livekit-plugins-azure==0.3.2
    • livekit-plugins-deepgram==0.6.7
    • livekit-plugins-nltk==0.7.1
    • livekit-plugins-openai==0.8.5
    • livekit-plugins-silero==0.6.4
    • livekit-protocol==0.6.0

samirsalman, Sep 28 '24

+1. We are running into this problem: the agent starts speaking before the human has finished (they are thinking, so they speak slowly). It is very annoying, because the agent interrupts and starts speaking at incomplete pauses.

hari01584, Oct 01 '24

+1

SimoneFaricelli, Oct 01 '24

@hari01584 have you tried increasing min_endpointing_delay?

davidzhao, Oct 02 '24

TypeError: VoiceAssistant.__init__() got an unexpected keyword argument 'min_endpointing_delay'

Aniket-think41, Oct 07 '24

@davidzhao up

samirsalman, Oct 09 '24

Hey, this behavior was caused by preemptive_synthesis, which is now disabled by default as of livekit-agents==0.10.1. You shouldn't see this behavior when preemptive_synthesis=False.
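The following asyncio simulation is one reading of this explanation, under stated assumptions. It is a hypothetical model of the two modes, not LiveKit's implementation. With preemptive synthesis, an LLM request (and therefore before_llm_cb) starts as soon as each final transcript arrives, and that reply is used only if min_endpointing_delay then passes in silence. Without it, a single request is sent after the delay. The function name simulate_turn and the timings are illustrative only.

```python
import asyncio
from typing import List, Optional


async def simulate_turn(preemptive: bool, delay: float = 0.3) -> List[str]:
    """Hypothetical model: return the transcripts the LLM was called with."""
    llm_calls: List[str] = []  # what the LLM (or a before_llm_cb hook) saw
    heard: List[str] = []
    timer: Optional[asyncio.Task] = None

    async def commit_after_silence() -> None:
        await asyncio.sleep(delay)  # min_endpointing_delay of silence
        if not preemptive:
            llm_calls.append(" ".join(heard))  # one complete request

    # Two final transcripts separated by a pause shorter than `delay`.
    for segment, pause in (("I would like to", 0.1), ("book a table", 0.5)):
        heard.append(segment)
        if preemptive:
            # Request starts immediately; it is discarded if the user keeps talking.
            llm_calls.append(" ".join(heard))
        if timer is not None:
            timer.cancel()  # more speech arrived: restart the silence timer
        timer = asyncio.create_task(commit_after_silence())
        await asyncio.sleep(pause)
    return llm_calls


if __name__ == "__main__":
    print(asyncio.run(simulate_turn(preemptive=True)))
    # ['I would like to', 'I would like to book a table']
    print(asyncio.run(simulate_turn(preemptive=False)))
    # ['I would like to book a table']
```

In the preemptive run, the hook fires at the first pause even though the user has not finished. This matches the symptom described in step 5 of the report. In the non-preemptive run, it fires only once the delay has elapsed.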

theomonnom, Oct 10 '24