Interruptions at the beginning of the agent's response

Open CyprienRicqueB2L opened this issue 2 months ago • 3 comments

Feature Type

Would make my life easier

Feature Description

Sometimes the agent interrupts the user. This happens when the user finishes speaking a first time and then proceeds to say another sentence. In that case the first audio frames emitted by the agent interrupt the user: they mostly both start speaking at the same time.

To prevent this, I am thinking we could have the VAD be triggered right before emitting the first audio frame, then wait for the STT response. If the user said something: session.interrupt(). Otherwise we continue speaking by playing the delayed frames.

Note: after reading the documentation, it seems the current implementation of interruptions already works this way: the interruption is based on VAD only, and the agent continues if it turns out nothing has been said.

If this is correct, maybe the feature would be interruption based on how many words the agent has said: if the agent has said fewer than 5 words, we want any word coming from the user to interrupt the agent. In our case, once the agent has started speaking (roughly more than 5 words), we don't want it to be interrupted anymore.
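To make the proposed rule concrete, here is a minimal sketch of the decision only. It is not an existing API of the library: `should_interrupt`, its parameters, and the 5-word threshold are hypothetical names chosen for illustration, and counting words by splitting on whitespace is an assumption.

```python
def should_interrupt(agent_text_so_far: str,
                     user_transcript: str,
                     lock_after_words: int = 5) -> bool:
    """Hypothetical helper: decide whether user speech may interrupt the agent.

    The agent can be interrupted only while it has spoken fewer than
    `lock_after_words` words, and only if the user actually said something.
    """
    # Count words the agent has already spoken (whitespace split is an assumption).
    agent_words = len(agent_text_so_far.split())
    # An empty or whitespace-only transcript means the VAD trigger was not real speech.
    user_said_something = bool(user_transcript.strip())
    return user_said_something and agent_words < lock_after_words
```

With this rule, a user saying "wait" while the agent has only said "Sure, I" would interrupt it, while the same word after a full sentence would not.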

Also, during the time where we are trying to figure out if the interruption detected by the VAD is valid, we would like to play a sound such as "mmh" to inform the user we were about to speak. However, it seems there is no event like agent_maybe_interrupted (triggered by VAD only).

Workarounds / Alternatives

No response

Additional Context

No response

CyprienRicqueB2L avatar Nov 12 '25 14:11 CyprienRicqueB2L

Yes, VAD interruptions work like you described.

maybe the feature would be interruption based on how many words the agent has said: if the agent has said fewer than 5 words, we want any word coming from the user to interrupt the agent. In our case, once the agent has started speaking (roughly more than 5 words), we don't want it to be interrupted anymore.

A simpler version of this is to update the VAD/interruption options after x seconds of the agent speaking:

  • start a timer when the agent starts speaking, and add a timer callback that disables interruptions for the current speech handle;
  • cancel the timer if the agent is interrupted.
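A minimal sketch of the two steps above, using only asyncio. `SpeechHandleStub` and `InterruptionGuard` are hypothetical names for illustration, not the library's classes, and the `allow_interruptions` attribute on the stub is an assumption about what the real speech handle exposes. The two `on_...` methods would need to be wired to whatever events your session emits when the agent starts speaking and when it is interrupted.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechHandleStub:
    """Hypothetical stand-in for the framework's speech handle."""
    allow_interruptions: bool = True


class InterruptionGuard:
    """Disables interruptions once the agent has spoken for `grace_period` seconds."""

    def __init__(self, grace_period: float) -> None:
        self.grace_period = grace_period
        self._timer: Optional[asyncio.TimerHandle] = None

    def on_agent_started_speaking(self, handle: SpeechHandleStub) -> None:
        # Start (or restart) the timer for the current speech handle.
        self.cancel()
        loop = asyncio.get_running_loop()
        self._timer = loop.call_later(self.grace_period, self._lock, handle)

    def on_agent_interrupted(self) -> None:
        # The speech was cut short, so there is nothing left to protect.
        self.cancel()

    def cancel(self) -> None:
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None

    @staticmethod
    def _lock(handle: SpeechHandleStub) -> None:
        # Timer callback: from now on this speech can no longer be interrupted.
        handle.allow_interruptions = False
```

A time-based threshold is used here instead of a word count because it needs no access to the synthesized transcript; the grace period would have to be tuned to roughly match the time it takes to say the first few words.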

Also, during the time where we are trying to figure out if the interruption detected by the VAD is valid, we would like to play a sound such as "mmh" to inform the user we were about to speak.

It could be a good-to-have feature. Let's keep this issue open to see if anyone else wants this too.

chenghao-mou avatar Nov 14 '25 17:11 chenghao-mou

Is this the same issue as the following? The user transitions speaking -> listening, then the agent transitions listening -> thinking, then the user transitions listening -> speaking. But while the user is speaking, the agent transitions from thinking -> speaking and then, about 100ms later, after uttering a blip of audio ("I"), it goes back to listening. How do I prevent the agent from speaking when the user is already speaking?

rathodsid avatar Nov 21 '25 10:11 rathodsid

But while the user is speaking, the agent transitions from thinking -> speaking and then, about 100ms later, after uttering a blip of audio ("I"), it goes back to listening.

It depends on how you configure min_interruption_duration. If the user starts speaking 100ms before the agent does and you have min_interruption_duration = 0.2, then we have to wait another 100ms before we can say it is an interruption.

We also recently merged PR #3995, which fixes some of the related issues. Feel free to try it when the next release is available.
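The arithmetic in that example can be written out as a tiny helper. This is only an illustration of the timing described above, not the library's internal logic; `remaining_wait` and its parameter names are hypothetical.

```python
def remaining_wait(user_speech_so_far: float,
                   min_interruption_duration: float) -> float:
    """Hypothetical helper: seconds of additional user speech needed
    before it counts as an interruption (never negative)."""
    return max(0.0, min_interruption_duration - user_speech_so_far)


# The user has been speaking for 100ms when the agent starts,
# with min_interruption_duration = 0.2: about 0.1s more is needed,
# which is the window in which the agent can utter a short blip.
wait = remaining_wait(user_speech_so_far=0.1, min_interruption_duration=0.2)
```

If the user had already been speaking for longer than min_interruption_duration when the agent started, the remaining wait would be zero.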

chenghao-mou avatar Nov 21 '25 11:11 chenghao-mou