
When `allow_interruptions=False`, we should ignore the user's input

Open davidzhao opened this issue 1 year ago • 2 comments

When the agent is speaking with allow_interruptions=False, we should not process any user input at all. Currently we queue up another response and play it out later.

That response will lack the proper context, and the user would not expect the agent to have heard it.

davidzhao avatar Oct 03 '24 07:10 davidzhao

Related to this, with interruptions, the agent's state gets weird. If the agent is talking, and then the user speaks, the agent's state becomes listening rather than speaking.

To reproduce: subscribe to participant attributes changed, and have the agent say something with allow_interruptions=False, then say something while the agent is talking. You'll see the state becomes listening, even though the agent is still talking.
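For reference, a minimal sketch of a handler that records the state transitions described in the repro steps. The attribute key `lk.agent.state` and the handler signature `(changed_attributes, participant)` are assumptions about the SDK, not confirmed in this thread; `make_state_logger` is a hypothetical helper name.

```python
from typing import Callable, Dict, List

# Assumed name of the participant attribute that carries the agent state.
AGENT_STATE_KEY = "lk.agent.state"


def make_state_logger(log: List[str]) -> Callable[[Dict[str, str], object], None]:
    """Build a handler that appends every agent state change to `log`."""

    def on_attributes_changed(changed: Dict[str, str], participant: object) -> None:
        # Only react when the agent state attribute is part of the change set.
        state = changed.get(AGENT_STATE_KEY)
        if state is not None:
            log.append(state)

    return on_attributes_changed
```

If the room object exposes an event emitter, the handler would presumably be registered with something like `room.on("participant_attributes_changed", make_state_logger(log))`. With the bug present, `log` would show `listening` while the agent is still talking.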

The other related thing - this seems to lead to very strange behavior with pre-emptive synthesis turned on. As far as I can tell, with allow_interruptions=False, when the agent is interrupted, it mostly thinks it stopped talking (though it keeps talking). So if you interrupt it, it starts queueing up subsequent responses - and then immediately sends 1-2 more responses after it finishes talking.

Also - if you play something with allow_interruptions=False (or maybe with an "interrupt user=OK" flag?), I think it should cancel and ignore whatever the user is already saying. E.g. if the user is speaking but we decide to make the agent speak, I don't want to process the user's answer.

martin-purplefish avatar Oct 03 '24 09:10 martin-purplefish

I agree with ignoring the user's input while interruptions=False. This also applies to the min_interrupt settings.

lhylton avatar Oct 04 '24 02:10 lhylton

@davidzhao We are having the same issue with outbound calls and the initial message (to greet the user). We don't want the message to be interrupted by the user (e.g. by saying "hello, it's maik speaking"); we just want to discard any user input until the welcome message has finished playing. Is there a way to delay listening to the user with the current state of the SDK?

maik-parloa avatar Oct 15 '24 19:10 maik-parloa

@maik-parloa - one trick I use: in my before_llm_cb, if the agent is speaking, I return False.
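A rough sketch of what that trick might look like. It assumes the callback receives `(agent, chat_ctx)` and that returning `False` skips the LLM reply; `SpeakingTracker` and `make_before_llm_cb` are hypothetical helpers, and how the speaking flag gets updated depends on the SDK version.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class SpeakingTracker:
    """Hypothetical helper that remembers whether the agent is speaking."""

    speaking: bool = False

    def on_started(self) -> None:
        self.speaking = True

    def on_stopped(self) -> None:
        self.speaking = False


def make_before_llm_cb(tracker: SpeakingTracker) -> Callable[[Any, Any], Optional[bool]]:
    """Build a callback that drops user turns received while the agent speaks."""

    def before_llm_cb(agent: Any, chat_ctx: Any) -> Optional[bool]:
        if tracker.speaking:
            # Assumption: False tells the pipeline not to generate a reply.
            return False
        # None lets the pipeline fall back to its default LLM call.
        return None

    return before_llm_cb
```

The tracker would presumably be wired to the agent's speaking events, for example `agent.on("agent_started_speaking", tracker.on_started)` and `agent.on("agent_stopped_speaking", tracker.on_stopped)`, if those events exist in your version.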

martin-purplefish avatar Oct 15 '24 19:10 martin-purplefish

oh that's smart @martin-purplefish

lhylton avatar Oct 15 '24 20:10 lhylton

This issue has been fixed in [email protected] (PR)

theomonnom avatar Oct 15 '24 22:10 theomonnom