Agent speech output audio is interpreted as user speech

Open andrewjhogue opened this issue 1 year ago • 4 comments

When using LiveKit agents, sometimes the agent hears its own TTS output (eg via the laptop speakers) which is then interpreted as speech from the user.

This then creates a feedback loop where the agent will then translate + respond a second time to its own speech output.

This only seems to happen when device volume is above ~25-30% and audio is being played through the device speakers.

To provide a seamless UX though, the user shouldn't have to worry about managing volume level in order to prevent this.

My current approach is:

  1. When instantiating a LiveKit room, enabling noiseSuppression and echoCancellation, eg:
    <LiveKitRoom
        token={createAudioCoachingCallRequest.result.room_access_token}
        serverUrl={createAudioCoachingCallRequest.result.active_server_websocket_url}
        audio={{echoCancellation: true, noiseSuppression: true}}
        connect={true}
    >
  2. Enabling allow_interruptions=True in agent.py, eg:
    assistant = VoiceAssistant(
        ...,
        allow_interruptions=True,
    )
  3. Muting the user's mic on user_speech_committed + agent_started_speaking events, then unmuting on the agent_speech_committed event (ie, after the agent finishes speaking).
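The mute/unmute workaround in the last step can be sketched roughly as a small state holder driven by the event names above. This is only an illustration: `MicGate` and `set_muted` are hypothetical names rather than LiveKit APIs, and the real app would wire `handle` to the assistant's event callbacks and `set_muted` to whatever actually mutes the user's track.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MicGate:
    """Hypothetical helper (not a LiveKit API) that decides when the mic is muted."""

    # Assumption: the app supplies a callback that mutes/unmutes the user's mic.
    set_muted: Callable[[bool], None]
    muted: bool = False

    def _apply(self, muted: bool) -> None:
        # Only call out when the state actually changes.
        if muted != self.muted:
            self.muted = muted
            self.set_muted(muted)

    def handle(self, event: str) -> None:
        # Mute once the user's turn is committed or the agent starts speaking.
        if event in ("user_speech_committed", "agent_started_speaking"):
            self._apply(True)
        # Unmute once the agent's speech has been committed (it finished speaking).
        elif event == "agent_speech_committed":
            self._apply(False)


# Usage: record the mute calls for one user turn followed by one agent reply.
calls: List[bool] = []
gate = MicGate(set_muted=calls.append)
for event in ("user_speech_committed", "agent_started_speaking", "agent_speech_committed"):
    gate.handle(event)
```

While the gate is muted the user cannot barge in, which is exactly the limitation of this approach.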

Muting the user's mic is a short-term workaround -- the main limitation being that the user can't interrupt the agent once it starts speaking.

Are there best practices for preventing this feedback loop / is this something LiveKit is working on addressing?

andrewjhogue avatar May 22 '24 10:05 andrewjhogue

Haven't had this issue on chrome + macbook. WebRTC echo cancellation is typically pretty good. What browser/device are you testing on?

keepingitneil avatar May 23 '24 21:05 keepingitneil

Am running this on the latest Chrome x macbook (Ventura 13).

Haven't had it happen much on our live site yet - seems to be sporadic + happening locally, usually near the beginning of a session.

andrewjhogue avatar May 26 '24 20:05 andrewjhogue

We are also seeing this while using the iOS SDK with speakerphone voice input. We also have echoCancellation and noiseSuppression on. It's happening pretty frequently. Muting the user during agent speech is not an option. How can we approach a fix?

egoldschmidt avatar Aug 05 '24 05:08 egoldschmidt

My workaround was to build a button on the frontend that sends a stop_speaking message, so the user can cancel if the agent speaks too much. I've attached my agent.py as an example of how I solved it; I don't know if it helps you 100% here. agent.txt
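For readers without the attachment, the agent side of such a button might look roughly like the sketch below. It is not the attached agent.py: the JSON payload shape `{"type": "stop_speaking"}`, the function name `handle_data_message`, and the `interrupt` callback are all assumptions made for illustration.

```python
import json
from typing import Callable


def handle_data_message(payload: bytes, interrupt: Callable[[], None]) -> bool:
    """Call `interrupt` if the payload is a stop_speaking request.

    Assumption: the frontend button sends JSON like {"type": "stop_speaking"}.
    Returns True when the request was recognised and handled.
    """
    try:
        message = json.loads(payload.decode("utf-8"))
    except ValueError:
        # Covers both invalid UTF-8 and invalid JSON; ignore such payloads.
        return False
    if isinstance(message, dict) and message.get("type") == "stop_speaking":
        interrupt()  # e.g. whatever stops the agent's current speech
        return True
    return False
```

In a real agent, `interrupt` would be bound to whatever stops the current TTS playback, and the function would be called from the room's data-received handler.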

ChrisFeldmeier avatar Sep 08 '24 20:09 ChrisFeldmeier

I'm experiencing the same problem on Android with the Flutter SDK.

SeyedAlirezaFatemi avatar Dec 05 '24 07:12 SeyedAlirezaFatemi

Same issue happening on Android (developed with React Native and Expo). Doesn't seem to happen on iOS. Could anyone at LiveKit please look into this, as it makes voice calls essentially unusable at the moment? I'll update here if I find a workaround that doesn't involve the user muting themselves.

cyrus-estavillo avatar Dec 12 '24 02:12 cyrus-estavillo

Has anybody tried Krisp.ai? I don't want to commit to the $500/month price before somebody can verify that it helps.

CodingFu avatar Dec 16 '24 22:12 CodingFu

It was working fine until I added a visual audio transcription to my front-end. I'm not sure if there is any correlation, but it is definitely more common for the agent to "speak to itself" now.

I added these 2 params (allow_interruptions, add_to_chat_ctx) in my agent, since in my case the agent usually gets lost in the first sentence. This basically ignores the self-talk to avoid a downstream feedback loop: Screenshot 2024-12-28 at 23 59 58

App example: Screenshot 2024-12-29 at 00 01 45

joao-osilva avatar Dec 29 '24 02:12 joao-osilva