Changing TTS voice during the conversation

Open olivvein opened this issue 1 year ago • 5 comments

Hello, I want to change the TTS voice during the conversation (with a function call), but the VoiceAssistant cannot update the TTS voice while it is running. Does anyone have an idea how to achieve that?

olivvein avatar Sep 28 '24 18:09 olivvein

I think the starting point would be to close the AgentOutput on the VoiceAssistant and then create a new one as a result of the function call.

jezell avatar Oct 01 '24 09:10 jezell

How do I close the agent output? I tried closing the assistant so I could create a new one, but I cannot close the old one.

olivvein avatar Oct 01 '24 13:10 olivvein

There is an AgentOutput class that is instantiated by the agent and manages playback. I believe you would have to fork the agent, but if you close the AgentOutput and create a new one, it looks like you can probably change the voice. From what I can see, that would likely be the best place to start.

jezell avatar Oct 02 '24 15:10 jezell

I did something very similar with a modified TTS class. I ask the LLM to include emotion and speed changes in its response, and then update the TTS from the before_tts_cb callback.

For example:

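# Inside before_tts_cb(agent, text); parse_options and update_opts are custom helpers
if isinstance(text, str):
    # Strip the option tags from the text and collect the requested settings
    text, new_opts = parse_options(text)
    if new_opts:
        agent._tts.update_opts(new_opts)
    return text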
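The parse_options helper is not shown in this comment. The following is only a minimal sketch of what such a helper could look like, assuming the LLM is prompted to emit a JSON payload wrapped in `<options>...</options>` tags. The closing tag matches the end marker used in the thread; the opening tag, the JSON payload, and the option names are assumptions made for illustration.

```python
from __future__ import annotations

import json
import re

# Hypothetical tag format: <options>{"speed": "fast"}</options>Hello there
_OPTIONS_RE = re.compile(r"<options>(.*?)</options>", re.DOTALL)


def parse_options(text: str) -> tuple[str, dict]:
    """Strip one <options> block from text and return (clean_text, options)."""
    match = _OPTIONS_RE.search(text)
    if match is None:
        return text, {}
    try:
        opts = json.loads(match.group(1))
    except json.JSONDecodeError:
        opts = {}  # malformed payload: drop the tag and keep current settings
    if not isinstance(opts, dict):
        opts = {}
    # Remove the tag so it is never sent to the TTS engine
    clean = text[: match.start()] + text[match.end():]
    return clean.strip(), opts
```

Returning an empty dict when nothing is found keeps the `if new_opts:` check in the callback above working unchanged.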

For streaming output, I have to accumulate the text chunks until an end marker is present:

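async for chunk in text:
    buffer += chunk
    # Parse once the closing tag has arrived (buffer and parsed are initialized before the loop)
    if buffer.count("</options>") == 1 and not parsed:
        buffer, new_opts = parse_options(buffer, pronunciation)
        if new_opts:
            agent._tts.update_opts(new_opts)
        parsed = True  # presumably set here so the options are applied only once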
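The fragment above does not show how the remaining text is passed on to the TTS. As a rough, self-contained sketch of the buffering idea, the async generator below holds text back until `</options>` arrives, applies the options once, and then streams everything else through. The names strip_options and apply_opts, the JSON payload, and the assumption that the header sits at the very start of the response are all hypothetical and not part of LiveKit.

```python
import asyncio
import json
from typing import AsyncIterable, AsyncIterator, Callable

START_MARKER = "<options>"  # assumed opening tag
END_MARKER = "</options>"   # end marker used in the thread


async def strip_options(
    chunks: AsyncIterable[str],
    apply_opts: Callable[[dict], None],
) -> AsyncIterator[str]:
    """Buffer text until the options header is complete, then pass the rest through."""
    buffer = ""
    parsed = False
    async for chunk in chunks:
        if parsed:
            yield chunk  # header already handled: stream straight through
            continue
        buffer += chunk
        if END_MARKER in buffer:
            header, rest = buffer.split(END_MARKER, 1)
            payload = header.split(START_MARKER, 1)[-1]
            try:
                apply_opts(json.loads(payload))
            except json.JSONDecodeError:
                pass  # malformed header: keep the current settings
            parsed = True
            if rest:
                yield rest
        elif not START_MARKER.startswith(buffer[: len(START_MARKER)]):
            # The response does not begin with a header: stop buffering
            parsed = True
            yield buffer
    if not parsed and buffer:
        yield buffer  # stream ended before the end marker arrived
```

In a callback like the one above, apply_opts could be something like `agent._tts.update_opts`, and the generator itself would be returned in place of the original stream.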

ChenghaoMou avatar Oct 11 '24 15:10 ChenghaoMou

We are going to make updates cleaner in the framework. Here's how I'm doing it for the voice assistant we built with Cartesia: https://gist.github.com/davidzhao/5738f0e2d434dea6e5224262ee5c3cfa

davidzhao avatar Oct 11 '24 16:10 davidzhao

Hello, do you know how to save the speech and TTS results to files while chatting with the agent?

BaiMoHan avatar Oct 15 '24 08:10 BaiMoHan

Added update_options to most TTS in this PR
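With an update_options method available, the function-call approach from the original question might look roughly like the sketch below. StubTTS is a stand-in written for this example and is not a LiveKit class; the `voice` keyword and the voice ids are assumptions, since the accepted arguments differ between TTS plugins and should be checked against the plugin you use.

```python
from dataclasses import dataclass


@dataclass
class StubTTS:
    """Stand-in for a TTS plugin instance; NOT a real LiveKit class."""

    voice: str = "default"

    def update_options(self, *, voice: str) -> None:
        # Assumption: real plugins accept a similar keyword, but names vary
        self.voice = voice


ALLOWED_VOICES = {"default", "narrator", "assistant"}  # hypothetical voice ids


def change_voice(tts: StubTTS, voice: str) -> str:
    """Body of a function-call handler: validate the request, then switch voices."""
    if voice not in ALLOWED_VOICES:
        return f"Unknown voice '{voice}'; keeping '{tts.voice}'."
    tts.update_options(voice=voice)
    return f"Voice changed to '{voice}'."
```

Returning a short status string gives the LLM something to report back to the user after the function call completes.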

theomonnom avatar Oct 15 '24 22:10 theomonnom