Changing TTS voice during the conversation
Hello, I want to change the TTS voice during a conversation (via a function call), but the VoiceAssistant cannot update the TTS voice while it is running. Does anyone have an idea how to achieve this?
I think the starting point would be to close the AgentOutput on the VoiceAssistant and then create a new one as a result of the function call.
How do I close the agent output? I tried closing the assistant so I could create a new one, but I cannot close the old one.
There is an AgentOutput class that is instantiated by the agent and manages playback. I believe you'd have to fork the agent, but if you close the AgentOutput and create a new one, it looks like you can probably change the voice. At least, that looks like the best place to start from what I can see.
I did something very similar with a modified TTS class. I ask the LLM to include emotion and speed changes in its response, and then update the TTS from the before_tts_cb callback.
For example:
    if isinstance(text, str):
        text, new_opts = parse_options(text)
        if new_opts:
            agent._tts.update_opts(new_opts)
        return text
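In case it helps, here is a rough sketch of what a parse_options helper could look like. This is not the actual helper from my code: the tag format (a JSON object wrapped in <options>...</options>) and the option names are assumptions made up for illustration, so adapt them to whatever you prompt the LLM to emit.

```python
import json
import re

# Hypothetical tag format: <options>{"speed": "fast"}</options>
_OPTIONS_RE = re.compile(r"<options>(.*?)</options>", re.DOTALL)


def parse_options(text):
    """Return (text without the options tag, options dict or None)."""
    match = _OPTIONS_RE.search(text)
    if match is None:
        return text, None
    try:
        opts = json.loads(match.group(1))
    except json.JSONDecodeError:
        opts = None  # malformed payload: drop the tag, keep current settings
    if not isinstance(opts, dict):
        opts = None  # only a JSON object counts as a set of options
    # Remove the tag so it is never sent to the TTS as speakable text.
    cleaned = (text[:match.start()] + text[match.end():]).strip()
    return cleaned, opts


text, opts = parse_options('<options>{"speed": "fast"}</options>Hello there!')
```

Returning None for a missing or malformed tag means the caller can simply skip the update and keep the current voice settings.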
For streaming output, I have to accumulate the text chunks until an end marker is present:
    async for chunk in text:
        buffer += chunk
        if buffer.count("</options>") == 1 and not parsed:
            buffer, new_opts = parse_options(buffer, pronunciation)
            if new_opts:
                agent._tts.update_opts(new_opts)
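A self-contained sketch of the buffering idea, written as an async generator that wraps the LLM text stream. The names strip_options and on_options and the marker strings are hypothetical; in a real agent, on_options would be where you parse the payload and call the TTS update method. Note that if the LLM never emits an options block, this simple version holds all text until the stream ends, so you may want a size limit on the buffer.

```python
import asyncio

START_MARKER = "<options>"   # assumed tag format, not part of any framework
END_MARKER = "</options>"


async def strip_options(chunks, on_options):
    """Yield text for the TTS, holding chunks back until the options block is complete."""
    buffer = ""
    parsed = False
    async for chunk in chunks:
        if parsed:
            yield chunk  # options already handled: pass text straight through
            continue
        buffer += chunk
        if END_MARKER in buffer:
            head, _, tail = buffer.partition(END_MARKER)
            before, _, raw_opts = head.partition(START_MARKER)
            on_options(raw_opts.strip())  # hand the raw payload to the caller
            parsed = True
            buffer = ""
            if before + tail:
                yield before + tail
    if not parsed and buffer:
        yield buffer  # no options block ever arrived: emit the text unchanged


async def demo():
    async def fake_llm():
        # The markers are deliberately split across chunk boundaries.
        for piece in ["<opt", "ions>speed=fast</opt", "ions>Hello", " there"]:
            yield piece

    seen = []
    out = [t async for t in strip_options(fake_llm(), seen.append)]
    return seen, out


seen, out = asyncio.run(demo())
```

Checking for the end marker on the accumulated buffer, rather than on each chunk, is what makes this work when the tag is split across chunks.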
We are going to make updates cleaner in the framework. Here's how I'm doing it for the voice assistant we built with Cartesia: https://gist.github.com/davidzhao/5738f0e2d434dea6e5224262ee5c3cfa
Hello, do you know how to save the speech and TTS results to files while chatting with the agent?
Added update_options to most TTS in this PR