Kokoro-TTS should be the default speech model
The current TTS implementation is not very good and sounds very robotic. Kokoro TTS offers multiple voices and, at roughly the size of a base model, delivers much higher speech quality and speed than all the other models.
Kokoro has already been tested and is now on the roadmap.
That's good; I'll keep a lookout for future updates.
Chatterbox > Kokoro :)
Better in quality, yes, but not in speed, and even in quality not by much. For its size, Kokoro is lightning fast, almost instantaneous. Its output will be done before the LLM even responds, and agents need that kind of speed. The quality is good enough given the speed Kokoro offers.
I would rather opt for open, compatible URL endpoints for TTS and STT, so people have multiple options for what they want to run.
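A minimal sketch of what configurable endpoints could look like, assuming the app simply reads user-supplied base URLs for external TTS and STT servers. The class name, the `TTS_BASE_URL`/`STT_BASE_URL` variables, and the localhost defaults are hypothetical placeholders, not anything the project defines.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SpeechEndpoints:
    """Hypothetical settings: base URLs of whichever TTS/STT servers the user runs."""
    tts_url: str
    stt_url: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "SpeechEndpoints":
        # Variable names and default URLs are illustrative assumptions only.
        env = os.environ if env is None else env
        return cls(
            tts_url=env.get("TTS_BASE_URL", "http://localhost:8000/tts"),
            stt_url=env.get("STT_BASE_URL", "http://localhost:8000/stt"),
        )
```

Swapping Kokoro for another backend would then only mean pointing `TTS_BASE_URL` at a different server.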
I have the Kokoro integration finished -- just needs a bit of testing.
I've been experimenting with A0 over the past couple of days using the new Kokoro integration. It's good, but a couple of things are missing. There doesn't seem to be a way to select among the voices Kokoro offers. The model has 30+ pre-trained voices across different languages and accents, and you should be able to choose the one you want rather than only the default. Another great addition would be assigning different voices to different agents, to give each one more of a personality. This could be a nice quality-of-life improvement in the next update.
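A rough sketch of the requested per-agent voice setting, assuming a simple global default plus optional overrides per agent name. `VoiceSettings` is hypothetical, and the voice IDs used below (`af_heart`, `am_adam`, `bf_emma`) are examples only; check them against the voice list shipped with the installed Kokoro model.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class VoiceSettings:
    """Hypothetical config: a global default voice plus per-agent overrides."""
    available: Set[str]              # voice IDs the installed model provides
    default: str = "af_heart"        # example ID, assumed to exist in the model
    per_agent: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The default must itself be a voice the model actually offers.
        if self.default not in self.available:
            raise ValueError(f"unknown default voice: {self.default}")

    def assign(self, agent: str, voice: str) -> None:
        # Reject voices the model does not provide.
        if voice not in self.available:
            raise ValueError(f"unknown voice: {voice}")
        self.per_agent[agent] = voice

    def voice_for(self, agent: str) -> str:
        # Agents without their own voice fall back to the global default.
        return self.per_agent.get(agent, self.default)
```

For example, after `settings.assign("researcher", "bf_emma")`, the researcher agent speaks with `bf_emma` while every other agent keeps the default.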