johnwick123f
I think you just have to increase the threshold of the boxes and it should give you better performance. If that still doesn't work, try using the larger Grounding DINO...
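For anyone unsure what raising the box threshold does in practice, here is a rough sketch in plain Python. It is not Grounding DINO's actual API: the `Detection` class, the `filter_by_threshold` helper, and the scores are all made up for illustration. The only idea it shows is that a higher threshold keeps fewer, more confident boxes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """Hypothetical container for one predicted box (not a real library type)."""
    label: str
    score: float  # model confidence in [0, 1]
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def filter_by_threshold(detections: List[Detection], box_threshold: float) -> List[Detection]:
    """Keep only the detections whose confidence is at least box_threshold."""
    return [d for d in detections if d.score >= box_threshold]


# Made-up detections, purely for illustration.
detections = [
    Detection("cat", 0.82, (10.0, 20.0, 110.0, 220.0)),
    Detection("cat", 0.41, (15.0, 25.0, 100.0, 200.0)),
    Detection("dog", 0.28, (300.0, 40.0, 380.0, 160.0)),
]

loose = filter_by_threshold(detections, 0.25)   # keeps all three boxes
strict = filter_by_threshold(detections, 0.50)  # keeps only the 0.82 box
print(len(loose), len(strict))
```

The trade-off is the usual one: a stricter threshold removes spurious boxes but can also drop real objects that the model scored low.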
I am writing this a few months later, but it's easy to run the model if you use llama.cpp and a quantized version of the model. You can even...
Hmm yeah I would like this feature too.
+1, it would be very useful for many things. Parler-TTS sounds very good and it would be great to support cloning voices.
In the console you have to run this command; I don't think you can use wget for installation (I might be mistaken):
```
apt-get install espeak-ng
```
Yes, it can clone voices. What this repo means by tone color is that it doesn't really convert emotion or volume, but rather the actual style. This codebase works by a...
Yep same exact error!
Yeah that solved it, thanks!
@CyberTimon I believe vLLM and transformers both have support for Ultravox, but vLLM is faster. Here is the vLLM script to run Ultravox: https://github.com/vllm-project/vllm/blob/661a34fd4fdd700a29b2db758e23e4e243e7ff18/examples/offline_inference_audio_language.py#L23
Yep, same exact thing: 500 ms latency with very short text on a T4 GPU with the API. Slower than real time (4 sec to generate 3 sec of audio). Only on longer text, it is...
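To put a number on "slower than real time", the usual measure is the real-time factor (generation time divided by audio duration). A minimal sketch using the 4 sec / 3 sec figures from the comment above; the `real_time_factor` helper is just an illustrative name, not part of any TTS library.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return generation time divided by the duration of the audio produced.

    A value above 1.0 means synthesis is slower than real time,
    and a value below 1.0 means it is faster.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# Figures reported above: 4 seconds to generate 3 seconds of audio.
rtf = real_time_factor(4.0, 3.0)
print(f"RTF = {rtf:.2f}")
print("slower than real time" if rtf > 1.0 else "real time or faster")
```

Note that the 500 ms latency on short text is a separate measurement (time until a response arrives), so it is not folded into this ratio.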