johnwick123f

Results 15 comments of johnwick123f

I think you just have to increase the box threshold and it should give you better results. If that still doesn't work, try using the larger Grounding DINO...
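To illustrate what raising the box threshold does, here is a minimal sketch in plain Python. It is not Grounding DINO's actual API: `filter_boxes`, the sample boxes, and the threshold values are hypothetical and only show how a higher cutoff drops low-confidence detections.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max); hypothetical box format for illustration only
Box = Tuple[float, float, float, float]


def filter_boxes(
    boxes: List[Box], scores: List[float], box_threshold: float
) -> List[Tuple[Box, float]]:
    """Keep only the detections whose confidence is at least box_threshold."""
    return [(box, score) for box, score in zip(boxes, scores) if score >= box_threshold]


# Made-up detections: two weak boxes and one strong box.
boxes = [(0.0, 0.0, 10.0, 10.0), (5.0, 5.0, 20.0, 20.0), (1.0, 1.0, 4.0, 4.0)]
scores = [0.30, 0.72, 0.41]

loose = filter_boxes(boxes, scores, box_threshold=0.25)   # keeps all three
strict = filter_boxes(boxes, scores, box_threshold=0.50)  # keeps only the 0.72 box
print(len(loose), len(strict))
```

A higher threshold trades recall for precision, so it helps when the problem is too many spurious boxes rather than missed objects.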

I am writing this a few months later, but it's easy to run the model if you use llama.cpp and a quantized version of the model. You can even...

Hmm yeah I would like this feature too.

+1, it would be very useful for many things. Parler-TTS sounds very good and it would be great to support cloning voices.

In the console you have to run this command. I don't think you can use wget for installation (I might be mistaken).
```
apt-get install espeak-ng
```

Yes it can clone voices. What this repo means by tone color is that emotion or volume isn't really converted but rather the actual style. This codebase works by a...

@CyberTimon I believe both vLLM and transformers support Ultravox, but vLLM is faster. Here is the vLLM script to run Ultravox: https://github.com/vllm-project/vllm/blob/661a34fd4fdd700a29b2db758e23e4e243e7ff18/examples/offline_inference_audio_language.py#L23

Yep, same exact thing: 500 ms latency with very short text on a T4 GPU through the API. Slower than real time (4 sec to generate 3 sec of audio). Only on longer text, it is...
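For reference, the "slower than real time" figure can be expressed as a real-time factor (generation time divided by audio duration). This is just a sketch of the arithmetic using the numbers quoted above; `real_time_factor` is a hypothetical helper, not part of any TTS library.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return generation time divided by audio duration.

    Values above 1.0 mean synthesis is slower than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# Numbers from the comment: 4 sec of compute for 3 sec of audio.
rtf = real_time_factor(4.0, 3.0)
print(f"RTF = {rtf:.2f}")  # prints "RTF = 1.33"
```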