johnwick123f comments

Results: 15 comments of johnwick123f

The model performed poorly

I think you just have to increase the box threshold and it should give you better performance. If that still doesn't work, try using the larger Grounding DINO...
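To illustrate what raising the threshold does, here is a minimal sketch in plain Python. The detections, the `filter_by_threshold` helper, and the threshold values are all hypothetical and are not taken from the Grounding DINO codebase; the point is only that a higher threshold discards low-confidence boxes.

```python
# Hypothetical detections: (label, confidence score, box as x0, y0, x1, y1).
# These values are made up for illustration only.
detections = [
    ("cat", 0.82, (10, 20, 110, 220)),
    ("cat", 0.31, (15, 25, 100, 200)),
    ("dog", 0.27, (300, 40, 420, 260)),
]


def filter_by_threshold(dets, box_threshold):
    """Keep only detections whose score is at least box_threshold."""
    return [d for d in dets if d[1] >= box_threshold]


# A low threshold keeps weak, possibly spurious boxes.
loose = filter_by_threshold(detections, 0.25)
# A higher threshold keeps only the confident ones.
strict = filter_by_threshold(detections, 0.35)

print(len(loose), "boxes at 0.25;", len(strict), "boxes at 0.35")
```

Raising the threshold trades recall for precision, so it helps mainly when the poor results come from too many spurious boxes.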

Anyone able to run 7B on google colab?

I am writing this a few months later, but it's easy to run the model if you use llama.cpp and a quantized version of the model. You can even...

Zero-Shot Voice Cloning

+1, it would be very useful for many things. Parler-TTS sounds very good, and it would be great to support voice cloning.

Espeak not installed when loading models in Gradio_app

You have to run this command in a console. I don't think you can use wget for the installation (I might be mistaken):

```
apt-get install espeak-ng
```
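Before loading the models, it may help to check whether the binary is actually on `PATH`. This is a small sketch using only the Python standard library; the `espeak_available` helper is a hypothetical name and is not part of the repo.

```python
import shutil


def espeak_available(binary="espeak-ng"):
    """Return True if the given executable is found on PATH."""
    return shutil.which(binary) is not None


if not espeak_available():
    # The install command is the one from the comment above.
    print("espeak-ng not found; try: apt-get install espeak-ng")
```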

Can this repo clone the original voice and generate a voice file with the speaker voice ?

Yes, it can clone voices. What this repo means by tone color is that emotion and volume aren't really converted, but rather the actual style. This codebase works by a...

Colab doesn't start - Encodec.cpp seems to lack CMakeLists.txt

Yep, same exact error!

Colab doesn't start - Encodec.cpp seems to lack CMakeLists.txt

Yeah, that solved it, thanks!

How can I run the realtime model locally on my linux machine?

@CyberTimon I believe both vLLM and transformers support Ultravox, but vLLM is faster. Here is the vLLM script to run Ultravox: https://github.com/vllm-project/vllm/blob/661a34fd4fdd700a29b2db758e23e4e243e7ff18/examples/offline_inference_audio_language.py#L23

(fish-speech v1.5) bigger real time factor on short texts

Yep, same exact thing: 500 ms latency with very short text on a T4 GPU with the API. Slower than real time (4 sec to generate 3 sec of audio). Only on longer text, it is...
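For reference, the real-time factor (RTF) is generation time divided by audio duration, so values above 1 are slower than real time. The sketch below computes it for the numbers in this comment. The second function is a toy model only: the fixed overhead and per-second cost are assumed values, not measurements of fish-speech, and it just shows why a fixed startup latency hurts short texts more.

```python
def real_time_factor(generation_seconds, audio_seconds):
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# Numbers from the comment: 4 sec to generate 3 sec of audio.
rtf = real_time_factor(4.0, 3.0)
print(f"RTF = {rtf:.2f}")  # 1.33, i.e. slower than real time


def rtf_with_overhead(audio_seconds, overhead=0.5, per_second_cost=0.8):
    """Toy model with ASSUMED constants: a fixed overhead plus a linear cost."""
    total = overhead + per_second_cost * audio_seconds
    return real_time_factor(total, audio_seconds)


# The fixed overhead dominates for short clips and is amortized on long ones.
short_clip = rtf_with_overhead(1.0)
long_clip = rtf_with_overhead(10.0)
print(short_clip > long_clip)
```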