VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

Results: 40 VoxCPM issues

Hi VoxCPM Team, Thank you for open-sourcing VoxCPM and providing the technical report. I am very interested in the Causal Audio VAE architecture described in your work. To reproduce the...

Does the fine-tuning script also optimise the model for zero-shot voice cloning, or do we need another script for that? Could you also share the pretraining code and configs, please?

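1. The default configuration uses r=32, alpha=16, scaling=0.5, but scaling is usually set to 2 or 1. What is the reasoning behind a default scaling of 0.5? 2. After training with the default LoRA parameters, there is a gap in test quality between the latest and step_2000 checkpoints, and the gap is quite large. Why is that?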
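The numbers quoted in the LoRA question are consistent with the common LoRA convention `scaling = alpha / r`, since 16 / 32 = 0.5. Whether VoxCPM's fine-tuning code computes the factor this way is an assumption here, not something the issue confirms. A minimal sketch of the arithmetic, with hypothetical helper names:

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Conventional LoRA scaling factor, alpha / r (assumed, not confirmed for VoxCPM)."""
    if r <= 0:
        raise ValueError("rank r must be positive")
    return alpha / r


def alpha_for_scaling(target_scaling: float, r: int) -> float:
    """Alpha that yields the target scaling at rank r under the same convention."""
    if r <= 0:
        raise ValueError("rank r must be positive")
    return target_scaling * r


# Values quoted in the issue: r=32, alpha=16.
print(lora_scaling(alpha=16, r=32))

# Alpha that would be needed at r=32 for the scalings the issue calls typical.
for target in (1.0, 2.0):
    print(target, alpha_for_scaling(target, r=32))
```

Under this convention, reaching a scaling of 1 or 2 at r=32 would mean raising alpha to 32 or 64 rather than changing a separate scaling setting.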

I would like to be able to clone an English voice and generate Hindi audio. **PLS DEVS**

Hey! Does the model support , , etc. tokens? If not, how can they be added to the model?

Versions of other open TTS models, such as Kokoro, provide word-level timestamps. This can be a very useful feature for syncing visuals of the text with its...

I've tested it and so far it is really good; you did a great job. However, this needs to be always checked, because if unchecked it struggles to understand word...

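Hello author. Without voice cloning, the voice timbre is different on every inference run, although the quality is indeed very good. Is there a model with a fixed voice? Voice cloning with reference audio runs into many problems: 1. It is very strict about the quality of the reference audio; otherwise words get dropped and inexplicable sounds appear. 2. With the same reference audio, the output is sometimes very good, but the next run may be poor: the speech becomes longer, strange sounds appear at the beginning, and sometimes the output does not match the text at all.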

model = VoxCPM.from_pretrained("./pretrained_weights/")
# Non-streaming
wav = model.generate(
    text="VoxCPM is an innovative end-to-end TTS model from ModelBest, designed to generate highly expressive speech.",
    prompt_wav_path="/root/andy/overall_process/VoxCPM_TTS/example.wav",  # optional: path to a prompt...

The following error occasionally occurs when using CUDA for inference. GPU: NVIDIA L20

```text
2025-12-17 02:44:08,499 - __main__ - ERROR - [worker] Error while handling request: CUDA error: operation not permitted...
```