VoxCPM issues

Inquiry regarding loss functions, hyperparameters, and weights for Causal Audio VAE training

Hi VoxCPM Team, thank you for open-sourcing VoxCPM and providing the technical report. I am very interested in the Causal Audio VAE architecture described in your work. To reproduce the...

echo-hmwang

Voice cloning in fine-tuning

1

Does the fine-tuning script also optimize the model for zero-shot voice cloning, or is a separate script needed for that? Could you also share the pretraining code and configs?

tensorjackal

Some questions about LoRA training

1

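1. The default config uses r=32, alpha=16, scaling=0.5, but scaling is usually set to 2 or 1. What is the reasoning behind defaulting scaling to 0.5?
2. After training with the default LoRA configuration, there is a large gap in test quality between the latest checkpoint and step_2000. Why is that?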

BestMt111
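For context on question 1: in the common LoRA convention (used, for example, by Hugging Face PEFT when rsLoRA is off), the low-rank update is multiplied by scaling = alpha / r, so r=32 and alpha=16 give exactly the 0.5 in the default config. Assuming VoxCPM follows the same convention (not confirmed here), the sketch below shows where that factor enters the forward pass. `lora_scaling`, `matvec` and `lora_forward` are hypothetical names, not part of the VoxCPM code base.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # list of rows


def lora_scaling(r: int, alpha: float, use_rslora: bool = False) -> float:
    """Scale factor applied to the low-rank update B @ A.

    Standard LoRA: alpha / r. Rank-stabilized LoRA (rsLoRA): alpha / sqrt(r).
    """
    if r <= 0:
        raise ValueError("rank r must be positive")
    return alpha / math.sqrt(r) if use_rslora else alpha / r


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(x: Vector, w: Matrix, a: Matrix, b: Matrix, scaling: float) -> Vector:
    """y = W x + scaling * B (A x).

    W (d_out x d_in) is frozen; only A (r x d_in) and B (d_out x r) are trained.
    """
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [y0 + scaling * u for y0, u in zip(base, update)]


# Hypothetical example matching the defaults quoted in the issue: r=32, alpha=16.
print(lora_scaling(32, 16))  # prints 0.5
```

The original LoRA paper notes that, with Adam, tuning alpha is roughly equivalent to tuning the learning rate, so a smaller scaling such as 0.5 can be offset by a larger learning rate rather than being inherently worse than 1 or 2.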

Request: Addition of Hindi language

3

I would like to be able to clone an English voice and generate Hindi audio. **Please, devs!**

aghammadan

Tags for speech

3

Hey! Does the model support , , etc. tokens? If not, how can they be added to the model?

tensorjackal

Support word and punctuation timestamps

2

Other open TTS models, such as Kokoro, have versions that provide word timestamps. This can be a very useful feature for syncing visuals of the text with its...

gad2103
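As a sketch of how the requested timestamps could be consumed for syncing on-screen text, assume a hypothetical list of per-word (or per-punctuation) start/end times in seconds. A binary search then finds the word being spoken at a given playback time. `WordTimestamp` and `WordTimeline` are hypothetical names, not part of VoxCPM or Kokoro.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WordTimestamp:
    word: str
    start: float  # seconds from the start of the audio
    end: float    # exclusive end time in seconds


class WordTimeline:
    """Look up the active word at a playback time (hypothetical helper)."""

    def __init__(self, words: List[WordTimestamp]) -> None:
        # Assumes entries do not overlap; sort once so lookups can bisect.
        self.words = sorted(words, key=lambda w: w.start)
        self.starts = [w.start for w in self.words]

    def word_at(self, t: float) -> Optional[WordTimestamp]:
        """Return the word spoken at time t, or None during pauses or outside the audio."""
        i = bisect_right(self.starts, t) - 1  # last word starting at or before t
        if i >= 0 and self.words[i].start <= t < self.words[i].end:
            return self.words[i]
        return None


timeline = WordTimeline([
    WordTimestamp("Hello", 0.0, 0.42),
    WordTimestamp(",", 0.42, 0.5),
    WordTimestamp("world", 0.6, 1.05),
])
current = timeline.word_at(0.7)
print(current.word if current else None)  # prints world
```

Sorting once in the constructor keeps each lookup O(log n), which matters when the lookup runs on every rendered video frame.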

ADD A NEW LANGUAGE

10

I've tested it and so far it is really good; you did a great job. However, this needs to always be checked, because if it is unchecked it struggles to understand word...

picolo100

Voice timbre issues

2

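Hello author. Without voice cloning, the voice (timbre) is different on every inference run. The quality is indeed very good, but is there a model with a fixed voice? Voice cloning with reference audio has many problems:
1. It is very sensitive to the audio quality of the reference clip; otherwise, words get swallowed and strange sounds appear.
2. Even with the same reference audio, one run may sound very good while the next is poor: the speech becomes longer, strange sounds appear at the beginning, and sometimes the output does not match the text at all.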

Storyinsea
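The run-to-run differences described above show that generation is stochastic, so one way to make runs repeatable while comparing reference clips or settings is to fix the random seeds before each call. The helper below is a generic sketch, not a VoxCPM API; whether seeding alone pins VoxCPM's output (or its voice when no reference audio is given) is an assumption, since some GPU kernels are nondeterministic.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed Python's, NumPy's and PyTorch's RNGs (NumPy/PyTorch only if installed)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        # Seeds the CPU generator and the generators on all CUDA devices.
        torch.manual_seed(seed)
    except ImportError:
        pass


# Hypothetical usage: reseed with the same value before each generation you want to reproduce.
seed_everything(1234)
first = random.random()
seed_everything(1234)
assert random.random() == first
```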

Segmentation fault

```python
model = VoxCPM.from_pretrained("./pretrained_weights/")
# Non-streaming
wav = model.generate(
    text="VoxCPM is an innovative end-to-end TTS model from ModelBest, designed to generate highly expressive speech.",
    prompt_wav_path="/root/andy/overall_process/VoxCPM_TTS/example.wav",  # optional: path to a prompt...
```

higherandy

CUDA error: operation not permitted

15

The following error occasionally occurs when using CUDA for inference. GPU: NVIDIA L20

```text
2025-12-17 02:44:08,499 - __main__ - ERROR - [worker] Error while handling request: CUDA error: operation not permitted...
```

zengruizhao
