
Fine Tune for Voice Conversion?

Open · jsl303 opened this issue 3 years ago

I've tried voice conversion by providing driving and target samples, but the result doesn't sound like the target at all. It's closer to the driving sample. Are there instructions on how to fine-tune the model to make the output sound better?
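One way to put a number on "doesn't sound like the target" is to compare speaker embeddings (e.g., the d-vectors YourTTS's speaker encoder produces) with cosine similarity. A minimal numpy sketch, where the embedding values are placeholders standing in for real encoder outputs:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings; in practice these would come from the
# speaker encoder, one per utterance or averaged per speaker.
target_emb = np.array([0.2, 0.9, 0.1])
converted_emb = np.array([0.25, 0.85, 0.15])
driving_emb = np.array([0.9, 0.1, 0.4])

sim_to_target = cosine_similarity(converted_emb, target_emb)
sim_to_driving = cosine_similarity(converted_emb, driving_emb)

# If the conversion worked, the converted audio's embedding should be
# closer to the target speaker than to the driving speaker.
print(sim_to_target > sim_to_driving)
```

If the converted output scores consistently closer to the driving speaker than the target (as described in this issue), that confirms the conversion is leaking the source speaker's identity rather than being a subjective impression.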

jsl303 avatar Jul 11 '22 05:07 jsl303

Same problem: the generated voice sounds almost the same as the driving sample. I also found that the code only fine-tunes the vocoder (HiFi-GAN).

kunyao2015 avatar Jul 22 '22 03:07 kunyao2015

The training procedure for voice conversion and TTS is the same. If you like, you can follow the recipe that replicates the first experiment proposed in the YourTTS paper. The recipe replicates the single-language training on the VCTK dataset (it downloads, resamples, and extracts the speaker embeddings automatically :)). However, if you are interested in multilingual training, there are commented-out parameters on the VitsArgs class instance that should be enabled for multilingual training: https://github.com/coqui-ai/TTS/blob/dev/recipes/vctk/yourtts/train_yourtts.py
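As a rough sketch of what enabling those options looks like, the `VitsArgs` instance in the recipe can be given language-embedding settings alongside the d-vector settings. Field names and values here are illustrative and may differ across TTS versions, so verify them against the commented parameters in `train_yourtts.py` for your checkout:

```python
from TTS.tts.models.vits import VitsArgs

model_args = VitsArgs(
    # d-vector (speaker embedding) settings used by the VCTK recipe;
    # the file path is a placeholder for the embeddings the recipe extracts.
    use_d_vector_file=True,
    d_vector_file="speakers.pth",
    d_vector_dim=512,
    # Parameters commented out in the recipe that are typically enabled
    # for multilingual training (assumed names; check your TTS version):
    use_language_embedding=True,
    embedded_language_dim=4,
)
```

With language embeddings enabled, the dataset configs also need per-dataset language labels so the model can condition on language at training time.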

Edresson avatar Dec 12 '22 17:12 Edresson