
Fine Tune for Voice Conversion?

Open · jsl303 opened this issue 3 years ago

I've tried voice conversion by providing driving and target samples, but the result doesn't sound like the target at all. It's closer to the driving sample. Are there instructions on how to fine-tune the model to make the output sound better?
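One way to put a number on "doesn't sound like the target" is to compare speaker embeddings (e.g., the d-vectors YourTTS's speaker encoder produces) with cosine similarity. A minimal numpy sketch, where the embedding values are placeholders standing in for real encoder outputs:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings; in practice these would come from the
# speaker encoder, one per utterance or averaged per speaker.
target_emb = np.array([0.2, 0.9, 0.1])
converted_emb = np.array([0.25, 0.85, 0.15])
driving_emb = np.array([0.9, 0.1, 0.4])

sim_to_target = cosine_similarity(converted_emb, target_emb)
sim_to_driving = cosine_similarity(converted_emb, driving_emb)

# If the conversion worked, the converted audio's embedding should be
# closer to the target speaker than to the driving speaker.
print(sim_to_target > sim_to_driving)
```

If the converted output scores consistently closer to the driving speaker than the target (as described in this issue), that confirms the conversion is leaking the source speaker's identity rather than being a subjective impression.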

jsl303 avatar Jul 11 '22 05:07 jsl303

Same problem: the generated voice sounds almost the same as the driving sample. I also found that the code only fine-tunes the vocoder (HiFi-GAN).

kunyao2015 avatar Jul 22 '22 03:07 kunyao2015

The training procedure for voice conversion and TTS is the same. If you like, you can follow the recipe that replicates the first experiment proposed in the YourTTS paper. The recipe replicates the single-language training on the VCTK dataset (it downloads, resamples, and extracts the speaker embeddings automatically :)). However, if you are interested in multilingual training, there are commented-out parameters on the VitsArgs class instance that should be enabled for multilingual training: https://github.com/coqui-ai/TTS/blob/dev/recipes/vctk/yourtts/train_yourtts.py
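As a rough sketch of what enabling those options looks like, the `VitsArgs` instance in the recipe can be given language-embedding settings alongside the d-vector settings. Field names and values here are illustrative and may differ across TTS versions, so verify them against the commented parameters in `train_yourtts.py` for your checkout:

```python
from TTS.tts.models.vits import VitsArgs

model_args = VitsArgs(
    # d-vector (speaker embedding) settings used by the VCTK recipe;
    # the file path is a placeholder for the embeddings the recipe extracts.
    use_d_vector_file=True,
    d_vector_file="speakers.pth",
    d_vector_dim=512,
    # Parameters commented out in the recipe that are typically enabled
    # for multilingual training (assumed names; check your TTS version):
    use_language_embedding=True,
    embedded_language_dim=4,
)
```

With language embeddings enabled, the dataset configs also need per-dataset language labels so the model can condition on language at training time.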

Edresson avatar Dec 12 '22 17:12 Edresson