Retrieval-based-Voice-Conversion-WebUI: HuBERT (latent variables) to VITS?

The VITS project contains a posterior encoder, which converts audio to latent-space variables. But HuBERT does the same.

Does RVC work by generating latent-space variables with HuBERT and then using them for voice conversion in VITS?

Thank you for your answer.

Aug 28 '24 18:08 Lukysoon

Yes, RVC is inspired by the well-known So-VITS-SVC, which in turn is inspired by Soft-VC (repository: https://github.com/bshall/soft-vc, paper: https://arxiv.org/abs/2111.02392). The author replaced the acoustic model and vocoder in Soft-VC with VITS.
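
To make the data flow concrete, here is a rough toy sketch in PyTorch. It is not RVC's actual code: `ToyContentEncoder` and `ToySynthesizer` are hypothetical stand-ins for HuBERT and the VITS-style synthesizer, and the feature size and speaker count are arbitrary. It only illustrates the idea that a content encoder turns audio into frame-level features, and a speaker-conditioned model turns those features back into audio.

```python
import torch
import torch.nn as nn


class ToyContentEncoder(nn.Module):
    """Hypothetical stand-in for HuBERT: waveform -> frame-level content features."""

    def __init__(self, feat_dim=256, hop=320):
        super().__init__()
        # One strided convolution yields one feature vector per `hop` samples
        # (320 samples is 20 ms at 16 kHz, i.e. 50 frames per second).
        self.conv = nn.Conv1d(1, feat_dim, kernel_size=hop, stride=hop)

    def forward(self, wav):                  # wav: (batch, samples)
        return self.conv(wav.unsqueeze(1))   # (batch, feat_dim, frames)


class ToySynthesizer(nn.Module):
    """Hypothetical stand-in for the VITS-style part: features + speaker id -> waveform."""

    def __init__(self, feat_dim=256, n_speakers=4, hop=320):
        super().__init__()
        self.speaker_emb = nn.Embedding(n_speakers, feat_dim)
        self.decoder = nn.ConvTranspose1d(feat_dim, 1, kernel_size=hop, stride=hop)

    def forward(self, feats, speaker_id):
        # Broadcast the target speaker embedding over all frames.
        g = self.speaker_emb(speaker_id).unsqueeze(-1)   # (batch, feat_dim, 1)
        return self.decoder(feats + g).squeeze(1)        # (batch, samples)


def convert(wav, target_speaker, encoder, synthesizer):
    """Encode the source audio, then resynthesize it for the target speaker."""
    with torch.no_grad():
        feats = encoder(wav)
        return synthesizer(feats, target_speaker)


torch.manual_seed(0)
encoder, synthesizer = ToyContentEncoder(), ToySynthesizer()
source_wav = torch.randn(1, 16000)           # one second of fake 16 kHz audio
converted = convert(source_wav, torch.tensor([2]), encoder, synthesizer)
```

In this sketch the only link between the two halves is the feature tensor, which is why the content encoder can in principle be swapped without changing the synthesizer's interface.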

Oct 03 '24 17:10 Nian-Ci

Thank you!

Oct 03 '24 18:10 Lukysoon

Do you know of any code that can fine-tune a VITS model? It seems that RVC trains the entire model. We would like to experiment with fine-tuning only. Thank you.

Oct 03 '24 18:10 Lukysoon