HuBERT (latent variables) to VITS?
The VITS project contains a posterior encoder, which converts audio to latent-space variables. But HuBERT does the same.
Does RVC work by generating latent-space variables with HuBERT and then using them for voice conversion in VITS?
Thank you for your answer.
Yes, RVC is inspired by the well-known So-VITS-SVC, which in turn is inspired by Soft-VC (repository: https://github.com/bshall/soft-vc, paper: https://arxiv.org/abs/2111.02392). The author replaced the acoustic model and vocoder in Soft-VC with VITS.
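As a rough illustration of that data flow, here is a toy sketch. It assumes the So-VITS-SVC-style wiring, where HuBERT content features take the place of the VITS text input and the posterior encoder is only needed during training. Every function, constant, and size below is a hypothetical stand-in that mimics shapes only; none of it is the actual RVC, So-VITS-SVC, or VITS code.

```python
import math
import random

# Hypothetical stand-ins that only mimic shapes; not the real RVC/VITS API.
FRAME = 320       # assumed samples per content frame (20 ms at 16 kHz)
CONTENT_DIM = 4   # toy size; real HuBERT features are much wider
LATENT_DIM = 2    # toy size of the VITS latent z


def content_encoder(wave):
    """Stand-in for HuBERT: waveform -> one content vector per frame."""
    n_frames = len(wave) // FRAME
    return [[0.0] * CONTENT_DIM for _ in range(n_frames)]


def prior_encoder(content):
    """Stand-in for the encoder used in place of the VITS text encoder:
    content features -> per-frame mean and log-std of the prior."""
    means = [[sum(frame)] * LATENT_DIM for frame in content]
    log_stds = [[0.0] * LATENT_DIM for _ in content]
    return means, log_stds


def sample_prior(means, log_stds, rng):
    """Draw z_p = mean + noise * exp(log_std) for every frame."""
    return [
        [m + rng.gauss(0.0, 1.0) * math.exp(s) for m, s in zip(mf, sf)]
        for mf, sf in zip(means, log_stds)
    ]


def flow_inverse(z_p):
    """Stand-in for running the normalizing flow in reverse (identity here)."""
    return z_p


def decoder(z):
    """Stand-in for the vocoder-style decoder: latent frames -> waveform.
    The toy keeps the input length; a real decoder may change sample rate."""
    return [0.0] * (len(z) * FRAME)


def convert(wave, seed=0):
    """Inference path: no posterior encoder is involved."""
    rng = random.Random(seed)
    content = content_encoder(wave)           # HuBERT-like features
    means, log_stds = prior_encoder(content)  # prior over the latent
    z_p = sample_prior(means, log_stds, rng)
    z = flow_inverse(z_p)                     # map to the decoder's latent
    return decoder(z)


converted = convert([0.0] * 3200)
```

In this picture the HuBERT features are not the VITS latent itself. They condition the prior from which the latent is sampled, and a posterior encoder would see the target audio only while training.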
Thank you!
Do you know of any code that can fine-tune a VITS model? It seems that RVC trains the entire model. We would like to experiment with fine-tuning only. Thank you.