Retrieval-based-Voice-Conversion-WebUI

HuBERT (latent variables) to VITS?

Open Lukysoon opened this issue 1 year ago • 3 comments

The VITS project contains a posterior encoder, which converts audio to latent-space variables. But HuBERT does the same.

Does RVC work by generating latent-space variables with HuBERT and then using them for voice conversion in VITS?

Thank you for your answer.

Lukysoon avatar Aug 28 '24 18:08 Lukysoon

Yes, RVC is inspired by the well-known So-VITS-SVC, which in turn is inspired by Soft-VC (repository: https://github.com/bshall/soft-vc, paper: https://arxiv.org/abs/2111.02392). The author replaced the acoustic model and vocoder in Soft-VC with VITS.

Nian-Ci avatar Oct 03 '24 17:10 Nian-Ci

Thank you!

Lukysoon avatar Oct 03 '24 18:10 Lukysoon

Do you know of any code that can fine-tune a VITS model? It seems that RVC trains the entire model. We would like to experiment with fine-tuning only. Thank you.

Lukysoon avatar Oct 03 '24 18:10 Lukysoon
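
For reference, a minimal PyTorch sketch of what "fine-tuning only" could mean in general: freeze part of a pretrained model and optimize the rest. `ToyConverter` and its `encoder`, `speaker`, and `decoder` submodules are hypothetical stand-ins, not the VITS or RVC code, and which parts to freeze is an assumption made here for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)


class ToyConverter(nn.Module):
    """Hypothetical stand-in for a pretrained voice-conversion model."""

    def __init__(self, feat_dim=256, hidden=64, n_speakers=4):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, hidden)       # consumes content features
        self.speaker = nn.Embedding(n_speakers, hidden)  # per-speaker conditioning
        self.decoder = nn.Linear(hidden, 1)              # toy "vocoder" head

    def forward(self, feats, speaker_id):
        # feats: (batch, frames, feat_dim); speaker_id: (batch,)
        h = self.encoder(feats) + self.speaker(speaker_id).unsqueeze(1)
        return self.decoder(torch.tanh(h)).squeeze(-1)   # (batch, frames)


model = ToyConverter()
# With a real model, pretrained weights would be loaded here before freezing.

# Freeze the encoder so that only the speaker table and decoder are updated.
for p in model.encoder.parameters():
    p.requires_grad_(False)

# Hand the optimizer only the parameters that are still trainable.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

# One training step on random placeholder data (not real audio features).
feats = torch.randn(2, 50, 256)
speaker_id = torch.tensor([0, 1])
target = torch.randn(2, 50)

encoder_before = model.encoder.weight.detach().clone()
decoder_before = model.decoder.weight.detach().clone()

loss = nn.functional.mse_loss(model(feats, speaker_id), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After the step, the frozen encoder weights are unchanged while the decoder weights have moved. The same pattern (set `requires_grad_(False)` on the frozen submodules, then pass only the remaining parameters to the optimizer) should carry over to a real checkpoint, though the submodule names will differ.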