yxlllc

35 comments by yxlllc

RVC and So-Vits-Svc are similar end-to-end architectures. In fact, the spectrum is not explicitly generated during the conversion process, although HifiGAN is used (the input is a 192-dimensional hidden space...

> > RVC and So-Vits-Svc are similar end-to-end architectures. In fact, the spectrum is not explicitly generated during the conversion process, although HifiGAN is used (the input is a 192-dimensional...

I guess the reason is that the pre-trained RVC model was trained on the VCTK dataset at a sampling rate of 48 kHz.

We have tried it, but BigVGAN is somewhat difficult to train and the improvement is not obvious.

We actually have some experimental models, but the performance improvements have not met expectations. A key point may be that the performance of the ContentVec (hubert_base.pt) pre-training limits the final upper...

The wrong model was loaded.

Check whether the configuration file and the pre-trained model you are using are the correct ones.

According to my tests, MME is the only stable driver. The others behave very erratically, which may be a problem with the sounddevice library.
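For anyone who wants to restrict device selection to MME, here is a minimal sketch of how that could be done. It is not taken from the project's code: the helper name `mme_devices` is hypothetical, and it assumes the host API is reported under the name "MME", as PortAudio does on Windows. The function works on plain lists of dicts shaped like the results of `sounddevice.query_hostapis()` and `sounddevice.query_devices()`, so it can be tried without audio hardware.

```python
def mme_devices(hostapis, devices, api_name="MME"):
    """Return (device_index, device_name) pairs for one host API.

    hostapis: sequence of dicts with "name" and "devices" keys, shaped like
              the result of sounddevice.query_hostapis().
    devices:  sequence of dicts with a "name" key, shaped like the result of
              sounddevice.query_devices().
    api_name: host API to keep; "MME" is an assumption about how the driver
              is reported on Windows.
    """
    matches = []
    for api in hostapis:
        if api["name"] != api_name:
            continue
        # Each host API lists the global indices of the devices it owns.
        for index in api["devices"]:
            matches.append((index, devices[index]["name"]))
    return matches


# Usage with the real library (needs sounddevice and PortAudio installed):
#   import sounddevice as sd
#   for index, name in mme_devices(sd.query_hostapis(), sd.query_devices()):
#       print(index, name)
```

An index returned this way can then be passed as the `device` argument when opening a stream, so that only MME devices are ever used.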

If you use a pre-trained model, a few hours will be enough.