FreeVC
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
The pinned torch and torchvision dependencies were incompatible. Torch 1.10.0 requires torchvision 0.11.1, not 0.9.0.
Hi, I really like this great project. While testing it with an English male voice as the source and a Malayalam male voice as the target sample, the output sounds like a female voice,...
At first I was somewhat impressed, using a male voice as source and a female voice as target that was 30 seconds long and noise-cleaned by AI. It pretty much...
For unseen F to seen M conversion, the resulting pitch is very close to that of the source speaker, especially if the source pitch is much higher than the seen M pitch....
I am confused why the speaker embedding `g` is used to condition multiple model components (_Posterior Encoder, Decoder, Flow_) as opposed to just _Flow_. From the model diagram in **Fig....
Are there any tips to consider when training a model with 44.1k data? Additionally, does increasing the sampling rate of training data contribute to improved model performance? Now my training...
Hi, I'm running training both with and without the pretrained checkpoint (VCTK) as the initial state. However, in both cases the target pitch is affected by the input pitch (e.g. from...
When I run the "inference with FreeVC" command, `CUDA_VISIBLE_DEVICES=0 python convert.py --hpfile logs/freevc.json --ptfile checkpoints/freevc.pth --txtpath convert.txt --outdir outputs/freevc`, how do I get the freevc.json and freevc.pth checkpoint if I did...
Hello! I'm delighted to come across this remarkable project, and thanks for sharing it as an open-source project. Currently, my focus lies on fine-tuning the freevc-s model using pretrained checkpoints...
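When testing speaker similarity, I found that the average similarity measured on the training set is very close to the one measured on LibriTTS train-clean-100. Is that because the provided pt file has already been fine-tuned on LibriTTS? Or is my method of measuring speaker similarity unsuitable? I use the pretrained speaker encoder bundled with this project to extract embedding vectors, then compute the cosine similarity between the converted speech and the reference audio.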
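For reference, the similarity measurement described in this question can be sketched roughly as follows. This is only an illustration, not the project's evaluation code: it assumes the embeddings have already been extracted by a speaker encoder and are available as plain lists of floats, and the helper names `cosine_similarity` and `mean_similarity` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_similarity(
    converted: Sequence[Sequence[float]],
    reference: Sequence[Sequence[float]],
) -> float:
    """Average cosine similarity over paired (converted, reference) embeddings."""
    if len(converted) != len(reference) or not converted:
        raise ValueError("need the same non-zero number of embeddings on both sides")
    scores = [cosine_similarity(c, r) for c, r in zip(converted, reference)]
    return sum(scores) / len(scores)
```

Comparing the mean score on seen training speakers against the mean score on unseen speakers only makes sense if both sets are embedded by the same encoder and paired in the same way.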