Jing-Yi Li
Those spec.pt files are generated on the fly during the first epoch of training.
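For readers unfamiliar with the pattern, here is a minimal sketch of "compute on first access, then reuse from disk". It is not the repository's actual code: `load_or_compute`, `fake_spectrogram`, and the `.pkl` file name are hypothetical, and `pickle` stands in for whatever serialization the training script really uses.

```python
import os
import pickle
import tempfile


def load_or_compute(cache_path, compute_fn):
    """Return the value cached at cache_path; compute and save it on a miss."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    value = compute_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(value, f)
    return value


# Hypothetical stand-in for an expensive spectrogram computation.
calls = []


def fake_spectrogram():
    calls.append(1)  # record each real computation
    return [[0.0, 1.0], [2.0, 3.0]]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "utt.spec.pkl")
    first = load_or_compute(path, fake_spectrogram)   # miss: computes and writes
    second = load_or_compute(path, fake_spectrogram)  # hit: reads from disk
```

With this pattern only the first pass over the dataset pays the computation cost, which would explain the files appearing during the first epoch.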
I tried some large-pitch-gap conversions, including the mentioned 5142-to-p326 conversion, and some other conversions like a high-pitch kid's voice to another low-pitch man's voice. I put the results [here](https://1drv.ms/u/s!AnvukVnlQ3ZT0BiqJf631WfOzBV7?e=Mailrh). So,...
Hmmm, weird. #21 also reports worse results than mine. Tomorrow I'll check the results of my checkpoint trained with `data_utils.py` to see if this is the problem. I thought it...
Tested the checkpoint trained with `data_utils.py`; it does have this conversion failure problem. Sorry for the bug, I'll update the code later. Though, I cannot figure out why this happens,...
Tested again; it does not have the conversion failure. In my first test I just passed `freevc-s.json` to load freevc. But the resulting speech has some 'splitting up' voice. `data_utils.py` does...
Yes. Btw code updated.
1. That means you did not use batch size 64 on 1 GPU, as I do. The displayed step can be influenced by batch size, number of GPUs, etc. 2. The...
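To illustrate why the displayed step depends on the setup, here is a rough sketch of the arithmetic. It assumes plain data-parallel training in which every GPU processes one batch per optimizer step and the step counter increments once per update; the dataset size below is made up, and `steps_per_epoch` is a hypothetical helper, not something from the repository.

```python
import math


def steps_per_epoch(num_samples, batch_size, num_gpus):
    """Optimizer steps needed to see every sample once (assumed data-parallel setup)."""
    samples_per_step = batch_size * num_gpus
    return math.ceil(num_samples / samples_per_step)


# Hypothetical dataset of 10,000 utterances.
print(steps_per_epoch(10_000, batch_size=64, num_gpus=1))
print(steps_per_epoch(10_000, batch_size=64, num_gpus=2))
print(steps_per_epoch(10_000, batch_size=32, num_gpus=1))
```

Under these assumptions, the same step number corresponds to a different amount of seen data in each setup, so step counts are only comparable between runs with the same batch size and GPU count.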
> I noticed that `data_utils_old` includes a speaker ID along with the audio data. I don't see any usage of speaker ID in the version of `train` that got committed;...
Also thanks to all of you for your interest ^_^
It could be. The speaker encoder does not have special design for the case where there is a lot of silence. The speaker embedding of an utterance can be "averaged"...