Jing-Yi Li comments

Results: 34 comments of Jing-Yi Li

data_utils.py line 45, where are the spec.pt files?

Those `spec.pt` files are generated on the fly during the first epoch of training.
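To illustrate what "generated on the fly" usually means in practice, here is a minimal compute-once-then-cache sketch. It is not the repository's code: `load_or_compute`, `fake_spectrogram`, and the `.spec.pt` naming rule are assumptions for illustration, and `pickle` stands in for whatever tensor serialization the real loader uses, so that the example runs with the standard library alone.

```python
import os
import pickle
import tempfile


def load_or_compute(wav_path, compute_fn):
    """Return cached features for wav_path; compute and save them on a cache miss."""
    # Assumed naming rule: "utt.wav" -> "utt.spec.pt" next to the audio file.
    cache_path = os.path.splitext(wav_path)[0] + ".spec.pt"
    if os.path.exists(cache_path):
        # Later epochs: the file exists, so the expensive computation is skipped.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # First epoch: nothing is cached yet, so compute and write the file.
    features = compute_fn(wav_path)
    with open(cache_path, "wb") as f:
        pickle.dump(features, f)
    return features


def demo():
    """Load the same utterance twice and count how often features are computed."""
    calls = []

    def fake_spectrogram(path):
        # Hypothetical stand-in for a real spectrogram function.
        calls.append(path)
        return [[0.0, 1.0], [2.0, 3.0]]

    with tempfile.TemporaryDirectory() as tmp:
        wav = os.path.join(tmp, "utt.wav")
        first = load_or_compute(wav, fake_spectrogram)
        second = load_or_compute(wav, fake_spectrogram)
    return first == second, len(calls)
```

Under this pattern the cache files do not exist before training starts, which is why they cannot be found in a fresh checkout; they appear next to the audio files once each utterance has been loaded for the first time.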

Some observations after 690k steps

I tried some large-pitch-gap conversions, including the mentioned 5142-to-p326 conversion, and some other conversions like a high-pitch kid's voice to another low-pitch man's voice. I put the results [here](https://1drv.ms/u/s!AnvukVnlQ3ZT0BiqJf631WfOzBV7?e=Mailrh). So,...

Some observations after 690k steps

Hmmm, weird. #21 also reports worse results than mine. Tomorrow I'll check the results of my checkpoint trained with `data_utils.py` to see if this is the problem. I thought it...

Some observations after 690k steps

I tested the checkpoint trained with `data_utils.py`, and it does have this conversion failure problem. Sorry for the bug, I'll update the code later. Though, I cannot figure out why this happens,...

Some observations after 690k steps

Tested again, and it does not have the conversion failure. In my first test I just passed `freevc-s.json` to load freevc. But the resulting speech has some 'splitting up' voice. `data_utils.py` does...

Some observations after 690k steps

Yes. Btw code updated.

Some observations after 690k steps

1. That means you did not use batch size 64 on 1 GPU, as I do. The displayed step count can be influenced by batch size, number of GPUs, etc. 2. The...
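As a rough illustration of why step counts from different setups are not directly comparable, the sketch below estimates the number of optimizer steps. It assumes plain data-parallel training in which each GPU processes `batch_size` utterances per step and no samples are dropped; a real sampler may bucket or drop samples, so actual counts can differ. The function name and the dataset size of 40000 utterances are hypothetical.

```python
import math


def optimizer_steps(num_utterances, batch_size, num_gpus, epochs):
    """Estimate the optimizer step count for simple data-parallel training."""
    # Utterances consumed per step across all GPUs (assumed: batch_size per GPU).
    per_step = batch_size * num_gpus
    # The last, partially filled batch still counts as one step.
    steps_per_epoch = math.ceil(num_utterances / per_step)
    return steps_per_epoch * epochs


# Same data and epochs, different hardware setup:
single_gpu = optimizer_steps(40000, batch_size=64, num_gpus=1, epochs=10)  # 6250
four_gpus = optimizer_steps(40000, batch_size=64, num_gpus=4, epochs=10)   # 1570
```

Both runs see the same amount of data, yet the displayed step differs by roughly a factor of four, so a checkpoint labelled with a step number is only comparable to another one trained with the same batch size and GPU count.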

Some observations after 690k steps

> I noticed that `data_utils_old` includes a speaker ID along with the audio data. I don't see any usage of speaker ID in the version of `train` that got committed;...

Some observations after 690k steps

Also thanks to all of you for your interest ^_^

Some observations after 690k steps

It could be. The speaker encoder has no special design for the case where there is a lot of silence. The speaker embedding of an utterance can be "averaged"...