alisakgg
"upsample_rates": [2,5,4,4], "upsample_kernel_sizes": [16,15,4,4], "upsample_initial_channel": 512, "resblock_kernel_sizes": [3,7,11], "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]], "resblock_initial_channel": 256, "segment_size": 5120, "num_mels": 80, "num_freq": 512, "n_fft": 512, "hop_size": 160, "win_size": 512, "sampling_rate": 16000

I use... My code with train1.py got stuck here; does anyone know how to solve it?
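One sanity check worth running on a config like the one above: the product of `upsample_rates` must equal `hop_size`, otherwise the generator's total upsampling factor will not line up with the target waveform length. A minimal check (values copied from the config above):

```python
from math import prod

# Values taken from the config above
upsample_rates = [2, 5, 4, 4]
hop_size = 160
sampling_rate = 16000

# The vocoder upsamples one mel frame to hop_size waveform samples,
# so the total upsampling factor must equal hop_size.
total_upsampling = prod(upsample_rates)  # 2 * 5 * 4 * 4 = 160
assert total_upsampling == hop_size

# Mel frame rate in frames per second
print(sampling_rate / hop_size)  # 100.0
```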
Use the command: `python train2 -ckpt which-model`
A mandarin multi-speaker dataset was used for pretraining. Another Chinese speaker was used for finetuning.
I mentioned that only the decoder and speaker embeddings have gradients during finetuning. Shouldn't the decoder weights have no grad except for the condition layer norm?
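A minimal sketch of that freezing scheme in PyTorch (the module names `emb_g`, `decoder`, and `cond_layer_norm` are hypothetical placeholders; match them to the actual parameter names in the model definition):

```python
import torch.nn as nn


def freeze_for_finetune(model: nn.Module) -> list:
    """Freeze every parameter except the speaker embedding and the
    condition layer norm. The name substrings below are assumptions;
    adapt them to the real model."""
    trainable = []
    for name, param in model.named_parameters():
        if "emb_g" in name or "cond_layer_norm" in name:
            param.requires_grad = True
            trainable.append(name)
        else:
            param.requires_grad = False
    return trainable


# Tiny demo model using the assumed attribute names
class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb_g = nn.Embedding(30, 8)       # speaker embedding
        self.decoder = nn.Linear(8, 8)          # stays frozen
        self.cond_layer_norm = nn.LayerNorm(8)  # stays trainable


model = Demo()
names = freeze_for_finetune(model)
# Only emb_g.* and cond_layer_norm.* parameters remain trainable.
```

Only the parameters returned in `names` should then be passed to the optimizer.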
> Do you set num_speaker in the model config equal to the number of speakers in the mandarin dataset at the pretrain stage?

Yes, I use the default config "num_speaker: 955". There are 30...
> > > > You have to change the default config "num_speaker" to 30 (in your case) at the pretrain stage. When finetuning, just set your speaker_id = 0.

OK, I...
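The reason `num_speaker` must match the pretraining dataset is that the speaker embedding table has exactly that many rows, and any `speaker_id` used at finetune time must index into it. A small illustration (embedding dimension 256 is an assumption, not from the thread):

```python
import torch
import torch.nn as nn

# num_speaker should equal the number of speakers actually in the
# pretraining dataset (30 here), not the repo default of 955.
num_speaker = 30
emb_g = nn.Embedding(num_speaker, 256)

# When finetuning on a single new speaker, reuse row 0 of the table.
speaker_id = torch.tensor([0])
g = emb_g(speaker_id)
print(g.shape)  # torch.Size([1, 256])
```

Any `speaker_id >= num_speaker` would raise an index error in the embedding lookup, which is why the default 955 only "works" by accident when the dataset has fewer speakers.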