alisakgg
"upsample_rates": [2,5,4,4], "upsample_kernel_sizes": [16,15,4,4], "upsample_initial_channel": 512, "resblock_kernel_sizes": [3,7,11], "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]], "resblock_initial_channel": 256, "segment_size": 5120, "num_mels": 80, "num_freq": 512, "n_fft": 512, "hop_size": 160, "win_size": 512, "sampling_rate": 16000

I use... My code with train1.py got stuck here; does anyone know how to solve it?
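One sanity check worth running on a config like the one above: the product of `upsample_rates` must equal `hop_size`, otherwise the generator's total upsampling factor will not line up with the target waveform length. A minimal check (values copied from the config above):

```python
from math import prod

# Values taken from the config above
upsample_rates = [2, 5, 4, 4]
hop_size = 160
sampling_rate = 16000

# The vocoder upsamples one mel frame to hop_size waveform samples,
# so the total upsampling factor must equal hop_size.
total_upsampling = prod(upsample_rates)  # 2 * 5 * 4 * 4 = 160
assert total_upsampling == hop_size

# Mel frame rate in frames per second
print(sampling_rate / hop_size)  # 100.0
```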
Use the command: `python train2 -ckpt which-model`
A mandarin multi-speaker dataset was used for pretraining. Another Chinese speaker was used for finetuning.
I mentioned that only the decoder and speaker embeddings have gradients during finetuning. Shouldn't the decoder weights have no grad except for the condition layer norm?
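A minimal sketch of that freezing scheme in PyTorch (the module names `emb_g`, `decoder`, and `cond_layer_norm` are hypothetical placeholders; match them to the actual parameter names in the model definition):

```python
import torch.nn as nn


def freeze_for_finetune(model: nn.Module) -> list:
    """Freeze every parameter except the speaker embedding and the
    condition layer norm. The name substrings below are assumptions;
    adapt them to the real model."""
    trainable = []
    for name, param in model.named_parameters():
        if "emb_g" in name or "cond_layer_norm" in name:
            param.requires_grad = True
            trainable.append(name)
        else:
            param.requires_grad = False
    return trainable


# Tiny demo model using the assumed attribute names
class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb_g = nn.Embedding(30, 8)       # speaker embedding
        self.decoder = nn.Linear(8, 8)          # stays frozen
        self.cond_layer_norm = nn.LayerNorm(8)  # stays trainable


model = Demo()
names = freeze_for_finetune(model)
# Only emb_g.* and cond_layer_norm.* parameters remain trainable.
```

Only the parameters returned in `names` should then be passed to the optimizer.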
> Do you set num_speaker in the model config equal to the number of speakers in the mandarin dataset at the pretrain stage?

Yes, I use the default config "num_speaker: 955". There are 30...
> > > > You have to change the default config "num_speaker" to 30 (in your case) at the pretrain stage. When finetuning, just set your speaker_id = 0.

OK, I...
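The reason `num_speaker` must match the pretraining dataset is that the speaker embedding table has exactly that many rows, and any `speaker_id` used at finetune time must index into it. A small illustration (embedding dimension 256 is an assumption, not from the thread):

```python
import torch
import torch.nn as nn

# num_speaker should equal the number of speakers actually in the
# pretraining dataset (30 here), not the repo default of 955.
num_speaker = 30
emb_g = nn.Embedding(num_speaker, 256)

# When finetuning on a single new speaker, reuse row 0 of the table.
speaker_id = torch.tensor([0])
g = emb_g(speaker_id)
print(g.shape)  # torch.Size([1, 256])
```

Any `speaker_id >= num_speaker` would raise an index error in the embedding lookup, which is why the default 955 only "works" by accident when the dataset has fewer speakers.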