wizardk

Results: 9 comments by wizardk

> I trained the multi-speaker model on VCTK (~400k) and for longer input phrases (i.e., >5 words), performance is approximately comparable to the released pretrained model.
>
> For...

> @wizardk What GPUs are you training on? Did you have to change the batch size or learning rate to adapt to your hardware setup? Can you upload some example WAV files to google...

You can try this:

```
import torch
from torch.nn import functional as F

x = torch.randn(1, 2)
print(x)
y = x
y = F.relu(y)
print(y)
```

@kpu I hit the same error. How much space is needed to train on a large corpus, say 100 GB?

Or reduce the model's complexity?

> Small update. StyleMelGAN (1.5M iter) is much better than HiFi-GAN (1.5M iter) as a vocoder after FastSpeech2 for my dataset. FS2+StyleMelGAN is almost the same quality as FS2+PWG, but SMG 3...

> SFSMN can be implemented using a convolution layer, but vFSMN cannot. The operation is similar to convolution, but it is not the same. Conv ops do multiply and reduce sum...
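
For readers comparing the two variants, here is a minimal NumPy sketch of the memory blocks as they are commonly defined: scalar FSMN uses one scalar tap per time offset, while vectorized FSMN uses one tap vector per offset, applied elementwise. The function names, the lookback-only form, and the array shapes are assumptions made for illustration; this is not code from the thread.

```python
import numpy as np

def sfsmn_memory(h, a):
    """Scalar FSMN memory (illustrative): out[t] = sum_i a[i] * h[t - i].

    h: (T, D) hidden states, a: (N + 1,) scalar taps.
    """
    out = np.zeros_like(h)
    for t in range(h.shape[0]):
        for i in range(a.shape[0]):
            if t - i >= 0:  # skip taps that reach before the first frame
                out[t] += a[i] * h[t - i]
    return out

def vfsmn_memory(h, A):
    """Vectorized FSMN memory (illustrative): out[t] = sum_i A[i] * h[t - i].

    h: (T, D) hidden states, A: (N + 1, D) tap vectors.
    The product is elementwise, so each feature dimension is filtered
    independently and the sum runs over time offsets only.
    """
    out = np.zeros_like(h)
    for t in range(h.shape[0]):
        for i in range(A.shape[0]):
            if t - i >= 0:
                out[t] += A[i] * h[t - i]
    return out

h = np.array([[1.0, 2.0], [3.0, 4.0]])   # T = 2 frames, D = 2 features
a = np.array([1.0, 0.5])                 # scalar taps for offsets 0 and 1
A = np.array([[1.0, 1.0], [0.5, 0.0]])   # per-dimension taps for offsets 0 and 1

print(sfsmn_memory(h, a))
print(vfsmn_memory(h, A))
```

In the vectorized form, nothing is summed across feature dimensions, whereas a standard (non-grouped) convolution also sums over its input channels.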