wizardk
> I trained the multi-speaker model on VCTK (~400k) and for longer input phrases (i.e., >5 words), performance is approximately comparable to the released pretrained model. > > For...
> @wizardk What GPUs are you training on? Did you have to change the batch size/LR to adapt to your hardware setup? Can you upload some example WAV files to google...
You can try this:
```
import torch
from torch.nn import functional as F

x = torch.randn(1, 2)
print(x)
y = x
y = F.relu(y)  # out-of-place ReLU: y becomes a new tensor, x is unchanged
print(y)
```
@seekerzz Could you share any synthesized samples?
@kpu I ran into the same error and wonder how much disk space is needed to train on a big corpus like 100G?
Or reduce the model's complexity?
> Small update. StyleMelGAN (1.5M iter) is much better than HiFi-GAN (1.5M iter) as a vocoder after FastSpeech2 for my dataset. FS2+StyleMelGAN is almost the same quality as FS2+PWG, but SMG 3...
> SFSMN can be implemented using a convolution layer, but vFSMN cannot. The operation is similar to convolution, but it is not the same. Conv ops do a multiply and reduce-sum...
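For illustration, here is a hedged sketch of the first claim above: a unidirectional scalar-FSMN memory block, `m_t = Σ_i a_i · h_{t-i}` with one scalar coefficient per tap, can be reproduced with a causal depthwise `conv1d`. All names, shapes, and the tap count below are my own toy assumptions, not code from this repo:

```python
import torch
import torch.nn.functional as F

# Toy setup (assumed, not from the repo): batch, feature dims, timesteps, memory taps.
torch.manual_seed(0)
B, D, T, N = 2, 4, 10, 3
h = torch.randn(B, D, T)   # hidden sequence, channels-first
a = torch.randn(N)         # scalar tap coefficients a_0 .. a_{N-1}

# 1) Direct definition: m_t = sum_{i=0}^{N-1} a_i * h_{t-i} (causal, zero history).
m_ref = torch.zeros_like(h)
for i in range(N):
    m_ref[..., i:] += a[i] * h[..., :T - i]

# 2) The same block as a depthwise conv1d: left-pad N-1 frames for causality,
#    flip the taps because conv1d is cross-correlation, and replicate the
#    shared scalar kernel across all D channels (groups=D).
kernel = a.flip(0).view(1, 1, N).repeat(D, 1, 1)
m_conv = F.conv1d(F.pad(h, (N - 1, 0)), kernel, groups=D)

print(torch.allclose(m_ref, m_conv, atol=1e-6))  # prints True
```

The elementwise multiply-then-sum over taps is exactly what a grouped 1-D convolution computes per channel, which is why the scalar variant maps onto standard conv kernels.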