Hyungchan Yoon (윤형찬)
Following the official code of HiFi-GAN, they run inference and write audio with:

```python
audio = audio * MAX_WAV_VALUE
audio = audio.cpu().numpy().astype('int16')
write(output_file, h.sampling_rate, audio)
```

In my case, it works when I changed the...
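As a side note, a common pitfall with this snippet is that float samples slightly outside [-1, 1] wrap around when cast to `int16`. Below is a minimal, self-contained sketch of the scale-and-cast step with clipping added; the clipping is my own assumption for illustration, not necessarily the change referred to above.

```python
import numpy as np

MAX_WAV_VALUE = 32768.0  # 16-bit full scale, as defined in HiFi-GAN's code

def float_to_int16(audio):
    """Scale a float waveform in [-1, 1] to int16.

    Clipping (an added safeguard, not from the original snippet) prevents
    values like 1.1 from overflowing and wrapping to large negative ints.
    """
    audio = audio * MAX_WAV_VALUE
    audio = np.clip(audio, -MAX_WAV_VALUE, MAX_WAV_VALUE - 1)
    return audio.astype('int16')

# 1.1 would wrap around without the clip; with it, the peak saturates cleanly
print(float_to_int16(np.array([0.5, 1.1, -1.2])))  # → [16384, 32767, -32768]
```

The resulting array can then be passed to `scipy.io.wavfile.write` as in the original snippet.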
Revise the code as below in preprocessors/libritts.py:

BEFORE

```python
if len([f for f in f0 if f != 0]) == 0 or len([e for e in energy if e != 0]):
```

AFTER...
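The intent of this check appears to be skipping samples whose f0 or energy track is entirely zero; as written, the second `len(...)` is truthy for any sample with nonzero energy. The sketch below illustrates that intended logic with a hypothetical helper; it is my reading of the check, not the repo's actual replacement line (which is elided above).

```python
def has_valid_pitch_energy(f0, energy):
    """Hypothetical helper: True only if both the f0 and energy tracks
    contain at least one nonzero value. Samples failing this check would
    be skipped during preprocessing."""
    return any(v != 0 for v in f0) and any(v != 0 for v in energy)

# a sample with an all-zero energy track should be skipped
print(has_valid_pitch_energy([0.0, 120.5], [0.0, 0.0]))
```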
This repo is built on pure VITS & StyleSpeech, in order to verify the SC-CNN technique. Do you mean the SCL loss in YourTTS?
Loss terms related to meta-learning are excluded in this repo.
Did you use a custom dataset?
That's weird; I'll check it. I guess it might be due to the recent commit adding einsum.
Please try disabling the fp16 training option; I found the same problem when training other flow-based models.
Sorry, not yet. I'll try it and evaluate when there is a GPU to spare.
Thanks for sharing. I have one question after looking at the code: is the generation performance still good even when detaching the posterior output? I thought it would cause overfitting in the...
If my understanding is correct, z_spec is solely trained by the reconstruction loss (HiFi-GAN). Is that correct? So I thought detaching z_spec could be viewed as training an autoencoder without restriction (such...
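For readers unfamiliar with the detach question above: `.detach()` acts as a stop-gradient, so any loss computed on a detached tensor no longer updates the encoder that produced it. A minimal PyTorch sketch (module and variable names are illustrative, not from the repo):

```python
import torch

# toy encoder/decoder pair standing in for the posterior encoder and a
# downstream branch (e.g. a prior/KL-style term) in a VITS-like model
enc = torch.nn.Linear(4, 4)
dec = torch.nn.Linear(4, 4)

x = torch.randn(2, 4)
z = enc(x)  # analogous to z_spec from the posterior encoder

# loss on a detached z: gradients flow into dec, but NOT back into enc,
# so enc is trained only by whatever losses use the non-detached z
aux_loss = dec(z.detach()).pow(2).mean()
aux_loss.backward()

print(enc.weight.grad is None)   # True: encoder untouched by this branch
print(dec.weight.grad is None)   # False: decoder still receives gradients
```

This is exactly why detaching the posterior output leaves it trained only by the reconstruction path, which motivates the question about overfitting above.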