inconnu11
Hi, I synthesized converted speech from these three models, VAE, CDVAE, and CDVAE-CLS-GAN, separately. The results of the CDVAE-CLS-GAN model sound the worst. Is it supposed to be like this? Or anything...
If the lengths of the content code, rhythm code, and pitch code differ from each other, how are they aligned, given that there is no attention mechanism in the decoder?
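One common way to combine codes of different lengths without attention is to upsample each code to a shared frame rate (e.g. by nearest-neighbor repetition) and then concatenate them channel-wise. A minimal sketch of that idea, not the authors' actual implementation (shapes are hypothetical):

```python
import numpy as np

def upsample_nearest(code, target_len):
    # code: (channels, length) -> (channels, target_len) by repeating indices
    idx = np.arange(target_len) * code.shape[1] // target_len
    return code[:, idx]

def align_codes(codes, target_len):
    # stretch every code to the same length, then stack along channels
    return np.concatenate([upsample_nearest(c, target_len) for c in codes], axis=0)

content = np.random.randn(8, 32)  # hypothetical code shapes
rhythm = np.random.randn(4, 16)
pitch = np.random.randn(2, 64)
aligned = align_codes([content, rhythm, pitch], target_len=64)  # (14, 64)
```

The decoder then sees one tensor whose time axis is already consistent, so no attention-based alignment is needed.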
Hi, I observed that the range of the spectrogram saved in the npy file is -0.2 ~ 0.8. I am wondering why you normalize the spectrogram into this range. For what reason?
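Such a range usually comes from a linear min-max mapping of a clipped dB-scale spectrogram into a fixed interval. A sketch of that kind of normalization, with assumed dB bounds (not necessarily the repo's exact values):

```python
import numpy as np

def normalize_spec(spec_db, min_db=-100.0, max_db=20.0, tgt_min=-0.2, tgt_max=0.8):
    # clip the dB-scale spectrogram, then map it linearly into [tgt_min, tgt_max]
    spec_db = np.clip(spec_db, min_db, max_db)
    return (spec_db - min_db) / (max_db - min_db) * (tgt_max - tgt_min) + tgt_min
```

Under this mapping the quietest bins land at -0.2 and the loudest at 0.8, which keeps the network's input/output in a small, roughly zero-centered range.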
Hi Zhang, could you please explain how the text encoder output and the recognition encoder output are aligned? It is stated in your paper that "The recognition encoder Er is a seq2seq...
Hi, in your code, why did you not apply pre-emphasis to the wav before extracting the mel spectrogram?
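For reference, pre-emphasis is a one-tap high-pass filter applied to the raw waveform before STFT analysis. A minimal sketch (the 0.97 coefficient is the conventional default, not taken from this repo):

```python
import numpy as np

def preemphasis(wav, coef=0.97):
    # y[0] = x[0]; y[n] = x[n] - coef * x[n-1]  (boosts high frequencies)
    return np.append(wav[0], wav[1:] - coef * wav[:-1])
```

Whether to apply it is a design choice: it flattens the spectral tilt of speech, but some pipelines skip it so the vocoder does not need a matching de-emphasis step at synthesis time.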
Hi, why do you update the learning rate before optimizer.step() in [code](https://github.com/ming024/FastSpeech2/blob/master/model/optimizer.py#L22-L24)? Shouldn't we call optimizer.step() first and then scheduler.step()? Or is there some other consideration?
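In Noam-style scheduling the learning rate is a pure function of an internal step counter, so "set LR, then step the optimizer" and the reverse differ only in whether step N or step N+1's LR is used, which matters little in practice. A minimal sketch of the schedule (d_model and warmup values are assumptions, not the repo's config):

```python
def noam_lr(step, d_model=256, warmup=4000):
    # LR rises roughly linearly for `warmup` steps, then decays as step**-0.5;
    # it depends only on the step count, not on gradients, so the relative
    # order of the LR update and optimizer.step() only shifts it by one step
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```

This differs from PyTorch's built-in LR schedulers, where calling scheduler.step() before optimizer.step() skips the intended first-epoch LR and triggers a warning.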
Hi, is the pitch/energy normalized within the corpus rather than within each speaker? Would normalizing within each speaker be better?
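Per-speaker normalization would mean z-scoring each value with statistics computed only over that speaker's utterances, which removes inter-speaker differences (e.g. average F0) from the predicted contour. A minimal sketch of the idea:

```python
import numpy as np

def normalize_per_speaker(values, speakers):
    # z-score each pitch/energy value with its own speaker's mean and std
    values = np.asarray(values, dtype=float)
    speakers = np.asarray(speakers)
    out = np.empty_like(values)
    for spk in np.unique(speakers):
        mask = speakers == spk
        mu, sigma = values[mask].mean(), values[mask].std()
        out[mask] = (values[mask] - mu) / (sigma + 1e-8)
    return out
```

With corpus-level normalization, by contrast, a low-pitched speaker's frames all sit below zero, so the model must learn speaker identity and prosody jointly.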
Hi, in the content encoder, you use 1D average pooling to downsample the content representation. The content representation is downsampled by a factor of 8 compared with the...
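For concreteness, downsampling by a factor of 8 via 1D average pooling just averages every 8 consecutive frames. A minimal numpy sketch of that operation:

```python
import numpy as np

def avg_pool1d(x, factor=8):
    # x: (channels, length); truncate to a multiple of `factor`,
    # then average every `factor` consecutive frames
    c, t = x.shape
    x = x[:, : t - t % factor]
    return x.reshape(c, -1, factor).mean(axis=2)

x = np.arange(16, dtype=float).reshape(1, 16)
y = avg_pool1d(x, factor=8)  # (1, 2): each output frame averages 8 inputs
```

This keeps only slowly-varying information in the content code, which is often the point: fine-grained pitch/rhythm detail is discarded so it must be carried by the other codes.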
I am using low-rank adaptation (LoRA) to train my Keras model. The base model is frozen and only the LoRA linear layers are trained. I want to save the trainable LoRA-related...
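One way to save only the adapter weights is to iterate over the model's `trainable_weights` (the standard Keras attribute; with the base model frozen, that list contains exactly the LoRA variables) and dump them by name. A minimal sketch with hypothetical helper names:

```python
import numpy as np

def save_lora_weights(model, path):
    # keep only the trainable (LoRA) variables; the frozen base
    # model is not written out at all
    np.savez(path, **{w.name: w.numpy() for w in model.trainable_weights})

def load_lora_weights(model, path):
    # restore the LoRA variables by name into an already-built model
    data = np.load(path)
    for w in model.trainable_weights:
        w.assign(data[w.name])
```

Loading assumes the model was rebuilt with the same LoRA layer names; a mismatch raises a KeyError, which is usually preferable to silently loading the wrong tensor.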
Hi, I am adding your MDN prosody modeling code segment to my Tacotron, but I encountered several problems with it. First, the prosody loss is...
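For context, an MDN prosody loss is typically the negative log-likelihood of the target prosody value under the predicted Gaussian mixture, computed with log-sum-exp for numerical stability. A minimal numpy sketch of that loss (not the authors' exact code; assumes one-dimensional targets):

```python
import numpy as np

def mdn_nll(pi_logits, mu, log_sigma, target):
    # pi_logits, mu, log_sigma: (batch, n_components); target: (batch,)
    log_pi = pi_logits - np.log(np.exp(pi_logits).sum(-1, keepdims=True))
    log_prob = (
        -0.5 * ((target[:, None] - mu) / np.exp(log_sigma)) ** 2
        - log_sigma
        - 0.5 * np.log(2 * np.pi)
    )
    joint = log_pi + log_prob          # (batch, n_components)
    m = joint.max(-1, keepdims=True)   # log-sum-exp over components
    return -(m[:, 0] + np.log(np.exp(joint - m).sum(-1))).mean()
```

Predicting log_sigma (rather than sigma) and working entirely in log space avoids the underflow that makes naive MDN losses return NaN early in training.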