OnceJune comments

Results: 22 comments of OnceJune

Stochastic duration prediction failed for fastspeech2

How is the synthesis result with the FS2 duration predictor after the same number of training steps? Also, in FS2 training, the gradient from the duration predictor is passed to the encoder, while in VITS,...

Result getting worse when i use ground truth duration.

Hi @AlexanderXuan, I'm trying to use ground-truth durations, but the added blank puzzles me. Should the blank be assigned a duration, or should it stay at zero?

Result getting worse when i use ground truth duration.

@AlexanderXuan Thank you, I will use zero for the blank. What's the problem in your result: pitch or mispronunciation?

Error using mel generated from glow-tts for hifi-gan training

You should either: 1. Drop any audio that is too short; or 2. Pad it with zeros at the end, up to the segment length.
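
The two options could be sketched roughly as follows. This is only an illustration in NumPy: the function name `fit_to_segment` and its parameters are hypothetical and do not come from the glow-tts or hifi-gan code. If the audio is padded, the matching mel would presumably need to be padded consistently as well.

```python
import numpy as np


def fit_to_segment(audio, segment_length, drop_short=False):
    """Handle a clip that may be shorter than the training segment length.

    Hypothetical helper: returns the clip unchanged if it is long enough,
    None if it is too short and drop_short is set, and otherwise a copy
    padded with zeros at the end up to segment_length samples.
    """
    if len(audio) >= segment_length:
        return audio
    if drop_short:
        # Option 1: drop the clip so the data loader can skip it.
        return None
    # Option 2: zero-pad at the end only.
    missing = segment_length - len(audio)
    return np.pad(audio, (0, missing), mode="constant")
```

A caller would filter out the `None` results when building the file list, or pad inside the dataset's item loader.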

Duration in filelist starts with -2?

And why is the bit depth 9? https://github.com/rishikksh20/LightSpeech/blob/d9290f755f02d33d520c2304c5b6624f87864e55/configs/default.yaml#L30

[Bug] duration predictor safe guard

Also, for inference, it seems that exp and a mel-to-linear conversion are required before sending the mel to Griffin-Lim.
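
A minimal sketch of those two steps, assuming the model outputs natural-log mel magnitudes and that the mel filterbank used in training is available as a matrix. The names `log_mel_to_linear` and `mel_basis` are hypothetical and not taken from the repository, and the pseudo-inverse is only one common approximation of the mel-to-linear mapping.

```python
import numpy as np


def log_mel_to_linear(log_mel, mel_basis, floor=1e-10):
    """Approximate linear-frequency magnitudes from a log-mel spectrogram.

    log_mel:   array of shape (n_mels, frames), assumed natural-log scale.
    mel_basis: array of shape (n_mels, n_freq), the mel filterbank (assumed
               to be the same one used to compute the training features).
    """
    mel = np.exp(log_mel)                   # undo the log compression
    inv_basis = np.linalg.pinv(mel_basis)   # shape (n_freq, n_mels)
    linear = inv_basis @ mel                # shape (n_freq, frames)
    # The pseudo-inverse can yield small negative values; clip them.
    return np.maximum(linear, floor)
```

The resulting magnitude spectrogram is what would then be passed to a Griffin-Lim implementation for phase reconstruction.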

How FastSpeech2 export onnx ?

@jinfagang How did you convert torch.linspace in the variance predictor? I got the error message "Exporting the operator linspace to ONNX opset version 11 is not supported". My torch version is 1.7.0.

Should I train for each speaker by MB-MelGAN?

For Tacotron, using a speaker embedding to train a multi-speaker model works, so I think MB-MelGAN can also be adjusted to take a speaker embedding as input, but you might need to...

Benchmark RTF for all method: Melgan - MBMelgan - Hifigan - StyleGAN

Here's an RTF summary: https://github.com/xcmyz/FastVocoder#rtf

Breakpoint problem

I also hit this issue, but it does not appear in audio from the pretrained model; it only appears after the discriminator network is introduced.