OnceJune
How does the synthesis result with the fs2 (FastSpeech 2) duration predictor look after the same number of training steps? Also, in fs2 training the gradient from the duration predictor is passed to the encoder, while in VITS,...
Hi @AlexanderXuan, I'm trying to use ground-truth durations, but the added blank tokens puzzle me. Should the blanks be assigned some duration, or should they be kept at zero?
@AlexanderXuan Thank you, I will use zero duration for the blanks. What's the problem with your result: pitch or mispronunciation?
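A minimal sketch of the zero-duration approach, assuming the blanks are interleaved the way VITS does it with its add_blank option (one blank before, between and after every phoneme) and that the blank id is 0. The helper name `intersperse_with_zero_duration` is hypothetical. Because every blank gets zero frames, the total frame count, and therefore the alignment to the mel spectrogram, stays unchanged.

```python
from typing import List, Tuple


def intersperse_with_zero_duration(
    token_ids: List[int], durations: List[int], blank_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: insert a blank around every token and give each blank 0 frames."""
    if len(token_ids) != len(durations):
        raise ValueError("token_ids and durations must have the same length")
    n = len(token_ids)
    # Layout: blank, t0, blank, t1, ..., blank  (2n + 1 entries)
    ids = [blank_id] * (2 * n + 1)
    durs = [0] * (2 * n + 1)
    # Real tokens (and their ground-truth durations) go to the odd positions.
    ids[1::2] = token_ids
    durs[1::2] = durations
    return ids, durs


ids, durs = intersperse_with_zero_duration([5, 12, 7], [3, 4, 2])
# ids  -> [0, 5, 0, 12, 0, 7, 0]
# durs -> [0, 3, 0, 4, 0, 2, 0]
```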
You should either:
1. Drop audio clips that are shorter than the segment length, or
2. Pad them to the segment length with zeros at the end.
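A minimal sketch of both options for a vocoder data loader, assuming mono audio stored as a 1-D NumPy array and a random crop for clips that are long enough. The function name `fit_to_segment` and its `drop_short` flag are hypothetical, not from any particular repository.

```python
from typing import Optional

import numpy as np


def fit_to_segment(
    audio: np.ndarray,
    segment_length: int,
    drop_short: bool = False,
    rng: Optional[np.random.Generator] = None,
) -> Optional[np.ndarray]:
    """Hypothetical helper: return exactly `segment_length` samples, or None if dropped."""
    if len(audio) < segment_length:
        if drop_short:
            # Option 1: skip this clip entirely.
            return None
        # Option 2: zero-pad at the end up to the segment length.
        return np.pad(audio, (0, segment_length - len(audio)), mode="constant")
    if rng is None:
        rng = np.random.default_rng()
    # Long enough: take a random window of the requested length.
    start = int(rng.integers(0, len(audio) - segment_length + 1))
    return audio[start:start + segment_length]
```

If you pad, the matching mel frames should be padded consistently; dropping is simpler when short clips are rare.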
And why is the bit depth 9? https://github.com/rishikksh20/LightSpeech/blob/d9290f755f02d33d520c2304c5b6624f87864e55/configs/default.yaml#L30
And for inference, it seems that exp (to undo the log compression) and a mel-to-linear conversion are required before passing the mel spectrogram to Griffin-Lim.
@jinfagang How did you convert torch.linspace in the variance predictor? I got the error message "Exporting the operator linspace to ONNX opset version 11 is not supported". My torch version is 1.7.0.
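One possible workaround, sketched under the assumption that torch.linspace is only used to build fixed pitch/energy bin boundaries (as in common FastSpeech 2 variance adaptors) and not on input-dependent values: compute the boundaries once in `__init__` and register them as a buffer. ONNX export traces `forward()`, so the exporter then sees a stored constant instead of a linspace call. The module name and the comparison-based bucketing (used in place of `torch.bucketize`, in case your exporter rejects that op as well) are illustrative and not taken from the repository.

```python
import torch
import torch.nn as nn


class QuantizedPitchEmbedding(nn.Module):
    """Hypothetical variance-adaptor piece: pitch value -> bin index -> embedding."""

    def __init__(self, p_min: float, p_max: float, n_bins: int = 256, hidden: int = 256):
        super().__init__()
        # torch.linspace runs once here, eagerly, so it never appears in the traced graph;
        # the boundaries are exported as a stored constant.
        self.register_buffer("bins", torch.linspace(p_min, p_max, n_bins - 1))
        self.embedding = nn.Embedding(n_bins, hidden)

    def bucket_index(self, pitch: torch.Tensor) -> torch.Tensor:
        # Count the boundaries strictly below each value. This matches
        # torch.bucketize(pitch, bins) with right=False, using only compare and sum.
        return (pitch.unsqueeze(-1) > self.bins).sum(dim=-1)

    def forward(self, pitch: torch.Tensor) -> torch.Tensor:
        # pitch: (B, T) -> embeddings: (B, T, hidden)
        return self.embedding(self.bucket_index(pitch))
```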
For Tacotron, using a speaker embedding to train a multi-speaker model works, so I think MB-MelGAN could also be adapted to take a speaker embedding as input, but you might need to...
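A minimal sketch of one common way to condition a vocoder on a speaker, assuming speaker ids are available at both training and inference time: look up a learned speaker embedding, repeat it along the time axis, and concatenate it with the mel channels before the generator's first convolution. `SpeakerConditionedInput` and all sizes are hypothetical; MB-MelGAN's actual input layer may differ.

```python
import torch
import torch.nn as nn


class SpeakerConditionedInput(nn.Module):
    """Hypothetical front end for a MelGAN-style generator with speaker conditioning."""

    def __init__(self, n_speakers: int, spk_dim: int, n_mels: int,
                 out_channels: int, kernel_size: int = 7):
        super().__init__()
        self.spk_table = nn.Embedding(n_speakers, spk_dim)
        # The first conv now sees mel channels plus speaker channels.
        self.conv = nn.Conv1d(n_mels + spk_dim, out_channels, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, mel: torch.Tensor, speaker_id: torch.Tensor) -> torch.Tensor:
        # mel: (B, n_mels, T); speaker_id: (B,)
        spk = self.spk_table(speaker_id)                       # (B, spk_dim)
        spk = spk.unsqueeze(-1).expand(-1, -1, mel.size(-1))   # (B, spk_dim, T)
        return self.conv(torch.cat([mel, spk], dim=1))         # (B, out_channels, T)
```

Concatenating at the input is the simplest option; injecting the embedding into deeper layers as well is a common variant if the speaker identity gets washed out.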
Here's a real-time factor (RTF) summary: https://github.com/xcmyz/FastVocoder#rtf
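For reference, RTF is usually computed as wall-clock synthesis time divided by the duration of the generated audio, so values below 1 mean faster than real time. A minimal sketch, where `synthesize` is a hypothetical zero-argument callable wrapping whatever vocoder is being measured. For GPU models you would also need warm-up runs and a device synchronize before reading the clock.

```python
import time
from typing import Callable, Sequence


def real_time_factor(elapsed_s: float, num_samples: int, sample_rate: int) -> float:
    """RTF = synthesis time / duration of the generated audio (lower is faster)."""
    return elapsed_s / (num_samples / sample_rate)


def measure_rtf(synthesize: Callable[[], Sequence[float]], sample_rate: int) -> float:
    """Time one call to a hypothetical `synthesize` function that returns audio samples."""
    start = time.perf_counter()
    audio = synthesize()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(audio), sample_rate)


# 2 s of audio (44100 samples at 22050 Hz) generated in 0.5 s:
print(real_time_factor(0.5, 44100, 22050))  # prints 0.25
```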
I also ran into this issue, but it does not appear in the audio from the pretrained (generator-only) model; it only appears after the discriminator is introduced.