DeepLearningExamples [FastPitch] Why do you hierarchically predict the variance features (pitch and energy)?

Thank you always for sharing your thoughtful code.

As we can see in the FastPitch code, you add the pitch embedding to the encoder output before passing it to the energy predictor.

https://github.com/NVIDIA/DeepLearningExamples/blob/da7e1a701bd44885c5537afa7974be391f82401e/PyTorch/SpeechSynthesis/FastPitch/fastpitch/model.py#L300

Why did you choose hierarchical variance feature prediction instead of parallel prediction as in FastSpeech2 (paper version)? Are there any performance advantages?
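
To make the question concrete, here is a minimal sketch of the two data flows as I understand them. It is a plain-Python toy, not the repository's code: `predict`, `embed`, `add`, and the weight names are hypothetical stand-ins for the real predictor and embedding modules, and the "parallel" variant is only my reading of the alternative.

```python
# Toy sketch (plain Python, no framework) of the two orderings.
# All names are hypothetical; the real model uses learned modules.

def predict(features, weight):
    """Stand-in for a variance predictor: one scalar per time step."""
    return [sum(f * w for f, w in zip(frame, weight)) for frame in features]

def embed(values, weight):
    """Stand-in for a variance embedding: one vector per time step."""
    return [[v * w for w in weight] for v in values]

def add(a, b):
    """Element-wise sum of two [time][channel] sequences."""
    return [[x + y for x, y in zip(fa, fb)] for fa, fb in zip(a, b)]

def hierarchical(enc_out, p):
    pitch = predict(enc_out, p["pitch_pred"])
    h = add(enc_out, embed(pitch, p["pitch_emb"]))
    # The energy predictor sees encoder output plus pitch embedding.
    energy = predict(h, p["energy_pred"])
    h = add(h, embed(energy, p["energy_emb"]))
    return pitch, energy, h

def parallel(enc_out, p):
    pitch = predict(enc_out, p["pitch_pred"])
    # The energy predictor sees only the encoder output.
    energy = predict(enc_out, p["energy_pred"])
    h = add(enc_out, embed(pitch, p["pitch_emb"]))
    h = add(h, embed(energy, p["energy_emb"]))
    return pitch, energy, h
```

In both variants the pitch prediction is identical; the only difference is whether the energy predictor's input already carries pitch information.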

Oct 05 '23 08:10 changjinhan

Hello 😌. I hope you're well and that you are having a good day.

Sorry 😅 I'm not sure how this happened, and I apologize. I was trying to build my own model on data for my local language and ran into issues. I don't know how I did what you said.

Can you please 🥺 tell me how I can use FastPitch to build my own model in Colab or another notebook?

I have issues with the base configuration: Docker and the NGC container in Colab. How can I solve this?

Oct 06 '23 17:10 hervenzoghe
