[FastPitch] Why do you hierarchically predict the variance features (pitch and energy)?

Open changjinhan opened this issue 2 years ago • 2 comments

Thank you always for sharing your thoughtful code.

As the FastPitch code shows, you add the pitch embedding to the encoder output before passing it to the energy predictor.

https://github.com/NVIDIA/DeepLearningExamples/blob/da7e1a701bd44885c5537afa7974be391f82401e/PyTorch/SpeechSynthesis/FastPitch/fastpitch/model.py#L300

Why did you choose hierarchical variance feature prediction instead of parallel prediction as in FastSpeech 2 (paper version)? Are there any performance advantages?
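
To make the distinction concrete, here is a minimal, framework-free sketch of the two data flows. It is only an illustration under stated assumptions: the function names, the fixed scalar "weights", and the broadcast-style embedding are all hypothetical and are not taken from the FastPitch source, where the predictors are learned networks operating on tensors.

```python
# Toy sketch of the two data flows. Every name here is hypothetical
# (not taken from the FastPitch source); real predictors are learned
# networks on tensors, not fixed scalar weights.

def predict(features, weight):
    """Stand-in variance predictor: one scalar per time step."""
    return [weight * sum(frame) for frame in features]

def add_embedding(features, values):
    """Stand-in embedding: add each per-step value to every channel."""
    return [[x + v for x in frame] for frame, v in zip(features, values)]

def parallel(enc_out, pitch=None):
    """Pitch and energy predictors both read the plain encoder output.

    `pitch` is accepted only for a uniform signature; energy ignores it.
    """
    pitch_pred = predict(enc_out, 0.5)
    energy_pred = predict(enc_out, 0.25)
    return pitch_pred, energy_pred

def hierarchical(enc_out, pitch=None):
    """Energy predictor reads the encoder output after pitch is added."""
    pitch_pred = predict(enc_out, 0.5)
    # Use the supplied pitch if given (e.g. a ground-truth contour),
    # otherwise fall back to the predicted pitch.
    used = pitch if pitch is not None else pitch_pred
    conditioned = add_embedding(enc_out, used)
    energy_pred = predict(conditioned, 0.25)
    return pitch_pred, energy_pred

enc_out = [[1.0, 2.0], [3.0, 4.0]]  # [time][channel]
flat = [0.0, 0.0]
raised = [1.0, 1.0]

# Energy does not depend on pitch in the parallel layout...
assert parallel(enc_out, flat)[1] == parallel(enc_out, raised)[1]
# ...but it does in the hierarchical one.
assert hierarchical(enc_out, flat)[1] != hierarchical(enc_out, raised)[1]
```

The only structural difference is the input to the energy predictor: in the hierarchical layout it sees pitch information, so the predicted energy can vary with the pitch contour. Whether that yields a measurable quality gain is exactly what this issue asks.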

changjinhan avatar Oct 05 '23 08:10 changjinhan

Hello 😌. I hope you're well and that you are having a good day.

Sorry 😅 I don't know how it happened and sorry for that. I was trying to build my own model for my data for my local language and I faced issues. I don't know how I did what you said.

Can you please 🥺 tell me how I can use FastPitch to build my own model in Colab or another notebook?

I am having trouble with the base setup (Docker and the NGC container) in Colab. How can I solve this?

hervenzoghe avatar Oct 06 '23 17:10 hervenzoghe
