[FastPitch] Why do you hierarchically predict the variance features (pitch and energy)? #1357

changjinhan · 2023-10-05T08:09:15Z

Thank you always for sharing your thoughtful code.

As we can see in the FastPitch code, you add the pitch embedding to the encoder output before passing it to the energy predictor.

DeepLearningExamples/PyTorch/SpeechSynthesis/FastPitch/fastpitch/model.py

Line 300 in da7e1a7

enc_out = enc_out + pitch_emb.transpose(1, 2)

Why did you choose hierarchical variance feature prediction instead of parallel prediction like FastSpeech2 (paper version)?
Are there any performance advantages?
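
To make the question concrete, here is a minimal sketch of the two data flows being compared. It is a toy, not the FastPitch implementation: one scalar per time step stands in for a hidden vector, and every function below (`predict_pitch`, `embed_pitch`, `predict_energy`, `embed_energy`, `hierarchical`, `parallel`) is a hypothetical stand-in for a learned module. The "parallel" variant follows the description of the FastSpeech2 paper version given in this question.

```python
# Toy sketch only: scalars per time step stand in for hidden vectors.
# All functions are hypothetical stand-ins, not FastPitch code.

def predict_pitch(enc_out):
    # Stand-in for a learned pitch predictor.
    return [0.5 * h for h in enc_out]

def embed_pitch(pitch):
    # Stand-in for the pitch embedding layer.
    return [2.0 * p for p in pitch]

def predict_energy(hidden):
    # Stand-in for a learned energy predictor.
    return [h + 1.0 for h in hidden]

def embed_energy(energy):
    # Stand-in for the energy embedding layer.
    return [0.1 * e for e in energy]

def add(a, b):
    # Element-wise sum of two equal-length sequences.
    return [x + y for x, y in zip(a, b)]

def hierarchical(enc_out, pitch=None):
    """Pitch embedding is added before energy is predicted."""
    pitch = predict_pitch(enc_out) if pitch is None else pitch
    enc_out = add(enc_out, embed_pitch(pitch))  # mirrors: enc_out = enc_out + pitch_emb
    energy = predict_energy(enc_out)            # input is pitch-conditioned
    return add(enc_out, embed_energy(energy)), energy

def parallel(enc_out, pitch=None):
    """Both predictors read the same unmodified encoder output."""
    pitch = predict_pitch(enc_out) if pitch is None else pitch
    energy = predict_energy(enc_out)            # input does not depend on pitch
    out = add(add(enc_out, embed_pitch(pitch)), embed_energy(energy))
    return out, energy
```

In this sketch, calling `hierarchical` with two different `pitch` sequences gives two different energy predictions, while `parallel` gives the same energy prediction for both. That dependency of energy on pitch is the design difference the question asks about.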

hervenzoghe · 2023-10-06T17:48:58Z

Hello 😌. I hope you're well and that you are having a good day.

Sorry 😅 I don't know how it happened and sorry for that. I was trying to build my own model for my data for my local language and I faced issues. I don't know how I did what you said.

Can you please 🥺 tell me how I can use FastPitch to build my own model in Colab or another notebook?

I have issues with the base configuration: docker, NGC Container in Colab. How can I solve this?


changjinhan added the enhancement New feature or request label Oct 5, 2023
