Adrian Lancucki

Results 11 comments of Adrian Lancucki

@subhankar-ghosh looks good! I wonder if extra conditioning improves the loss of predictors.

Thanks! IMHO, proper emotive synthesis requires an emotive dataset. In FastPitch, the speaker conditioning implemented for multi-speaker models could be overloaded to handle...

Hi @yerzhan7orazayev , sorry for a late reply. For pitch mean/std, just calculate those statistics over all pitch values across all audio files in the dataset. As for fmin...
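A minimal sketch of that aggregation (the `pitch_stats` helper and the toy per-file pitch tracks are hypothetical; FastPitch's own pitch extraction produces the per-file arrays):

```python
import numpy as np

def pitch_stats(pitch_arrays):
    """Pool pitch values from all files, skipping unvoiced (zero) frames,
    then return the global mean and standard deviation."""
    all_pitch = np.concatenate([p[p > 0] for p in pitch_arrays])
    return float(all_pitch.mean()), float(all_pitch.std())

# Toy example: two hypothetical per-file pitch tracks in Hz,
# with zeros marking unvoiced frames.
files = [np.array([0.0, 110.0, 112.0]), np.array([108.0, 0.0, 114.0])]
mean, std = pitch_stats(files)
```

The key point is to pool the raw values across the whole dataset before computing the statistics, rather than averaging per-file means.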

Hi @brentcty-2020 , sorry for a late reply. Have you managed to resolve your issue? After 100 epochs you should get something very intelligible, just a little bit noisy ([sample](https://github.com/NVIDIA/DeepLearningExamples/files/7849935/001_the_overwhelming_majority_of_people_in.wav.gz))....

Hi @Pawel-VRtechnology , That should be fairly straightforward. You'll need to adjust `--text-cleaners` and `--symbol-set`, and either get ahold of a Polish pronunciation dictionary and supply it with `--cmudict-path`, or...
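A rough sketch of how those flags might be combined (the symbol-set name and dictionary path below are placeholders; a Polish symbol set and cleaner would have to be added to the FastPitch code first):

```shell
# Hypothetical invocation -- the polish_basic symbol set and the
# dictionary file do not ship with FastPitch and must be added first.
python train.py \
    --text-cleaners basic_cleaners \
    --symbol-set polish_basic \
    --cmudict-path data/polish_dict.txt \
    ... # remaining training arguments unchanged
```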

Hi @wiamfa there is a [pre-trained](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_fr_quartznet15x5) model for French from the [NeMo](https://github.com/NVIDIA/NeMo) project. You can use it in NeMo, or follow these steps to load it in DeepLearningExamples: 1. Change...

Hi @karanveersingh5623 , Sorry for a late reply. AFAIK DALI uses custom pipelines in which only DALI operations are permitted, so no `sox`. It might suffice to build...

Hi @adrianastan Sorry for replying late. I haven't got much experience with that many speakers, but I'd try to add capacity to the model and look at class imbalance -...

> I assume that the simple summation of the speaker embedding to the text encoding is not strong enough to preserve the speaker identity

That might be the case. Positional...