DeepLearningExamples [FastPitch1.1/pytorch] "estimate_pitch" in the data_function.py is a bad function

Related to FastPitch1.1/pytorch

Describe the bug

The estimate_pitch( ... ) function in data_function.py is flawed. It uses librosa.pyin to estimate the audio pitch, but without any custom parameters. FastPitch's train.py lets the user specify the parameters used to compute mel and pitch, such as sample-rate, hop-length, and win-length, but these are not passed into this function. As a result, whenever those settings differ from the defaults, the on-line feature computation during training yields a mel sequence and a pitch sequence of different lengths, because estimate_pitch always assumes the audio files are 22 kHz, with win_length=frame_length//2 and hop_length=frame_length//4.
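To make the mismatch concrete, here is a rough sketch of the frame-count arithmetic. It assumes center-padded framing on both sides, which gives 1 + n_samples // hop_length frames. The 16 kHz / hop 200 training config is a hypothetical example, and frame_length=2048 is librosa.pyin's default, used as an assumption because the value FastPitch actually passes is not stated here.

```python
def num_frames(n_samples: int, hop_length: int) -> int:
    """Frame count for center-padded framing: one frame per hop, plus one."""
    return 1 + n_samples // hop_length


duration_s = 1.0

# Hypothetical training config chosen by the user for the mel-spectrogram.
mel_sr, mel_hop = 16000, 200

# What the pitch side effectively uses when nothing is passed through:
# 22050 Hz audio and hop_length = frame_length // 4.
# frame_length=2048 is librosa.pyin's default (an assumption for this sketch).
pitch_sr, pitch_frame_length = 22050, 2048
pitch_hop = pitch_frame_length // 4

mel_frames = num_frames(int(duration_s * mel_sr), mel_hop)
pitch_frames = num_frames(int(duration_s * pitch_sr), pitch_hop)

print(f"mel frames:   {mel_frames}")
print(f"pitch frames: {pitch_frames}")
```

With these example numbers the two sequences differ by a wide margin for the same one-second clip, so they cannot be aligned frame by frame.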

May 26 '22 03:05 JohnHerry

Did you find any workaround for this? I think I might be stuck due to this issue. #1200

Sep 13 '22 15:09 rishabhjain16

Did you find any workaround for this? I think I might be stuck due to this issue. #1200

I changed the function so that the pitch-extraction parameters are passed in as arguments. It is a simple change.
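A minimal sketch of what such a parameterized version could look like follows. This is not the actual patch. The names pyin_kwargs and estimate_pitch_with_config are hypothetical, the default fmin of 65.0 Hz is an assumption, and the default fmax of 2093.0 Hz is roughly the note_to_hz('C7') value mentioned in this thread. The idea is only that the sample rate and hop length used for the mel are also handed to librosa.load and librosa.pyin.

```python
def pyin_kwargs(sampling_rate, frame_length, hop_length,
                fmin=65.0, fmax=2093.0):
    """Collect the keyword arguments for librosa.pyin from the training config.

    Hypothetical helper: fmin/fmax defaults are assumptions for this sketch.
    """
    return {
        "sr": sampling_rate,
        "frame_length": frame_length,
        "hop_length": hop_length,
        "fmin": fmin,
        "fmax": fmax,
    }


def estimate_pitch_with_config(wav_path, sampling_rate, frame_length,
                               hop_length, fmin=65.0, fmax=2093.0):
    """Estimate f0 with the same sample rate and hop length as the mel."""
    # Imported here so pyin_kwargs can be used without librosa installed.
    import librosa
    import numpy as np

    kwargs = pyin_kwargs(sampling_rate, frame_length, hop_length, fmin, fmax)

    # Load at the configured rate instead of librosa's 22050 Hz default.
    audio, _ = librosa.load(wav_path, sr=kwargs["sr"])

    # pyin returns NaN for unvoiced frames; replace those with 0.
    f0, voiced_flag, voiced_prob = librosa.pyin(audio, **kwargs)
    return np.nan_to_num(f0)
```

Note that pyin's own win_length is the autocorrelation window and must be smaller than frame_length, so it is not the same thing as the STFT win-length. This sketch leaves it at librosa's default and maps the STFT window size to frame_length, which is a design choice rather than the only option.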

Sep 19 '22 06:09 JohnHerry

Another problem with that function is that the fmax setting is far too high: librosa.note_to_hz('C7') is about 2093 Hz, and nobody speaks at that frequency. Probabilistic YIN takes MUCH longer to process when the hypothesis space is that large. Set the maximum to something like 500 or 600 Hz instead or, even better, if you know the pitch range of your speaker(s), set it to that. It will be much faster.
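For a rough sense of how much the search range shrinks, the sketch below compares the number of octaves between fmin and fmax for the two settings. It assumes fmin is C2 (about 65.4 Hz), which is not stated in this thread, and it uses 10 pitch bins per semitone, which corresponds to pyin's default resolution of 0.1. The bin counts are an approximation of the hypothesis space, not a timing measurement.

```python
import math


def midi_to_hz(midi_note: float) -> float:
    """Equal-tempered frequency with A4 (MIDI 69) tuned to 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12)


def search_octaves(fmin: float, fmax: float) -> float:
    """Width of the pitch search range in octaves."""
    return math.log2(fmax / fmin)


c2_hz = midi_to_hz(36)  # assumed fmin (C2)
c7_hz = midi_to_hz(96)  # C7, the fmax discussed above

BINS_PER_SEMITONE = 10  # assumption: pyin's default resolution of 0.1

wide = search_octaves(c2_hz, c7_hz)
narrow = search_octaves(c2_hz, 600.0)

for label, octaves in (("fmax=C7", wide), ("fmax=600 Hz", narrow)):
    approx_bins = 12 * BINS_PER_SEMITONE * octaves
    print(f"{label}: {octaves:.2f} octaves, ~{approx_bins:.0f} pitch bins")
```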

Feb 22 '23 09:02 martinvk1
