[FastPitch1.1/pytorch] "estimate_pitch" in data_function.py ignores the user-specified audio parameters
Related to FastPitch1.1/pytorch
Describe the bug
data_function.py: def estimate_pitch( ... ) uses librosa.pyin to estimate the audio pitch, but it does not pass any custom parameters. FastPitch's train.py lets the user specify the parameters used to compute mel and pitch, such as sample rate, hop length, and window length, but these are not passed on to this function. As a result, the on-line feature computation during training produces mel and pitch sequences of different lengths: estimate_pitch always assumes the audio is 22 kHz, with win_length = frame_length // 2 and hop_length = frame_length // 4.
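To make the mismatch concrete, here is a rough back-of-the-envelope sketch. It assumes centered framing (one frame per hop, plus one) for both features, and the 16 kHz / hop 200 setup and `frame_length=1024` are example values I picked for illustration, not values taken from the repository.

```python
def num_frames(n_samples, hop_length):
    """Frame count for centered framing: one frame per hop, plus one."""
    return 1 + n_samples // hop_length


# Hypothetical user setup: 1 s of 16 kHz audio, mel computed with hop_length=200.
seconds = 1
mel_len = num_frames(16000 * seconds, 200)

# Defaults the pitch side falls back to (assumed for illustration):
# audio treated as 22050 Hz, frame_length=1024, hop_length = 1024 // 4 = 256.
pitch_len = num_frames(22050 * seconds, 1024 // 4)

print(mel_len, pitch_len)  # 81 87
```

The two sequences only line up when the user's sample rate and hop length happen to give the same frame count as the defaults.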
Did you find any workaround for this? I think I might be stuck due to this issue. #1200
I changed the function so that the pitch-extraction parameters are passed in as arguments; it is a simple change.
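For anyone who wants a starting point, a change along these lines might look roughly like the sketch below. `PitchConfig` and `estimate_pitch_pyin` are hypothetical names and the default values are assumptions, not the repository's code; only `librosa.load` and `librosa.pyin` are real library calls. Normalization and tensor conversion are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PitchConfig:
    """Hypothetical container for the audio settings given to train.py."""
    sampling_rate: int = 22050
    frame_length: int = 1024   # analysis window, in samples
    hop_length: int = 256      # should match the mel hop length
    fmin: float = 65.0         # roughly C2
    fmax: float = 600.0        # assumed upper bound for speech

    def pyin_kwargs(self):
        """Build the keyword arguments for librosa.pyin from this config."""
        if not 0 < self.hop_length <= self.frame_length:
            raise ValueError("hop_length must be in (0, frame_length]")
        if not 0 < self.fmin < self.fmax:
            raise ValueError("need 0 < fmin < fmax")
        return dict(sr=self.sampling_rate, frame_length=self.frame_length,
                    hop_length=self.hop_length, fmin=self.fmin, fmax=self.fmax)


def estimate_pitch_pyin(wav_path, config):
    """Return per-frame F0 in Hz (NaN where pYIN marks a frame unvoiced)."""
    import librosa  # imported here so the config is usable without librosa

    # Load at the configured rate instead of librosa's 22050 Hz default.
    snd, _ = librosa.load(wav_path, sr=config.sampling_rate)
    f0, voiced_flag, voiced_prob = librosa.pyin(snd, **config.pyin_kwargs())
    return f0
```

With the same sample rate and hop length used for the mel spectrogram, the pitch sequence should come out at the same length as the mel sequence, give or take a frame depending on how the mel side pads the signal.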
Another problem with that function: the fmax setting is far too high. librosa.note_to_hz('C7') is about 2093 Hz, and nobody speaks at that frequency. Probabilistic YIN takes much longer to run when the hypothesis space is that large. Set the maximum to something like 500 to 600 Hz instead or, better still, to the range of your speaker(s) if you know it. It will be much faster.
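As a rough illustration of how much the search space shrinks, the sketch below counts pitch candidates between fmin and fmax. It assumes candidates are spaced 0.1 semitone apart (the documented default `resolution` of `librosa.pyin`) and a lower bound of C2; `note_hz` and `candidate_count` are hypothetical helpers written for this example, and the count is only an approximation of what the library does internally.

```python
import math

# Semitone offset of each natural note from C.
_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_hz(note):
    """Frequency of a natural note such as 'C7' (A4 = 440 Hz, equal temperament)."""
    midi = 12 * (int(note[1:]) + 1) + _OFFSETS[note[0]]
    return 440.0 * 2.0 ** ((midi - 69) / 12)


def candidate_count(fmin, fmax, resolution=0.1):
    """Approximate number of pitch candidates between fmin and fmax."""
    semitones = 12 * math.log2(fmax / fmin)
    return round(semitones / resolution) + 1


fmin = note_hz("C2")
print(round(note_hz("C7"), 1))               # 2093.0
print(candidate_count(fmin, note_hz("C7")))  # 601
print(candidate_count(fmin, 600.0))          # 385
```

Under these assumptions, capping fmax at 600 Hz removes roughly a third of the candidates that pYIN has to consider for every frame.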