
I can't reproduce the RNNoise PESQ result of 2.9

Open evansong0307 opened this issue 2 years ago • 12 comments

I ran your GitHub code with the following command: `python test.py --clean_wav_folder=clean_testset_wav/ --noisy_wav_folder=noisy_testset_wav/ --tflite_path=rnnoise_INT8.tflite`

The documentation says the average PESQ is 2.9, but I only get 2.664. I downloaded the test dataset with the provided script file.

evansong0307 avatar Aug 21 '23 07:08 evansong0307

Another question: why is final_denoised_audio appended to the previous denoised result? This causes the PESQ calculation to run on arrays of different shapes, and the audio accumulates.

evansong0307 avatar Aug 21 '23 08:08 evansong0307

Hmm, strange. I have just re-run the scripts and I get the following output:

pesq score for audio before de-noising: 1.9782992604288083
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
pesq score for audio after de-noising with ../rnnoise_INT8.tflite is: 2.9194513495685985

One issue I did run into is that the requirements.txt file no longer works correctly, so I can post a new one that should work.

For your question on appending: RNNoise runs on only a small number of samples from the audio clip at a time (480, if I remember correctly). You need to run the model many times on one audio file to get the full denoised audio out. Therefore we append after each inference until we have the complete denoised audio.

We then calculate the pesq score for that audio clip and add it to the running total before averaging at the end for all audio clips.
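The frame-by-frame flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in test.py: `denoise_frame` is a hypothetical stand-in for one TFLite inference, `score_fn` is a hypothetical stand-in for the PESQ call, the frame length of 480 is the value recalled above, and dropping a trailing partial frame is an assumption made here for simplicity.

```python
FRAME_SIZE = 480  # assumed frame length, as recalled in the reply above


def denoise_frame(frame):
    """Hypothetical stand-in for a single model inference on one frame."""
    return [0.5 * s for s in frame]


def denoise_clip(samples):
    """Run the stand-in model frame by frame and append each output."""
    final_denoised_audio = []  # one buffer per clip
    # Assumption: a trailing partial frame is simply dropped.
    for start in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE):
        frame = samples[start:start + FRAME_SIZE]
        final_denoised_audio.extend(denoise_frame(frame))
    return final_denoised_audio


def average_score(clips, score_fn):
    """Score each (clean, noisy) pair, then average over all clips."""
    total = 0.0
    for clean, noisy in clips:
        denoised = denoise_clip(noisy)
        # Trim the clean reference so both signals have the same length.
        total += score_fn(clean[:len(denoised)], denoised)
    return total / len(clips)
```

Any function that takes a reference signal and a processed signal can be passed as `score_fn`; in the real script that role is played by the PESQ metric.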

Burton2000 avatar Sep 20 '23 10:09 Burton2000

Using Python=3.7.16

These are the package versions I just tested it with, and they work:

librosa==0.8.1
tensorflow==2.11.0
numpy==1.21.6
pesq==0.0.4
soundfile==0.12.1
h5py==3.8.0

Burton2000 avatar Sep 20 '23 10:09 Burton2000

Thanks for your reply. I will try it later. What confuses me is why final_denoised_audio in the code isn't cleared for each wav file. I think each clean wav should have its own final_denoised_audio. ![image](https://github.com/ARM-software/ML-zoo/assets/23479581/e880079d-f66f-4d0d-ae7b-55bef4986a1b)

evansong0307 avatar Sep 21 '23 11:09 evansong0307

As the picture shows, I think final_denoised_audio should be cleared on each iteration of the loop, after the PESQ for that step has been calculated.

evansong0307 avatar Sep 21 '23 11:09 evansong0307

Oh yes, that is a good point, I understand now! I will check this, re-run, and let you know. Thanks.

Burton2000 avatar Sep 21 '23 12:09 Burton2000

I reran the script from a clean virtual environment based on the provided environment and a fresh git clone, and I got a very high result. The command was: `python test.py --clean_wav_folder=/dataset/rnnwave/clean_testset_wav/ --noisy_wav_folder=/dataset/rnnwave/noisy_testset_wav/ --tflite_path=rnnoise_INT8.tflite`

evansong0307 avatar Sep 22 '23 01:09 evansong0307

I found that the order of the input wav files changes the final PESQ result. As an example, I added sorted() to the wav-loading function, so the wav files are loaded in order of file name. When I reran the code, the PESQ result for each wav file was different.
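A deterministic load order like the one described here could look roughly like this. It is only a sketch: `list_wavs` is a hypothetical helper name, not the loading function in the repository, and it assumes the files use a lowercase `.wav` extension.

```python
from pathlib import Path


def list_wavs(folder):
    """Return the .wav paths in a folder, ordered by file name."""
    # glob() makes no promise about order, so sort explicitly.
    return sorted(Path(folder).glob("*.wav"))
```

Sorting only makes the run reproducible; it does not by itself explain why the order changes the score.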

evansong0307 avatar Sep 22 '23 01:09 evansong0307

I printed the running average PESQ at each iteration as a check.

evansong0307 avatar Sep 22 '23 01:09 evansong0307

After looking into it: yes, that was a bug, and we should have been clearing the list after each wav file was processed. Thank you for spotting this!

I have re-run and now get an average PESQ of 2.465, which I think is at least more in line with the results in the RNNoise paper.

Burton2000 avatar Sep 22 '23 13:09 Burton2000

Interesting that the order affects your overall result, though. I wouldn't have thought it should make a difference, as we take an average. I will add sorted() and run it myself again to check.

One thing I have spotted is that the states are not currently reset to zeros between wav files. That should be happening, and it could be affecting things.

Burton2000 avatar Sep 22 '23 13:09 Burton2000

With the fix that moves final_denoised_audio to the correct place and properly resets the states to 0 before each inference, I now get a consistent PESQ of 2.43. The order in which the WAVs are loaded no longer matters.
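To illustrate why these two changes remove the dependence on load order, here is a toy sketch. It is not the repository code: `denoise_frame` is a hypothetical stateful stand-in for the model, and resetting at the start of each wav file is one reading of the fix described above.

```python
def denoise_frame(frame, state):
    """Hypothetical stateful stand-in: the output depends on the carried state."""
    out = [s + state for s in frame]
    new_state = frame[-1]
    return out, new_state


def process_files(files, reset_per_file):
    """Return the audio that would be scored for each file, in order.

    Each file is given as a list of frames.
    """
    results = []
    final_denoised_audio = []  # created once: where the original bug lived
    state = 0.0
    for frames in files:
        if reset_per_file:
            final_denoised_audio = []  # fix 1: fresh buffer for each wav
            state = 0.0                # fix 2: zero the recurrent state
        for frame in frames:
            out, state = denoise_frame(frame, state)
            final_denoised_audio.extend(out)
        results.append(list(final_denoised_audio))
    return results
```

With `reset_per_file=False`, each file's scored audio still contains everything processed before it and starts from the previous file's state, so the per-file result depends on the order. With `reset_per_file=True`, each file is processed independently of the others.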

Burton2000 avatar Sep 25 '23 11:09 Burton2000