Results for APCodec
16 kHz 2kbps
parameter size:
encoder (including quantizer): 29MB decoder: 40MB
exps/results.txt
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 74.93%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 3.02%
Stage 3: Run automatic speech recognition. WER: 4.74%
Stage 4: Run audio event classification. ACC: 55.25%
src/codec_metrics/exps/results.txt
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: -2.618520825954788
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.64869004
Stage 3: Run STOI. stoi: mean score is: 0.717766808809779
Stage 4: Run PESQ. pesq: mean score is: 1.5509950947761535
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -9.309950038168095
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 2.002597
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: 2.68255129531442
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.87451327
Stage 3: Run STOI. stoi: mean score is: 0.8740794643709145
Stage 4: Run PESQ. pesq: mean score is: 2.1911674320697783
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -6.6539098549604345
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.8435475
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: -3.0264018525811536
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.4431057
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: -1.3850498167169416
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.85650134
Stage 3: Run STOI. stoi: mean score is: 0.8534544293908012
Stage 4: Run PESQ. pesq: mean score is: 1.5768725705146789
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: 2.5759249020219706
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8179612
Stage 3: Run STOI. stoi: mean score is: 0.8975456227011622
Stage 4: Run PESQ. pesq: mean score is: 2.2901515591144563
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -1.3464429268284184
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9656775
Stage 3: Run STOI. stoi: mean score is: 0.7968180258305204
Stage 4: Run PESQ. pesq: mean score is: 1.7317036986351013
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: 4.364046016689939
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8910932
Stage 3: Run STOI. stoi: mean score is: 0.9133034388476792
Stage 4: Run PESQ. pesq: mean score is: 2.245469583272934
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: 1.5015711204024194
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.78175646
Stage 3: Run STOI. stoi: mean score is: 0.8577775240334691
Stage 4: Run PESQ. pesq: mean score is: 2.120602227449417
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: -0.22438148479388495
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.6584927
Stage 3: Run STOI. stoi: mean score is: 0.8339094697226409
Stage 4: Run PESQ. pesq: mean score is: 1.8127213382720948
Average SDR for speech datasets: 0.6937122850168396
Average Mel_Loss for speech datasets: 0.8118357137500001
Average STOI for speech datasets: 0.8430818479633708
Average PESQ for speech datasets: 1.9399604380130768
Average SDR for audio datasets: -6.330087248569893
Average Mel_Loss for audio datasets: 1.7630834000000002
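The averages can be cross-checked against the per-dataset logs. The sketch below is not part of the Codec-SUPERB tooling. It assumes the "audio" group is the three datasets logged without STOI/PESQ (esc50, fsd50k, gunshot_triangulation) and that the remaining eight form the "speech" group. The names `sdr_2kbps` and `AUDIO_DATASETS` are made up for illustration.

```python
from statistics import mean

# Per-dataset SDR means copied from the 2kbps logs above.
sdr_2kbps = {
    "crema_d": -2.618520825954788,
    "esc50": -9.309950038168095,
    "fluent_speech_commands": 2.68255129531442,
    "fsd50k": -6.6539098549604345,
    "gunshot_triangulation": -3.0264018525811536,
    "libri2Mix_test": -1.3850498167169416,
    "librispeech": 2.5759249020219706,
    "quesst": -1.3464429268284184,
    "snips_test_valid_subset": 4.364046016689939,
    "voxceleb1": 1.5015711204024194,
    "vox_lingua_top10": -0.22438148479388495,
}

# Assumption: datasets evaluated without STOI/PESQ are the "audio" group.
AUDIO_DATASETS = {"esc50", "fsd50k", "gunshot_triangulation"}

speech_avg = mean(v for k, v in sdr_2kbps.items() if k not in AUDIO_DATASETS)
audio_avg = mean(v for k, v in sdr_2kbps.items() if k in AUDIO_DATASETS)

print(f"Average SDR for speech datasets: {speech_avg:.6f}")
print(f"Average SDR for audio datasets: {audio_avg:.6f}")
```

Under that grouping, the unweighted means match the reported SDR averages up to floating-point rounding, which suggests the summary is a plain mean over datasets rather than over utterances.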
Results for another experiment, along with its checkpoint, will be released soon.
There was an error in the 2kbps results; I have just updated them. The model and checkpoints are available here. Feel free to contact me anytime if you have any questions. Next are the results from another set of experiments.
16 kHz 4kbps
parameter size:
encoder (including quantizer): 30MB decoder: 40MB
exps/results.txt
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 75.90%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 1.90%
Stage 3: Run automatic speech recognition. WER: 3.53%
Stage 4: Run audio event classification. ACC: 70.65%
src/codec_metrics/exps/results.txt
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: 0.15002365111647298
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.6124164
Stage 3: Run STOI. stoi: mean score is: 0.7736746973027623
Stage 4: Run PESQ. pesq: mean score is: 1.70997572183609
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -4.760846342641758
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.928971
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: 5.220996340883257
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7699465
Stage 3: Run STOI. stoi: mean score is: 0.9142745866782225
Stage 4: Run PESQ. pesq: mean score is: 2.6232634449005126
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -2.8520096273567055
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.7751368
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: 0.6088136447032942
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3763462
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: 1.4808939116227589
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7496703
Stage 3: Run STOI. stoi: mean score is: 0.893277763358277
Stage 4: Run PESQ. pesq: mean score is: 1.8120478355884553
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: 5.426437955617031
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7146479
Stage 3: Run STOI. stoi: mean score is: 0.926983531075999
Stage 4: Run PESQ. pesq: mean score is: 2.746657599210739
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: 2.6550969225537635
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8277179
Stage 3: Run STOI. stoi: mean score is: 0.8536728749605849
Stage 4: Run PESQ. pesq: mean score is: 2.124973820447922
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: 5.2696761366753035
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.78780365
Stage 3: Run STOI. stoi: mean score is: 0.9406434092184658
Stage 4: Run PESQ. pesq: mean score is: 2.629430605173111
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: 4.231224944768299
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.69843954
Stage 3: Run STOI. stoi: mean score is: 0.8950605511099898
Stage 4: Run PESQ. pesq: mean score is: 2.4963964211940763
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: 3.239173312527914
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.6139635
Stage 3: Run STOI. stoi: mean score is: 0.8769626591978656
Stage 4: Run PESQ. pesq: mean score is: 2.0776160097122194
Average SDR for speech datasets: 3.4591903969706
Average Mel_Loss for speech datasets: 0.72182571125
Average STOI for speech datasets: 0.8843187591127709
Average PESQ for speech datasets: 2.2775451822578905
Average SDR for audio datasets: -2.33468077509839
Average Mel_Loss for audio datasets: 1.6934846666666665
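To compare runs at different bitrates, the logs can be loaded into a dictionary. The following is a minimal sketch that assumes the line format shown in this post ("File Name: <dataset>.log ..." followed by "Stage N: ... <metric>: mean score is: <value>"). The function name `parse_metric_log` and both regular expressions are illustrative and not part of Codec-SUPERB.

```python
import re

# Patterns inferred from the log excerpts in this post (an assumption).
FILE_RE = re.compile(r"File Name:\s*(\S+)\.log")
SCORE_RE = re.compile(r"(\w+): mean score is:\s*(-?\d+(?:\.\d+)?)")


def parse_metric_log(text):
    """Return {dataset: {metric: value}} parsed from a results log."""
    results = {}
    current = None
    for line in text.splitlines():
        file_match = FILE_RE.search(line)
        if file_match:
            # A new "File Name" line starts a new dataset section.
            current = file_match.group(1)
            results[current] = {}
        score_match = SCORE_RE.search(line)
        if score_match and current is not None:
            metric = score_match.group(1).lower()
            results[current][metric] = float(score_match.group(2))
    return results


# Lines copied from the 4kbps log above.
sample = """\
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -4.760846342641758
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.928971
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: 5.426437955617031
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7146479
Stage 3: Run STOI. stoi: mean score is: 0.926983531075999
Stage 4: Run PESQ. pesq: mean score is: 2.746657599210739
"""

parsed = parse_metric_log(sample)
for dataset, metrics in parsed.items():
    print(dataset, metrics)
```

Metric names are lowercased so that "SDR" and "stoi" end up in a consistent key style. Datasets without STOI/PESQ simply have fewer keys.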