results
Bitrates: the 16 kHz codec model runs at 2 kbps, the 44.1 kHz codec model at 6.89 kbps, and the 48 kHz codec model at 7.5 kbps.
#1. Here is exps/results.txt:
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 75.97%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 2.57%
Stage 3: Run automatic speech recognition. WER: 3.67%
Stage 4: Run audio event classification. ACC: 86.80%
#2. Here is src/codec_metrics/exps/results.txt:
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: 12.264864005831004
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.46461612
Stage 3: Run STOI. stoi: mean score is: 0.9201546369667847
Stage 4: Run PESQ. pesq: mean score is: 2.9032970213890077
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: 6.726699210213638
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.89280885
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: 8.476522537066758
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.75807977
Stage 3: Run STOI. stoi: mean score is: 0.9238519743607232
Stage 4: Run PESQ. pesq: mean score is: 2.8522612583637237
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: 6.95385805941422
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8306656
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: 8.291245593533532
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.95218104
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: 4.233350120341239
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7518116
Stage 3: Run STOI. stoi: mean score is: 0.9050623419177468
Stage 4: Run PESQ. pesq: mean score is: 2.0071350967884065
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: 7.751003745240329
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.72347593
Stage 3: Run STOI. stoi: mean score is: 0.9340773701364049
Stage 4: Run PESQ. pesq: mean score is: 2.903846046924591
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: 8.4340708735918
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8294336
Stage 3: Run STOI. stoi: mean score is: 0.8863192140533341
Stage 4: Run PESQ. pesq: mean score is: 2.6509935235977173
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: 9.542545404819807
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7959907
Stage 3: Run STOI. stoi: mean score is: 0.9531058100873113
Stage 4: Run PESQ. pesq: mean score is: 2.7776152551174165
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: 6.524681732109078
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.71494424
Stage 3: Run STOI. stoi: mean score is: 0.8977601804462474
Stage 4: Run PESQ. pesq: mean score is: 2.5823002088069917
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: 13.074802660696786
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.49565125
Stage 3: Run STOI. stoi: mean score is: 0.9516724002511663
Stage 4: Run PESQ. pesq: mean score is: 2.9390562558174134
Average SDR for speech datasets: 8.7877301349621
Average Mel_Loss for speech datasets: 0.69175040125
Average STOI for speech datasets: 0.9215004910274648
Average PESQ for speech datasets: 2.7020630833506587
Average SDR for audio datasets: 7.323934287720463
Average Mel_Loss for audio datasets: 0.8918851633333333
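The averages above can be checked against the per-dataset scores. The following is a minimal sketch, assuming the audio group is esc50, fsd50k and gunshot_triangulation and the speech group is every other dataset. This grouping is inferred from which logs report STOI and PESQ, not taken from the Codec-SUPERB scripts, and `group_means` is a hypothetical helper name.

```python
from statistics import fmean

# Per-dataset SDR means copied from the log above.
sdr = {
    "crema_d": 12.264864005831004,
    "esc50": 6.726699210213638,
    "fluent_speech_commands": 8.476522537066758,
    "fsd50k": 6.95385805941422,
    "gunshot_triangulation": 8.291245593533532,
    "libri2Mix_test": 4.233350120341239,
    "librispeech": 7.751003745240329,
    "quesst": 8.4340708735918,
    "snips_test_valid_subset": 9.542545404819807,
    "voxceleb1": 6.524681732109078,
    "vox_lingua_top10": 13.074802660696786,
}

# Assumed grouping: datasets whose logs contain only SDR and mel loss.
AUDIO = {"esc50", "fsd50k", "gunshot_triangulation"}


def group_means(scores):
    """Return (speech_mean, audio_mean) for a {dataset: score} mapping."""
    speech = [v for name, v in scores.items() if name not in AUDIO]
    audio = [v for name, v in scores.items() if name in AUDIO]
    return fmean(speech), fmean(audio)


speech_sdr, audio_sdr = group_means(sdr)
print(f"speech SDR: {speech_sdr:.6f}, audio SDR: {audio_sdr:.6f}")
```

Under that assumed grouping, both means agree with the reported SDR averages; the same helper can be applied to the mel loss, STOI and PESQ columns.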
Thanks for submitting the results. Could you also refer to section 4.2 of the rule (https://codecsuperb.github.io/Codec-SUPERB-rule.pdf) to let us know how to do inference using your model (we will leverage your model to test on the hidden set)?
Yes, I will finish it by Monday. However, I am currently encountering some issues with uploading the model to GitHub.
Here is one suggestion: the codec model checkpoint can be uploaded to Hugging Face or Google Drive (with instructions for downloading it with gdown).
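For the Google Drive route, the download step might look like the sketch below. It assumes the `gdown` package is installed (`pip install gdown`); the file ID and the output filename `codec.ckpt` are placeholders, not the actual release, and `drive_url` and `download_checkpoint` are hypothetical helper names.

```python
def drive_url(file_id: str) -> str:
    """Build the Google Drive URL form that gdown accepts."""
    return f"https://drive.google.com/uc?id={file_id}"


def download_checkpoint(file_id: str, output: str = "codec.ckpt") -> str:
    """Download a shared Google Drive file and return the local path."""
    import gdown  # third-party dependency, imported lazily

    return gdown.download(drive_url(file_id), output, quiet=False)
```

Calling `download_checkpoint("<FILE_ID>")` with the ID from the shared link would then fetch the checkpoint into the working directory.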
Hello, I have completed the model release. The download link and usage instructions have been sent to you via email. If you have any questions, please feel free to contact me. Thank you very much.
Update:
16 kHz, 2 kbps codec model
(1) Downstream results
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 75.97%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 2.57%
Stage 3: Run automatic speech recognition. WER: 3.64%
Stage 4: Run audio event classification. ACC: 71.10%
(2) Signal-level results
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: 4.641087071226074
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.580518
Stage 3: Run STOI. stoi: mean score is: 0.7878352309918871
Stage 4: Run PESQ. pesq: mean score is: 1.7021552300453187
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is:
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.8372705
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: 8.476178635258437
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.75820357
Stage 3: Run STOI. stoi: mean score is: 0.923865876017417
Stage 4: Run PESQ. pesq: mean score is: 2.852576096057892
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: 0.4370140327990164
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.756783
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: 1.1408946927353245
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.4059703
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: 4.2329371204943
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7518392
Stage 3: Run STOI. stoi: mean score is: 0.9050986518783571
Stage 4: Run PESQ. pesq: mean score is: 2.006877576112747
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: 7.752400420683839
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.72355676
Stage 3: Run STOI. stoi: mean score is: 0.9340837095549291
Stage 4: Run PESQ. pesq: mean score is: 2.9040276074409483
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: 8.433560426646096
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8292966
Stage 3: Run STOI. stoi: mean score is: 0.8863539521867545
Stage 4: Run PESQ. pesq: mean score is: 2.650856384038925
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: 9.542030656936957
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7960729
Stage 3: Run STOI. stoi: mean score is: 0.9530965262477374
Stage 4: Run PESQ. pesq: mean score is: 2.7770466423034668
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: 6.525108717315516
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7149558
Stage 3: Run STOI. stoi: mean score is: 0.8977717650359602
Stage 4: Run PESQ. pesq: mean score is: 2.582567346096039
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: 6.794794006624397
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.58160305
Stage 3: Run STOI. stoi: mean score is: 0.8815944944789442
Stage 4: Run PESQ. pesq: mean score is: 1.9832410085201264
Average SDR for speech datasets: 7.049762131898202
Average Mel_Loss for speech datasets: 0.7170057350000001
Average STOI for speech datasets: 0.8962125257989983
Average PESQ for speech datasets: 2.432418486326933
Average SDR for audio datasets: 0.7889543627671705
Average Mel_Loss for audio datasets: 1.58137665
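The esc50 SDR line in this log has no value, so anything that parses these logs needs to tolerate an empty score. The sketch below is one way to do that, assuming the line format shown in the logs; `parse_scores` and `mean_ignoring_missing` are hypothetical helpers, not part of the Codec-SUPERB code.

```python
import re
from statistics import fmean

# Matches e.g. "Stage 1: Run SDR evaluation. SDR: mean score is: 4.64".
# The value group may be empty so a missing score is detected, not mis-parsed.
LINE = re.compile(r"(\w+): mean score is:\s*(\S*)\s*$")


def parse_scores(lines):
    """Return {metric: float or None} for one dataset's log lines."""
    scores = {}
    for line in lines:
        match = LINE.search(line)
        if match:
            metric, raw = match.groups()
            scores[metric] = float(raw) if raw else None
    return scores


def mean_ignoring_missing(values):
    """Average the values that are present; None if there are none."""
    present = [v for v in values if v is not None]
    return fmean(present) if present else None


esc50 = parse_scores([
    "Stage 1: Run SDR evaluation. SDR: mean score is:",
    "Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.8372705",
])
# fsd50k and gunshot_triangulation SDR values copied from the log above.
audio_sdr = mean_ignoring_missing(
    [esc50["SDR"], 0.4370140327990164, 1.1408946927353245]
)
print(esc50, audio_sdr)
```

Averaging only fsd50k and gunshot_triangulation reproduces the reported audio SDR average, and the reported audio Mel_Loss average likewise matches the mean of those two datasets alone, so esc50 appears to have been left out of both audio averages for this model.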
44.1 kHz, ~7 kbps (6.89 kbps) codec model
(1) Downstream results
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 75.49%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 1.53%
Stage 3: Run automatic speech recognition. WER: 3.19%
Stage 4: Run audio event classification. ACC: 86.55%
(2) Signal-level results
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: 11.9198190006243
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.46257296
Stage 3: Run STOI. stoi: mean score is: 0.9120819982808108
Stage 4: Run PESQ. pesq: mean score is: 2.830995168685913
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: 6.241463241745936
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.83121604
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: 13.733762141747928
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.62197036
Stage 3: Run STOI. stoi: mean score is: 0.9634806161341553
Stage 4: Run PESQ. pesq: mean score is: 3.8307976722717285
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: 6.650526363401869
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7842263
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: 8.656592212439394
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9138063
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: 10.175153523690033
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.6136928
Stage 3: Run STOI. stoi: mean score is: 0.9652810154755299
Stage 4: Run PESQ. pesq: mean score is: 3.5824116134643553
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: 12.392481496173902
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.61285776
Stage 3: Run STOI. stoi: mean score is: 0.9659764076205769
Stage 4: Run PESQ. pesq: mean score is: 3.854781861305237
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: 14.22380206490447
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.5592373
Stage 3: Run STOI. stoi: mean score is: 0.9433491989857918
Stage 4: Run PESQ. pesq: mean score is: 3.7363220167160036
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: 13.537287795228872
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.66826826
Stage 3: Run STOI. stoi: mean score is: 0.9756619198675819
Stage 4: Run PESQ. pesq: mean score is: 3.6874674439430235
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: 11.822250275546512
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.59757036
Stage 3: Run STOI. stoi: mean score is: 0.955504206240366
Stage 4: Run PESQ. pesq: mean score is: 3.8139785027503965
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: 12.644553808875898
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.50202703
Stage 3: Run STOI. stoi: mean score is: 0.9474133160648855
Stage 4: Run PESQ. pesq: mean score is: 2.8804212963581084
Average SDR for speech datasets: 12.55613876334899
Average Mel_Loss for speech datasets: 0.57977460375
Average STOI for speech datasets: 0.9535935848337121
Average PESQ for speech datasets: 3.527146946936846
Average SDR for audio datasets: 7.182860605862399
Average Mel_Loss for audio datasets: 0.84308288
48 kHz, 7.5 kbps codec model
(1) Downstream results
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 75.28%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 1.49%
Stage 3: Run automatic speech recognition. WER: 3.07%
Stage 4: Run audio event classification. ACC: 88.00%
(2) Signal-level results
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: 12.2636534888401
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.4645789
Stage 3: Run STOI. stoi: mean score is: 0.9201668776856671
Stage 4: Run PESQ. pesq: mean score is: 2.900687514543533
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: 6.726355181016816
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.892827
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: 14.124681010234116
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.5964956
Stage 3: Run STOI. stoi: mean score is: 0.9658302396521976
Stage 4: Run PESQ. pesq: mean score is: 3.873115861415863
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: 6.954926362898063
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.83061826
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: 8.296033518758794
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9518249
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: 10.664635680971012
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.5921542
Stage 3: Run STOI. stoi: mean score is: 0.9683333864449756
Stage 4: Run PESQ. pesq: mean score is: 3.6724947714805602
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: 12.879912781761652
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.5930597
Stage 3: Run STOI. stoi: mean score is: 0.9687304311248394
Stage 4: Run PESQ. pesq: mean score is: 3.869354705810547
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: 14.652514660452471
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.54244286
Stage 3: Run STOI. stoi: mean score is: 0.9472981762704458
Stage 4: Run PESQ. pesq: mean score is: 3.7385361623764037
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: 13.91570370530584
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.6493998
Stage 3: Run STOI. stoi: mean score is: 0.97752452279595
Stage 4: Run PESQ. pesq: mean score is: 3.7307146120071413
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: 12.273928539620078
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.5752002
Stage 3: Run STOI. stoi: mean score is: 0.9589951570618435
Stage 4: Run PESQ. pesq: mean score is: 3.8899018454551695
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: 13.074327995615095
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.4956447
Stage 3: Run STOI. stoi: mean score is: 0.9516514417002608
Stage 4: Run PESQ. pesq: mean score is: 2.938644474744797
Average SDR for speech datasets: 12.981169732850043
Average Mel_Loss for speech datasets: 0.563621995
Average STOI for speech datasets: 0.9573162790920224
Average PESQ for speech datasets: 3.5766812434792516
Average SDR for audio datasets: 7.32577168755789
Average Mel_Loss for audio datasets: 0.8917567200000001