Results for SemantiCodec
Here are the results for SemantiCodec. This is a 16 kHz codec, evaluated in six configurations (three token rates, each with two codebook sizes) with the following bit rates:
- For a token rate of 100 with codebook size 16384, the bit rate is 1.35 kbps
- For a token rate of 100 with codebook size 32768, the bit rate is 1.40 kbps
- For a token rate of 50 with codebook size 16384, the bit rate is 0.68 kbps
- For a token rate of 50 with codebook size 32768, the bit rate is 0.70 kbps
- For a token rate of 25 with codebook size 16384, the bit rate is 0.34 kbps
- For a token rate of 25 with codebook size 32768, the bit rate is 0.35 kbps
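The listed bit rates are consistent with a simple calculation, sketched below under an assumption that these results do not state: half of the tokens per second are semantic tokens drawn from the listed codebook, and the other half are acoustic tokens drawn from a fixed 8192-entry codebook (13 bits per token). The helper `bitrate_kbps` and the constant `ACOUSTIC_CODEBOOK_SIZE` are illustrative names, not part of the SemantiCodec code.

```python
import math

# ASSUMPTION (not stated in the results): the token stream is split evenly
# between semantic tokens (listed codebook size) and acoustic tokens drawn
# from a fixed 8192-entry codebook.
ACOUSTIC_CODEBOOK_SIZE = 8192  # hypothetical constant, assumed value


def bitrate_kbps(token_rate: float, semantic_codebook_size: int) -> float:
    """Estimate the bit rate in kbps for a token rate given in tokens/s."""
    per_stream_rate = token_rate / 2  # assumed even semantic/acoustic split
    semantic_bits = math.log2(semantic_codebook_size)  # bits per semantic token
    acoustic_bits = math.log2(ACOUSTIC_CODEBOOK_SIZE)  # bits per acoustic token
    bits_per_second = per_stream_rate * (semantic_bits + acoustic_bits)
    return bits_per_second / 1000


for rate in (100, 50, 25):
    for size in (16384, 32768):
        print(f"token rate {rate}, codebook {size}: {bitrate_kbps(rate, size)} kbps")
```

Under these assumptions the formula gives 1.35, 1.40, 0.675, 0.70, 0.3375 and 0.35 kbps, which round to the values listed above.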
The inference code and model checkpoints can be found here.
The results for the six configurations are shown below (one comment per system):
Results for the model with token rate 100 and codebook size 16384:
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 71.39%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 3.81%
Stage 3: Run automatic speech recognition. WER: 5.55%
Stage 4: Run audio event classification. ACC: 83.60%
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: -8.023059848347962
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.71579695
Stage 3: Run STOI. stoi: mean score is: 0.6374974081666491
Stage 4: Run PESQ. pesq: mean score is: 1.3225452315807342
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -16.204584799806007
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6063063
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: -3.678850278531351
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8800615
Stage 3: Run STOI. stoi: mean score is: 0.8390240078687938
Stage 4: Run PESQ. pesq: mean score is: 2.0443784379959107
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -15.573628896021797
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.5704794
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: -10.929932869636273
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3241482
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: -10.0523148424559
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8997036
Stage 3: Run STOI. stoi: mean score is: 0.8000545556153663
Stage 4: Run PESQ. pesq: mean score is: 1.4450754988193513
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: -7.4687414751106225
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8380325
Stage 3: Run STOI. stoi: mean score is: 0.8672585483834184
Stage 4: Run PESQ. pesq: mean score is: 2.0104604637622834
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -9.139100164017448
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.85657454
Stage 3: Run STOI. stoi: mean score is: 0.8004369960794232
Stage 4: Run PESQ. pesq: mean score is: 1.8498523151874542
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: -6.784165470251713
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9755262
Stage 3: Run STOI. stoi: mean score is: 0.8754722747405146
Stage 4: Run PESQ. pesq: mean score is: 1.8099392879009246
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: -9.873407105853522
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8101869
Stage 3: Run STOI. stoi: mean score is: 0.811518312954677
Stage 4: Run PESQ. pesq: mean score is: 1.786508893966675
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: -13.585821389136129
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.78308046
Stage 3: Run STOI. stoi: mean score is: 0.7916500742300961
Stage 4: Run PESQ. pesq: mean score is: 1.462774316072464
Average SDR for speech datasets: -8.575682571713081
Average Mel_Loss for speech datasets: 0.8448703312499999
Average STOI for speech datasets: 0.8028640222548673
Average PESQ for speech datasets: 1.7164418056607245
Average SDR for audio datasets: -14.236048855154692
Average Mel_Loss for audio datasets: 1.5003113
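The averages at the end of each run appear to be unweighted means over two groups: the eight speech datasets (the ones that also report STOI and PESQ) and the three audio datasets (esc50, fsd50k and gunshot_triangulation). A minimal check, recomputing the SDR averages for this configuration from the per-dataset values above, reproduces the reported figures; it is a sanity check on the numbers, not the Codec SUPERB aggregation script itself.

```python
from statistics import mean

# Per-dataset SDR mean scores for the token rate 100 / codebook 16384 model,
# copied from the logs above.
SPEECH_SDR = {
    "crema_d": -8.023059848347962,
    "fluent_speech_commands": -3.678850278531351,
    "libri2Mix_test": -10.0523148424559,
    "librispeech": -7.4687414751106225,
    "quesst": -9.139100164017448,
    "snips_test_valid_subset": -6.784165470251713,
    "voxceleb1": -9.873407105853522,
    "vox_lingua_top10": -13.585821389136129,
}
AUDIO_SDR = {
    "esc50": -16.204584799806007,
    "fsd50k": -15.573628896021797,
    "gunshot_triangulation": -10.929932869636273,
}

# Unweighted mean over datasets within each group.
speech_avg = mean(SPEECH_SDR.values())
audio_avg = mean(AUDIO_SDR.values())
print(f"Average SDR for speech datasets: {speech_avg}")
print(f"Average SDR for audio datasets: {audio_avg}")
```

The same grouping also reproduces the Mel_Loss averages; the STOI and PESQ averages use only the speech group, since the audio datasets report SDR and Mel_Loss only.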
Results for the model with token rate 100 and codebook size 32768:
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 71.04%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 3.64%
Stage 3: Run automatic speech recognition. WER: 5.50%
Stage 4: Run audio event classification. ACC: 83.15%
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: -8.288299352593407
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7141452
Stage 3: Run STOI. stoi: mean score is: 0.6402874449523498
Stage 4: Run PESQ. pesq: mean score is: 1.3165868592262269
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -16.00277567356359
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6065166
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: -3.9123262783170674
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8796803
Stage 3: Run STOI. stoi: mean score is: 0.8415218683353153
Stage 4: Run PESQ. pesq: mean score is: 2.062159482240677
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -16.190419273403485
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.5684569
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: -10.130163797288604
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3271292
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: -9.806158885886454
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.89801973
Stage 3: Run STOI. stoi: mean score is: 0.8023610604658767
Stage 4: Run PESQ. pesq: mean score is: 1.4408800554275514
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: -7.465939778921175
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.83398134
Stage 3: Run STOI. stoi: mean score is: 0.8680992262187252
Stage 4: Run PESQ. pesq: mean score is: 2.0172382056713105
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -9.4248413812485
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8552469
Stage 3: Run STOI. stoi: mean score is: 0.8009020639528738
Stage 4: Run PESQ. pesq: mean score is: 1.874754753112793
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: -6.770595884905695
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9717746
Stage 3: Run STOI. stoi: mean score is: 0.8772398321019043
Stage 4: Run PESQ. pesq: mean score is: 1.8369818699359894
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: -9.957701026949303
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.80674434
Stage 3: Run STOI. stoi: mean score is: 0.8145108847377486
Stage 4: Run PESQ. pesq: mean score is: 1.8088320195674896
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: -13.324050827908918
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.78012276
Stage 3: Run STOI. stoi: mean score is: 0.7933999433518525
Stage 4: Run PESQ. pesq: mean score is: 1.4673356866836549
Average SDR for speech datasets: -8.618739177091316
Average Mel_Loss for speech datasets: 0.84246439625
Average STOI for speech datasets: 0.8047902905145807
Average PESQ for speech datasets: 1.7280961164832116
Average SDR for audio datasets: -14.107786248085226
Average Mel_Loss for audio datasets: 1.5007009
Results for the model with token rate 50 and codebook size 16384:
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 68.12%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 6.16%
Stage 3: Run automatic speech recognition. WER: 9.55%
Stage 4: Run audio event classification. ACC: 76.55%
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: -8.83968419510651
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7127933
Stage 3: Run STOI. stoi: mean score is: 0.59937756747475
Stage 4: Run PESQ. pesq: mean score is: 1.2897077596187592
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -16.699295371807537
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.629877
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: -4.29523195078702
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.96175253
Stage 3: Run STOI. stoi: mean score is: 0.8026151794296594
Stage 4: Run PESQ. pesq: mean score is: 1.801913343667984
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -16.5305423514448
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6025631
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: -10.579743797921056
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3357253
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: -10.66528635465503
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0411216
Stage 3: Run STOI. stoi: mean score is: 0.7410676812363071
Stage 4: Run PESQ. pesq: mean score is: 1.2746098387241362
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: -8.113302633684958
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9311916
Stage 3: Run STOI. stoi: mean score is: 0.8395857457648703
Stage 4: Run PESQ. pesq: mean score is: 1.7791949903964996
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -9.662703719793258
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9265094
Stage 3: Run STOI. stoi: mean score is: 0.7611285319217221
Stage 4: Run PESQ. pesq: mean score is: 1.6827945744991302
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: -7.3375089676368646
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0891488
Stage 3: Run STOI. stoi: mean score is: 0.8422847505824207
Stage 4: Run PESQ. pesq: mean score is: 1.572229918241501
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: -10.419868769887758
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9091263
Stage 3: Run STOI. stoi: mean score is: 0.7786395411501819
Stage 4: Run PESQ. pesq: mean score is: 1.6212511384487152
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: -14.210414162406268
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.83399665
Stage 3: Run STOI. stoi: mean score is: 0.7465415983803311
Stage 4: Run PESQ. pesq: mean score is: 1.4144192659854888
Average SDR for speech datasets: -9.193000094244708
Average Mel_Loss for speech datasets: 0.9257050225000001
Average STOI for speech datasets: 0.7639050744925302
Average PESQ for speech datasets: 1.5545151036977767
Average SDR for audio datasets: -14.60319384039113
Average Mel_Loss for audio datasets: 1.5227218
Results for the model with token rate 50 and codebook size 32768:
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 67.15%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 6.01%
Stage 3: Run automatic speech recognition. WER: 9.69%
Stage 4: Run audio event classification. ACC: 75.10%
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: -8.770160568168329
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7157427
Stage 3: Run STOI. stoi: mean score is: 0.5990633321199552
Stage 4: Run PESQ. pesq: mean score is: 1.2988323020935058
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -16.759995199904903
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6301608
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: -4.439544938378307
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9583634
Stage 3: Run STOI. stoi: mean score is: 0.8065342834968997
Stage 4: Run PESQ. pesq: mean score is: 1.8093781626224519
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -16.590133601793124
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.600283
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: -10.233150558590781
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3511304
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: -10.776933275608268
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0376164
Stage 3: Run STOI. stoi: mean score is: 0.7419712845721602
Stage 4: Run PESQ. pesq: mean score is: 1.2745465958118438
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: -7.896944603174362
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9255314
Stage 3: Run STOI. stoi: mean score is: 0.8416043077360352
Stage 4: Run PESQ. pesq: mean score is: 1.7907265722751617
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -9.604782428385983
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9269376
Stage 3: Run STOI. stoi: mean score is: 0.7635182742921145
Stage 4: Run PESQ. pesq: mean score is: 1.6788008534908294
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: -7.306414974996127
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0834035
Stage 3: Run STOI. stoi: mean score is: 0.8455957873829031
Stage 4: Run PESQ. pesq: mean score is: 1.596457360982895
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: -10.43514078996363
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.90818316
Stage 3: Run STOI. stoi: mean score is: 0.7796653041584611
Stage 4: Run PESQ. pesq: mean score is: 1.6307682001590729
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: -14.158362698563757
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.83324564
Stage 3: Run STOI. stoi: mean score is: 0.7478858007055951
Stage 4: Run PESQ. pesq: mean score is: 1.415548061132431
Average SDR for speech datasets: -9.173535534654846
Average Mel_Loss for speech datasets: 0.923627975
Average STOI for speech datasets: 0.7657297968080157
Average PESQ for speech datasets: 1.5618822635710237
Average SDR for audio datasets: -14.527759786762935
Average Mel_Loss for audio datasets: 1.5271913999999998
Results for the model with token rate 25 and codebook size 16384:
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 61.53%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 13.70%
Stage 3: Run automatic speech recognition. WER: 35.79%
Stage 4: Run audio event classification. ACC: 71.55%
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: -9.891073254225994
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.79265803
Stage 3: Run STOI. stoi: mean score is: 0.5382069630214918
Stage 4: Run PESQ. pesq: mean score is: 1.2317941224575042
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -17.354609349344106
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6950777
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: -5.118099710803417
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.1607062
Stage 3: Run STOI. stoi: mean score is: 0.7279729071609607
Stage 4: Run PESQ. pesq: mean score is: 1.470268008708954
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -17.525922260145695
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6639311
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: -9.84819729776821
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.4082423
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: -11.828557564473659
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3157852
Stage 3: Run STOI. stoi: mean score is: 0.6398609276418542
Stage 4: Run PESQ. pesq: mean score is: 1.1277076315879822
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: -9.074854594346156
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.143972
Stage 3: Run STOI. stoi: mean score is: 0.7747987615118724
Stage 4: Run PESQ. pesq: mean score is: 1.426479343175888
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -10.47760850527248
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.1010196
Stage 3: Run STOI. stoi: mean score is: 0.6862266259635116
Stage 4: Run PESQ.
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: -8.102805757598823
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3454256
Stage 3: Run STOI. stoi: mean score is: 0.783804268388349
Stage 4: Run PESQ. pesq: mean score is: 1.3105683100223542
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: -11.169038464688196
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0954686
Stage 3: Run STOI. stoi: mean score is: 0.7094035546469811
Stage 4: Run PESQ. pesq: mean score is: 1.3510719525814057
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: -15.680900866701082
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9287965
Stage 3: Run STOI. stoi: mean score is: 0.6631148883616201
Stage 4: Run PESQ. pesq: mean score is: 1.2879982483386994
Average SDR for speech datasets: -10.167867339763726
Average Mel_Loss for speech datasets: 1.1104789662499999
Average STOI for speech datasets: 0.69042361208708
Average PESQ for speech datasets: 1.3259033580124377
Average SDR for audio datasets: -14.909576302419337
Average Mel_Loss for audio datasets: 1.5890836999999998
Results for the model with token rate 25 and codebook size 32768:
Codec SUPERB application evaluation
Stage 1: Run speech emotion recognition. Acc: 59.51%
Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs
Run speaker verification. EER: 13.39%
Stage 3: Run automatic speech recognition. WER: 34.24%
Stage 4: Run audio event classification. ACC: 70.45%
Log results
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: -9.52817490628773
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.78014153
Stage 3: Run STOI. stoi: mean score is: 0.536566776256902
Stage 4: Run PESQ.
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -18.045539644348803
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6942394
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: -4.756434837791447
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.1558565
Stage 3: Run STOI. stoi: mean score is: 0.7376097582470694
Stage 4: Run PESQ. pesq: mean score is: 1.4803874719142913
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -17.28732169023466
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.6601683
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: -9.839931109752126
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.4160614
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: -11.686392159090719
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3149458
Stage 3: Run STOI. stoi: mean score is: 0.6450955925787938
Stage 4: Run PESQ. pesq: mean score is: 1.1226227939128877
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: -9.023869144962699
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.1400143
Stage 3: Run STOI. stoi: mean score is: 0.778975415690721
Stage 4: Run PESQ. pesq: mean score is: 1.4233695840835572
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -10.446293708828193
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0919785
Stage 3: Run STOI. stoi: mean score is: 0.6912703894668684
Stage 4: Run PESQ. pesq: mean score is: 1.4184428441524506
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: -7.820809908089303
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.343809
Stage 3: Run STOI. stoi: mean score is: 0.7835718970167425
Stage 4: Run PESQ. pesq: mean score is: 1.3171902728080749
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: -11.3429282056549
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.0941978
Stage 3: Run STOI. stoi: mean score is: 0.71035581129116
Stage 4: Run PESQ. pesq: mean score is: 1.3429110085964202
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: -15.616014513375687
Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9306869
Stage 3: Run STOI. stoi: mean score is: 0.6662986378594428
Stage 4: Run PESQ. pesq: mean score is: 1.2928564262390136
Average SDR for speech datasets: -10.027614673010085
Average Mel_Loss for speech datasets: 1.1064537912499999
Average STOI for speech datasets: 0.6937180348009626
Average PESQ for speech datasets: 1.3297000639140608
Average SDR for audio datasets: -15.057597481445194
Average Mel_Loss for audio datasets: 1.590156366666667
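To compare the six configurations side by side, the application-level scores reported above can be collected into one table. The snippet only rearranges those numbers; the variable names and column labels are illustrative.

```python
# Application-level results copied from the six runs above:
# (token rate, codebook size) -> (SER acc %, SV EER %, ASR WER %, AEC acc %)
RESULTS = {
    (100, 16384): (71.39, 3.81, 5.55, 83.60),
    (100, 32768): (71.04, 3.64, 5.50, 83.15),
    (50, 16384): (68.12, 6.16, 9.55, 76.55),
    (50, 32768): (67.15, 6.01, 9.69, 75.10),
    (25, 16384): (61.53, 13.70, 35.79, 71.55),
    (25, 32768): (59.51, 13.39, 34.24, 70.45),
}

print(f"{'rate':>5} {'codebook':>8} {'SER acc':>8} {'EER':>6} {'WER':>6} {'AEC acc':>8}")
# Highest token rate first; within a rate, larger codebook first.
for (rate, size), (ser, eer, wer, aec) in sorted(RESULTS.items(), reverse=True):
    print(f"{rate:>5} {size:>8} {ser:>8.2f} {eer:>6.2f} {wer:>6.2f} {aec:>8.2f}")
```

Laid out this way, every task degrades as the token rate drops, and the step from 50 to 25 costs far more than the step from 100 to 50 (for ASR, WER goes from 5.55% to 9.55% and then to 35.79% with the 16384 codebook), while the two codebook sizes stay close at each token rate.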