16 kHz 2kbps

parameter size:

encoder (including quantizer): 29MB decoder: 40MB

exps/results.txt

Codec SUPERB application evaluation

Stage 1: Run speech emotion recognition. Acc: 74.93%

Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs

Run speaker verification. EER: 3.02%

Stage 3: Run automatic speech recognition. WER: 4.74%

Stage 4: Run audio event classification. ACC: 55.25%

src/codec_metrics/exps/results.txt

Log results

File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation. SDR: mean score is: -2.618520825954788

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.64869004

Stage 3: Run STOI. stoi: mean score is: 0.717766808809779

Stage 4: Run PESQ. pesq: mean score is: 1.5509950947761535

File Name: esc50.log Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation. SDR: mean score is: -9.309950038168095

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 2.002597

File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation. SDR: mean score is: 2.68255129531442

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.87451327

Stage 3: Run STOI. stoi: mean score is: 0.8740794643709145

Stage 4: Run PESQ. pesq: mean score is: 2.1911674320697783

File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation. SDR: mean score is: -6.6539098549604345

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.8435475

File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation. SDR: mean score is: -3.0264018525811536

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.4431057

File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation. SDR: mean score is: -1.3850498167169416

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.85650134

Stage 3: Run STOI. stoi: mean score is: 0.8534544293908012

Stage 4: Run PESQ. pesq: mean score is: 1.5768725705146789

File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation. SDR: mean score is: 2.5759249020219706

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8179612

Stage 3: Run STOI. stoi: mean score is: 0.8975456227011622

Stage 4: Run PESQ. pesq: mean score is: 2.2901515591144563

File Name: quesst.log Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation. SDR: mean score is: -1.3464429268284184

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.9656775

Stage 3: Run STOI. stoi: mean score is: 0.7968180258305204

Stage 4: Run PESQ. pesq: mean score is: 1.7317036986351013

File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation. SDR: mean score is: 4.364046016689939

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8910932

Stage 3: Run STOI. stoi: mean score is: 0.9133034388476792

Stage 4: Run PESQ. pesq: mean score is: 2.245469583272934

File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation. SDR: mean score is: 1.5015711204024194

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.78175646

Stage 3: Run STOI. stoi: mean score is: 0.8577775240334691

Stage 4: Run PESQ. pesq: mean score is: 2.120602227449417

File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation. SDR: mean score is: -0.22438148479388495

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.6584927

Stage 3: Run STOI. stoi: mean score is: 0.8339094697226409

Stage 4: Run PESQ. pesq: mean score is: 1.8127213382720948

Average SDR for speech datasets: 0.6937122850168396

Average Mel_Loss for speech datasets: 0.8118357137500001

Average STOI for speech datasets: 0.8430818479633708

Average PESQ for speech datasets: 1.9399604380130768

Average SDR for audio datasets: -6.330087248569893

Average Mel_Loss for audio datasets: 1.7630834000000002
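For anyone who wants to check the averages, here is a minimal sketch that parses log text in the format shown above and recomputes the group means. The function names `parse_log` and `group_average` are hypothetical and not part of Codec-SUPERB. The audio group (esc50, fsd50k, gunshot_triangulation) is an assumption inferred from which logs lack STOI and PESQ stages. Only the 2kbps SDR lines are included to keep the excerpt short.

```python
import re
from statistics import mean

# Assumption: the three datasets without STOI/PESQ stages form the "audio" group.
AUDIO = {"esc50", "fsd50k", "gunshot_triangulation"}

# SDR lines copied from the 2kbps log results above.
LOG = """\
File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d
Stage 1: Run SDR evaluation. SDR: mean score is: -2.618520825954788
File Name: esc50.log Codec SUPERB objective metric evaluation on esc50
Stage 1: Run SDR evaluation. SDR: mean score is: -9.309950038168095
File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands
Stage 1: Run SDR evaluation. SDR: mean score is: 2.68255129531442
File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k
Stage 1: Run SDR evaluation. SDR: mean score is: -6.6539098549604345
File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation
Stage 1: Run SDR evaluation. SDR: mean score is: -3.0264018525811536
File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test
Stage 1: Run SDR evaluation. SDR: mean score is: -1.3850498167169416
File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech
Stage 1: Run SDR evaluation. SDR: mean score is: 2.5759249020219706
File Name: quesst.log Codec SUPERB objective metric evaluation on quesst
Stage 1: Run SDR evaluation. SDR: mean score is: -1.3464429268284184
File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset
Stage 1: Run SDR evaluation. SDR: mean score is: 4.364046016689939
File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1
Stage 1: Run SDR evaluation. SDR: mean score is: 1.5015711204024194
File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10
Stage 1: Run SDR evaluation. SDR: mean score is: -0.22438148479388495
"""


def parse_log(text):
    """Return {dataset: {metric: mean_score}} from concatenated log text."""
    scores = {}
    dataset = None
    for line in text.splitlines():
        header = re.match(r"File Name: (\S+)\.log", line)
        if header:
            dataset = header.group(1)
            scores[dataset] = {}
            continue
        score = re.search(r"(\w+): mean score is: (-?\d+(?:\.\d+)?)", line)
        if score and dataset is not None:
            scores[dataset][score.group(1)] = float(score.group(2))
    return scores


def group_average(scores, metric, datasets):
    """Average one metric over the datasets that actually report it."""
    return mean(scores[d][metric] for d in datasets if metric in scores[d])


scores = parse_log(LOG)
speech = [d for d in scores if d not in AUDIO]
print("speech SDR:", group_average(scores, "SDR", speech))
print("audio SDR:", group_average(scores, "SDR", sorted(AUDIO)))
```

With this grouping, the eight speech datasets and three audio datasets give the same SDR averages as reported above, up to floating-point rounding.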

Jun 19 '24 15:06 redmist328

Results and a checkpoint from another experiment will be released soon.

Jun 19 '24 15:06 redmist328

There was an error in the 2kbps results; I have just updated them. The model and checkpoints are available here. Feel free to contact me anytime if you have any questions. Next, here are the results from another set of experiments.

16 kHz 4kbps

parameter size:

encoder (including quantizer): 30MB decoder: 40MB

exps/results.txt

Codec SUPERB application evaluation

Stage 1: Run speech emotion recognition. Acc: 75.90%

Stage 2: Run speaker related evaluation. Parsing the resyn_trial.txt for resyn wavs

Run speaker verification. EER: 1.90%

Stage 3: Run automatic speech recognition. WER: 3.53%

Stage 4: Run audio event classification. ACC: 70.65%

src/codec_metrics/exps/results.txt

Log results

File Name: crema_d.log Codec SUPERB objective metric evaluation on crema_d

Stage 1: Run SDR evaluation. SDR: mean score is: 0.15002365111647298

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.6124164

Stage 3: Run STOI. stoi: mean score is: 0.7736746973027623

Stage 4: Run PESQ. pesq: mean score is: 1.70997572183609

File Name: esc50.log Codec SUPERB objective metric evaluation on esc50

Stage 1: Run SDR evaluation. SDR: mean score is: -4.760846342641758

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.928971

File Name: fluent_speech_commands.log Codec SUPERB objective metric evaluation on fluent_speech_commands

Stage 1: Run SDR evaluation. SDR: mean score is: 5.220996340883257

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7699465

Stage 3: Run STOI. stoi: mean score is: 0.9142745866782225

Stage 4: Run PESQ. pesq: mean score is: 2.6232634449005126

File Name: fsd50k.log Codec SUPERB objective metric evaluation on fsd50k

Stage 1: Run SDR evaluation. SDR: mean score is: -2.8520096273567055

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.7751368

File Name: gunshot_triangulation.log Codec SUPERB objective metric evaluation on gunshot_triangulation

Stage 1: Run SDR evaluation. SDR: mean score is: 0.6088136447032942

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 1.3763462

File Name: libri2Mix_test.log Codec SUPERB objective metric evaluation on libri2Mix_test

Stage 1: Run SDR evaluation. SDR: mean score is: 1.4808939116227589

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7496703

Stage 3: Run STOI. stoi: mean score is: 0.893277763358277

Stage 4: Run PESQ. pesq: mean score is: 1.8120478355884553

File Name: librispeech.log Codec SUPERB objective metric evaluation on librispeech

Stage 1: Run SDR evaluation. SDR: mean score is: 5.426437955617031

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.7146479

Stage 3: Run STOI. stoi: mean score is: 0.926983531075999

Stage 4: Run PESQ. pesq: mean score is: 2.746657599210739

File Name: quesst.log Codec SUPERB objective metric evaluation on quesst

Stage 1: Run SDR evaluation. SDR: mean score is: 2.6550969225537635

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.8277179

Stage 3: Run STOI. stoi: mean score is: 0.8536728749605849

Stage 4: Run PESQ. pesq: mean score is: 2.124973820447922

File Name: snips_test_valid_subset.log Codec SUPERB objective metric evaluation on snips_test_valid_subset

Stage 1: Run SDR evaluation. SDR: mean score is: 5.2696761366753035

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.78780365

Stage 3: Run STOI. stoi: mean score is: 0.9406434092184658

Stage 4: Run PESQ. pesq: mean score is: 2.629430605173111

File Name: voxceleb1.log Codec SUPERB objective metric evaluation on voxceleb1

Stage 1: Run SDR evaluation. SDR: mean score is: 4.231224944768299

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.69843954

Stage 3: Run STOI. stoi: mean score is: 0.8950605511099898

Stage 4: Run PESQ. pesq: mean score is: 2.4963964211940763

File Name: vox_lingua_top10.log Codec SUPERB objective metric evaluation on vox_lingua_top10

Stage 1: Run SDR evaluation. SDR: mean score is: 3.239173312527914

Stage 2: Run Mel Spectrogram Loss. mel_loss: mean score is: 0.6139635

Stage 3: Run STOI. stoi: mean score is: 0.8769626591978656

Stage 4: Run PESQ. pesq: mean score is: 2.0776160097122194

Average SDR for speech datasets: 3.4591903969706

Average Mel_Loss for speech datasets: 0.72182571125

Average STOI for speech datasets: 0.8843187591127709

Average PESQ for speech datasets: 2.2775451822578905

Average SDR for audio datasets: -2.33468077509839

Average Mel_Loss for audio datasets: 1.6934846666666665
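To put the two bitrates side by side, here is a small sketch that computes the change in each application score from 2kbps to 4kbps. The numbers are copied from the two exps/results.txt blocks in this thread. The short metric labels and the `compare` helper are my own shorthand, and the lower-is-better set simply encodes that EER and WER are error rates.

```python
# Application scores (percent) copied from the two exps/results.txt blocks above.
RESULTS = {
    "2kbps": {"SER Acc": 74.93, "ASV EER": 3.02, "ASR WER": 4.74, "AEC Acc": 55.25},
    "4kbps": {"SER Acc": 75.90, "ASV EER": 1.90, "ASR WER": 3.53, "AEC Acc": 70.65},
}

# EER and WER are error rates, so a negative change is an improvement.
LOWER_IS_BETTER = {"ASV EER", "ASR WER"}


def compare(low, high):
    """Return {metric: (change in percentage points, improved?)} going from low to high."""
    rows = {}
    for name, base in low.items():
        delta = round(high[name] - base, 2)
        improved = delta < 0 if name in LOWER_IS_BETTER else delta > 0
        rows[name] = (delta, improved)
    return rows


for name, (delta, improved) in compare(RESULTS["2kbps"], RESULTS["4kbps"]).items():
    print(f"{name}: {delta:+.2f} points ({'better' if improved else 'worse'} at 4kbps)")
```

All four application scores improve at 4kbps in these runs, with the largest change on audio event classification.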

Jun 19 '24 17:06 redmist328

Results for APCodec

16 kHz 2kbps

parameter size:

exps/results.txt

src/codec_metrics/exps/results.txt

16 kHz 4kbps

parameter size:

exps/results.txt

src/codec_metrics/exps/results.txt