Codec Major Updates

Open ftshijt opened this issue 9 months ago • 8 comments

What?

Codec Major updates

  • Update speechtokenizer (generalized as semantic_dac class)
  • Update fsq modeling (in fsq_dac)
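For readers unfamiliar with the term, finite scalar quantization (FSQ) can be sketched as follows. This is a minimal illustration of the general technique, not the code in `fsq_dac`: the function names `fsq_quantize` and `fsq_index` are hypothetical, and the sketch assumes an odd number of levels per dimension to keep the rounding symmetric.

```python
import math


def fsq_quantize(z, levels):
    """Round each latent dimension to one of `levels[i]` integer values.

    Hypothetical helper for illustration only; assumes odd level counts.
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    codes = []
    for value, n_levels in zip(z, levels):
        if n_levels % 2 == 0:
            raise ValueError("this sketch only handles odd level counts")
        half = (n_levels - 1) / 2
        # tanh bounds the value to (-half, half) before rounding.
        bounded = half * math.tanh(value)
        codes.append(int(round(bounded)))
    return codes


def fsq_index(codes, levels):
    """Pack per-dimension codes into a single integer token (mixed radix)."""
    index = 0
    base = 1
    for code, n_levels in zip(codes, levels):
        digit = code + (n_levels - 1) // 2  # shift to the range 0..n_levels-1
        index += digit * base
        base *= n_levels
    return index


codes = fsq_quantize([0.0, 10.0, -10.0], [5, 5, 5])
token = fsq_index(codes, [5, 5, 5])
```

The implicit codebook size is the product of the level counts (125 here), so no learned codebook is needed. In training, the rounding step is usually paired with a straight-through gradient estimator, which this sketch omits.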

Why?

  • Related to the survey paper experiments

ftshijt avatar Apr 15 '25 16:04 ftshijt

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 37.16%. Comparing base (3bff1f0) to head (f5e3813).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6093   +/-   ##
=======================================
  Coverage   37.16%   37.16%           
=======================================
  Files         580      580           
  Lines       53482    53482           
=======================================
  Hits        19875    19875           
  Misses      33607    33607           
Flag Coverage Δ
test_integration_espnetez 37.16% <ø> (ø)

codecov[bot] avatar Apr 16 '25 08:04 codecov[bot]

Can you ask someone in the codec team to review this PR? I'll check the design and overview of the PR, but I want to delegate the technical details to someone.

sw005320 avatar Jun 09 '25 17:06 sw005320

This PR looks good to me, but fsq_dac and semantic_dac should be reviewed by someone. @wyh2000?

Both fsq_dac and semantic_dac look good to me!

wyh2000 avatar Jun 23 '25 04:06 wyh2000

@ftshijt, please fix the CI error

sw005320 avatar Jun 23 '25 12:06 sw005320

@ftshijt, I’m not sure about the status of this PR, but I think it should be finished. Based on previous discussions, the review is almost complete, and we are just waiting for you to fix the CI issue.

sw005320 avatar Aug 15 '25 00:08 sw005320

This pull request adds a new AMUSE recipe for ESPnet-Codec, including comprehensive documentation and multiple training configuration files for the DAC codec model. The main changes introduce detailed instructions for using the AMUSE dataset, as well as several YAML configuration files for training the DAC codec at different sample rates and quantization settings.

Documentation and Usage:

  • Added a detailed README.md for the AMUSE ESPnet-Codec recipe, covering dataset sources, supported codecs, evaluation metrics, and usage instructions.

Training Configuration Files:

  • Added train_dac_large.yaml and train_dac_large_44k.yaml configuration files for training the DAC codec at standard and 44.1 kHz sample rates, respectively, with multi-quantizer settings.
  • Added train_dac_large_single.yaml and train_dac_large_44k_single.yaml configuration files for training the DAC codec at 16 kHz and 44.1 kHz sample rates, respectively, with single-quantizer settings.
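The thread does not spell out what distinguishes the multi-quantizer and single-quantizer settings. Assuming they refer to the number of residual vector quantization (RVQ) stages, as in the original DAC design, the difference can be sketched as below. The function names and the toy codebooks are hypothetical and are not taken from the recipe's configuration files.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the previous residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct the vector by summing the selected entry from each stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy codebooks (illustrative values only).
coarse = [[0.0, 0.0], [1.0, 1.0]]
fine = [[0.0, 0.0], [0.25, -0.25]]
vec = [1.2, 0.8]

multi = rvq_decode(rvq_encode(vec, [coarse, fine]), [coarse, fine])
single = rvq_decode(rvq_encode(vec, [coarse]), [coarse])
```

With one stage the reconstruction is limited to the coarse codebook; each extra stage refines the residual at the cost of one more token per frame, which is the usual trade-off between the two kinds of configuration.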

Fhrozen avatar Aug 15 '25 09:08 Fhrozen

/gemini review

Fhrozen avatar Aug 27 '25 13:08 Fhrozen

This PR is stale because it has been open for 90 days with no activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Dec 11 '25 02:12 github-actions[bot]