Codec Major Updates
What?
Codec Major updates
- Update speechtokenizer (generalized as semantic_dac class)
- Update fsq modeling (in fsq_dac)
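
For readers unfamiliar with the FSQ change, the following is a minimal sketch of finite scalar quantization in general, not the code in `fsq_dac`. The function names `fsq_quantize` and `codes_to_index` are hypothetical, and the sketch assumes odd level counts only, so that no half-step offset is needed. Each latent dimension is squashed with `tanh`, scaled, and rounded to the nearest integer level. The per-dimension codes are then packed into one token index.

```python
import math


def fsq_quantize(z, levels):
    """Round each latent scalar to one of `levels[i]` integer values.

    Hypothetical helper for illustration; handles odd level counts only.
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    codes = []
    for value, num_levels in zip(z, levels):
        if num_levels % 2 == 0:
            raise ValueError("this sketch handles odd level counts only")
        half = (num_levels - 1) / 2
        # tanh bounds the value to (-1, 1); scaling maps it to (-half, half).
        bounded = math.tanh(value) * half
        codes.append(int(round(bounded)))
    return codes


def codes_to_index(codes, levels):
    """Pack per-dimension codes into a single index (mixed-radix encoding)."""
    index = 0
    basis = 1
    for code, num_levels in zip(codes, levels):
        half = (num_levels - 1) // 2
        index += (code + half) * basis  # shift the code into [0, num_levels)
        basis *= num_levels
    return index


if __name__ == "__main__":
    levels = [5, 5, 5]  # implicit codebook of 5 * 5 * 5 = 125 entries
    codes = fsq_quantize([0.0, 10.0, -10.0], levels)
    print(codes, codes_to_index(codes, levels))
```

Because the codebook is implied by the level grid, there are no learned codebook vectors to maintain, which is the usual motivation given for FSQ over learned vector quantization.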
Why?
- Related to the survey paper experiments
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 37.16%. Comparing base (3bff1f0) to head (f5e3813).
Additional details and impacted files
| Coverage Diff | master | #6093 | +/- |
|---|---|---|---|
| Coverage | 37.16% | 37.16% | |
| Files | 580 | 580 | |
| Lines | 53482 | 53482 | |
| Hits | 19875 | 19875 | |
| Misses | 33607 | 33607 | |

| Flag | Coverage Δ | |
|---|---|---|
| test_integration_espnetez | 37.16% <ø> (ø) |
Flags with carried forward coverage won't be shown.
Can you ask someone in the codec team to review this PR? I'll check the design and overview of the PR, but I want to delegate the technical details to someone.
This PR looks good to me, but fsq_dac and semantic_dac should be reviewed by someone. @wyh2000?
Both fsq_dac and semantic_dac look good to me!
@ftshijt, please fix the CI error
@ftshijt, I’m not sure about the status of this PR, but I think it should be finished. Based on previous discussions, we have almost finished the review and are just waiting for you to fix a CI issue.
This pull request adds a new AMUSE recipe for ESPnet-Codec, including comprehensive documentation and multiple training configuration files for the DAC codec model. The main changes introduce detailed instructions for using the AMUSE dataset, as well as several YAML configuration files for training the DAC codec at different sample rates and quantization settings.
Documentation and Usage:
- Added a detailed `README.md` for the AMUSE ESPnet-Codec recipe, covering dataset sources, supported codecs, evaluation metrics, and usage instructions.
Training Configuration Files:
- Added `train_dac_large.yaml` and `train_dac_large_44k.yaml` configuration files for training the DAC codec at standard and 44.1kHz sample rates, respectively, with multi-quantizer settings.
- Added `train_dac_large_single.yaml` and `train_dac_large_44k_single.yaml` configuration files for training the DAC codec at 16kHz and 44.1kHz sample rates, respectively, with single-quantizer settings.
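
To illustrate the difference between the multi-quantizer and single-quantizer settings, here is a rough sketch of residual quantization, the scheme DAC-style codecs are generally described as using. It assumes that "multi-quantizer" means several residual stages and "single-quantizer" means one stage; the PR itself does not spell this out. The function names and the toy codebooks are hypothetical and are not taken from the YAML files.

```python
def nearest(codebook, vector):
    """Return the index of the codeword closest to `vector` (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)),
    )


def residual_quantize(vector, codebooks):
    """Quantize `vector` stage by stage; each stage encodes the leftover residual.

    Hypothetical helper for illustration. Passing a single codebook
    corresponds to the assumed single-quantizer setting.
    """
    residual = list(vector)
    reconstruction = [0.0] * len(vector)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        code = codebook[idx]
        indices.append(idx)
        reconstruction = [r + c for r, c in zip(reconstruction, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, reconstruction


if __name__ == "__main__":
    coarse = [[0.0, 0.0], [1.0, 1.0]]    # toy first-stage codebook
    fine = [[0.0, 0.0], [0.25, -0.25]]   # toy second-stage codebook
    x = [1.25, 0.75]
    print(residual_quantize(x, [coarse]))        # one stage
    print(residual_quantize(x, [coarse, fine]))  # two stages
```

In this toy case the second stage recovers the detail the first stage misses, at the cost of one extra index per frame. A single-quantizer configuration gives up that refinement in exchange for a shorter token stream.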
/gemini review
This PR is stale because it has been open for 90 days with no activity. It will be closed if no further activity occurs. Thank you for your contributions.