AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
I want to reproduce the results in this paper "HIFI-CODEC: GROUP-RESIDUAL VECTOR QUANTIZATION FOR HIGH FIDELITY AUDIO CODEC". However, the description is quite confusing. The paper only says that the...
Why normalize during inference (`wav = normalize(wav) * 0.95`)? I didn't see the same operation applied to the training data. Is this step necessary?
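For reference, a minimal sketch of what such a step could look like before encoding; the `normalize` helper and the interpretation as peak normalization with 0.95 headroom are assumptions, not the repo's actual implementation:

```python
import numpy as np

def normalize(wav: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Peak-normalize to roughly [-1, 1]; eps guards against division by zero on silence.
    return wav / (np.abs(wav).max() + eps)

# Scale to 0.95 of full scale to leave a little headroom before encoding.
wav = np.random.randn(24000).astype(np.float32)  # placeholder: ~1 s of audio at 24 kHz
wav = normalize(wav) * 0.95
print(np.abs(wav).max())  # close to 0.95
```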
Hi, thanks for the work. I read the paper but still couldn't find information about the decoding speed of HiFiCodec. I mean, how long would it take on CPU...
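For anyone wanting to measure this themselves, a minimal sketch of timing a decoder forward pass on CPU; the decoder below is a stand-in module, not HiFiCodec's real architecture, and all shapes are assumptions for illustration only:

```python
import time
import torch
import torch.nn as nn

# Stand-in decoder: a single transposed convolution with a 240-sample hop
# (mirroring the 24k_240d recipe name), NOT the actual HiFiCodec decoder.
decoder = nn.ConvTranspose1d(128, 1, kernel_size=480, stride=240)
codes = torch.randn(1, 128, 100)  # roughly 1 s worth of latent frames (assumed shape)

decoder.eval()
with torch.inference_mode():
    start = time.perf_counter()
    wav = decoder(codes)
    elapsed = time.perf_counter() - start
print(f"decoded {wav.shape[-1]} samples in {elapsed * 1000:.1f} ms on CPU")
```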
The model does not converge when I use HiFi-Codec to train the NAR stage of VALL-E. The data I used is a 5000-hour Chinese dataset. How can...
There is no LICENSE file. What is the license for this project and the pretrained models?
I am currently studying this model's training pipeline, but the paper says training to convergence takes eight GPUs for over a month, so I certainly cannot train it well in a short time. I would like to know whether a trained **discriminator** model is available. Thank you.
When running egs/SoundStream_24k_240d/main3_ddp.py, once execution reaches line 9, which imports the custom module academicodec/models/encodec/distributed/launch.py, launch.py fails at its line 5 with a module-not-found error. The fix is simply to rewrite line 5 of launch.py as `from . import distributed as dist_fn`.
Since the input and output arguments coming out of argparse are already pathlib.Path objects, there is no need to bring in os operations. Combining this with the official encodec code, I made the following changes.
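As an illustration (not the exact diff from that issue), a minimal sketch of how argparse can hand back pathlib.Path objects directly; the `--input`/`--output` flag names are assumptions:

```python
import argparse
from pathlib import Path

parser = argparse.ArgumentParser()
# type=Path makes argparse return pathlib.Path objects directly,
# so os.path-style string handling is unnecessary.
parser.add_argument("--input", type=Path, required=True)
parser.add_argument("--output", type=Path, required=True)
args = parser.parse_args(["--input", "in.wav", "--output", "out/recon.wav"])

args.output.parent.mkdir(parents=True, exist_ok=True)  # replaces os.makedirs
print(args.input.suffix, args.output.resolve())
```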
Saving in a format like the one below works; otherwise the model checkpoint saved on a single machine will run into index-related problems. I will submit a fixed version later.

```python
if epoch % config.common.save_interval == 0:
    model_to_save = model.module if config.distributed.data_parallel else model
    disc_model_to_save = disc_model.module if config.distributed.data_parallel else disc_model
    if not config.distributed.data_parallel or dist.get_rank() == 0:
        save_master_checkpoint(epoch,...
```
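For context, a minimal sketch of why unwrapping `.module` before saving matters: a state dict taken from a DistributedDataParallel wrapper prefixes every key with `module.`, which an unwrapped single-process model cannot load directly (the `nn.Linear` here is a stand-in, not the codec model):

```python
import torch.nn as nn

# Simulate a checkpoint saved from a DDP-wrapped model: every key gets a "module." prefix.
model = nn.Linear(4, 4)  # stand-in for the codec model
wrapped_state = {"module." + k: v for k, v in model.state_dict().items()}

result = model.load_state_dict(wrapped_state, strict=False)
print(result.missing_keys)  # ['weight', 'bias'] -- nothing matched the prefixed keys

# Stripping the prefix (or saving model.module's state dict in the first place) fixes it.
clean_state = {k.removeprefix("module."): v for k, v in wrapped_state.items()}
model.load_state_dict(clean_state)
```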
Uncommenting the OMP_NUM_THREADS line in launch.py can speed up training and also improve GPU utilization, because by default all cores are used (on machines with many cores, such as A100 hosts) and interaction across many cores can be costly. If 1 feels too small, the value can instead be set in front of train.sh (e.g. 8), as sketched below. Training on LibriTTS has not been tested. https://github.com/yangdongchao/AcademiCodec/blob/a496082fc2f7a324abb37fc3355487798dad2084/academicodec/models/encodec/distributed/launch.py#L34 Also see https://github.com/yangdongchao/SoundStorm/pull/34 (not yet verified in this repository).
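A minimal sketch of pinning the thread count from Python, equivalent to running `OMP_NUM_THREADS=8 bash train.sh`; the value 8 is only an illustrative example, not a tuned or tested setting:

```python
import os

# Cap the OpenMP thread count per worker process before torch/numpy are imported,
# so each DDP process does not spawn one thread per physical core on many-core hosts.
os.environ.setdefault("OMP_NUM_THREADS", "8")  # 8 is an example value, not a recommendation
```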