Puyuan Peng issues

Results: 12 issues of Puyuan Peng

Having issues decoding HuBERT: code doesn't load fine-tuned model

## 🐛 Bug I followed the documentation on HuBERT and successfully fine-tuned HuBERT with CTC loss. However, in the decoding stage, the code still loads the pretrained model (by using...

bug

needs triage

[research question] does the model predict the last few codes in delayed pattern

Hi, thanks for the great work and for open-sourcing everything! In the delayed pattern, it seems that there should be a few model tokens at the end to be predicted...

How should I interpret training loss/metrics in encodec training

Hi, thanks for making everything open source. I'm trying to train an EnCodec model on my own data, but I'm not sure how to interpret the metrics, plus a lot of...

failed to build from source with CUDA architecture of 86

### Bug Description I tried to build from source by following the instructions in the README. At the step `cmake .. -DCMAKE_BUILD_TYPE=Release -DFL_BUILD_ARRAYFIRE=ON -DCMAKE_CUDA_ARCHITECTURES=86`, it gives an error ``` CMake Error at...

bug

[BUG] Couldn't find alignment_analysis.csv in output folder or temp folder after mfa align

**Debugging checklist** [ ] Have you updated to latest MFA version? Yes [ ] Have you tried rerunning the command with the `--clean` flag? Yes **Describe the issue** Couldn't find...

bug

sampling rate issue

Great work! When running the DAC token extraction stage of the training script with the default hyperparameters, I got a warning: > It is strongly recommended to pass the `sampling_rate` argument...

Couldn't find the code in the docker image

Thanks for open-sourcing the great work! After successfully pulling the Docker image and running `docker run --gpus all --rm -it richardbaihe/pytorch:a3t bash`, I couldn't find the code of this...

what are the training datasets?

Thanks for making this available! What datasets did you use for model training (for the released checkpoints)?

How should I interpret the codes dimension?

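Thanks for open-sourcing this great work! I'd like to confirm my understanding of the ordering of the output codes: the output shape of the VQVAE encode function is [B, T, 4]. Suppose B=1, T=2, and codes is [[a,b,c,d] [e,f,g,h]]. My understanding: a is the code obtained from the first quantization of the first half of the T=1 feature, b is the T=1 feature's...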

What dataset was used to train/eval the AAC model?

Great work! More info about the data would be appreciated!