lixuyuan102

Results 14 comments of lixuyuan102

> You will probably need to re-open this issue with libsndfile. Thanks

> Hi, we updated a PR to fix the problem. You can check it! (we use: `from diffusers.optimization import get_cosine_schedule_with_warmup`)

Thanks for the reply. I'm using the NoamScheduler with a...

> Hi, we haven't tested NoamScheduler. I think using AdamW with an lr between 5e-5 and 1e-4 and a cosine schedule with between 5K and 10K warmup steps will give a...
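
To make the suggested schedule concrete, here is a minimal pure-Python sketch of a cosine learning-rate curve with linear warmup. It only mirrors the general shape of such schedules and is not the code from the PR. The function name `cosine_lr_with_warmup` and the `total_steps` default are assumptions chosen for illustration; the peak lr and warmup length are taken from the ranges quoted above.

```python
import math


def cosine_lr_with_warmup(step, peak_lr=1e-4, warmup_steps=5_000, total_steps=400_000):
    """Return the learning rate at `step` (hypothetical helper, illustration only).

    peak_lr and warmup_steps come from the ranges mentioned in the comment;
    total_steps is an assumed training length, not a value from the thread.
    """
    if step < warmup_steps:
        # Linear warmup from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Half-cosine decay from peak_lr down to 0.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 2_500, 5_000, 200_000, 400_000):
        print(s, cosine_lr_with_warmup(s))
```

In a real training loop the same curve would normally come from a scheduler object attached to the optimizer; the standalone function is only meant to show how the warmup and decay phases fit together.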

accelerate versions >= 0.25 save the weights as .safetensors instead of .bin. Just replace "pytorch_model.bin" with "model.safetensors".
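
If the loading code has to work with checkpoints written by either naming scheme, one option is to look for both file names instead of hard-coding one. The sketch below is only an illustration: `find_weights_file` is a hypothetical helper, and it assumes the weights file sits directly inside the checkpoint directory. Only the two file names come from the comment above.

```python
from pathlib import Path

# File names mentioned in the comment; the newer one is tried first.
CANDIDATE_NAMES = ("model.safetensors", "pytorch_model.bin")


def find_weights_file(checkpoint_dir):
    """Return the path of the first weights file found in checkpoint_dir.

    Hypothetical helper: assumes the weights file is stored directly in
    the checkpoint directory under one of the two names above.
    """
    ckpt = Path(checkpoint_dir)
    for name in CANDIDATE_NAMES:
        path = ckpt / name
        if path.is_file():
            return path
    raise FileNotFoundError(
        f"none of {CANDIDATE_NAMES} found in {ckpt}"
    )
```

Note that the two formats are read with different loaders, so the caller still has to choose the matching load function based on the returned file's suffix.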

My AR model still can't accurately predict speech duration after 20 epochs / 800K steps of training. Did you get reasonable results with your AR model?

> > accelerate version >= 0.25 change the .bin to .safetensors.
> >
> > just replace "pytorch_model.bin" with "model.safetensors"
>
> I replaced "pytorch_model.bin" with "model.safetensors"; however, it output such errors...

> > My AR model still can't accurately predict speech duration after 20 epochs / 800K steps of training. Did you get reasonable results with your AR model?
>
> I...

Employing a different backbone network than the one (Transformer model with only convolutional positional coding) used in the voicebox paper to implement the ODE model, I have achieved a good...

Here is the loss curve: ![1721286015260](https://github.com/user-attachments/assets/ea753485-a382-481c-8bf5-c7d728a83782)

> The model released was trained for 670k steps; normally 400k would be sufficient for the codec, according to descript-audio-codec's practice.

Thanks!