lixuyuan102 comments

Results: 14 comments by lixuyuan102

soundfile.info does not work on mp3

> You will probably need to re-open this issue with libsndfile. Thanks

CosineAnnealingLR

> Hi, we updated a PR to fix the problem. You can check it! (we use: `from diffusers.optimization import get_cosine_schedule_with_warmup`)

Thanks for the reply. I'm using the NoamScheduler with a...

CosineAnnealingLR

> Hi, we haven't tested NoamScheduler. I think using AdamW with an lr between 5e-5 and 1e-4 and a cosine schedule with 5K to 10K warmup steps will give a...
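To make the suggested schedule concrete, here is a minimal pure-Python sketch of a cosine learning-rate schedule with linear warmup. It illustrates the general shape of such a schedule and is not the code from `diffusers`. The function name `cosine_with_warmup` and the total step count are assumptions chosen for illustration; only the lr range and the warmup range come from the comment above.

```python
import math


def cosine_with_warmup(step, warmup_steps, total_steps, base_lr):
    """Learning rate at `step`: linear warmup to base_lr, then half-cosine decay to 0.

    Hypothetical helper for illustration only, not a library API.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Half cosine: multiplier goes from 1 at progress=0 to 0 at progress=1.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    base_lr = 1e-4          # upper end of the suggested 5e-5 to 1e-4 range
    warmup_steps = 5_000    # lower end of the suggested warmup range
    total_steps = 400_000   # assumption: not stated in the comment
    for step in (0, 2_500, 5_000, 202_500, 400_000):
        lr = cosine_with_warmup(step, warmup_steps, total_steps, base_lr)
        print(f"step {step:>7}: lr = {lr:.2e}")
```

In a real training loop the same shape would normally come from a scheduler object stepped once per optimizer update, such as the `get_cosine_schedule_with_warmup` helper mentioned in the thread.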

VELL-E (model_train_stage 2) error output "No such file pytorch_model.bin"

accelerate versions >= 0.25 save the checkpoint as .safetensors instead of .bin. Just replace "pytorch_model.bin" with "model.safetensors".
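Instead of hard-coding one filename, a loader could check for both names. The sketch below assumes the checkpoint directory contains either `model.safetensors` or `pytorch_model.bin`; the helper `find_weights_file` is hypothetical and not part of accelerate or of the repository discussed here.

```python
from pathlib import Path

# Filenames mentioned in the thread, newest format first.
CANDIDATE_NAMES = ("model.safetensors", "pytorch_model.bin")


def find_weights_file(checkpoint_dir):
    """Return the path of the first known weights file found in checkpoint_dir.

    Hypothetical helper for illustration only.
    """
    checkpoint_dir = Path(checkpoint_dir)
    for name in CANDIDATE_NAMES:
        candidate = checkpoint_dir / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"none of {CANDIDATE_NAMES} found in {checkpoint_dir}"
    )
```

Renaming the path alone may not be enough: a `.safetensors` file is not a pickle, so it generally has to be read with the safetensors library (for example `safetensors.torch.load_file`) rather than `torch.load`.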

VELL-E (model_train_stage 2) error output "No such file pytorch_model.bin"

My AR model still can't accurately predict speech duration after 20 epochs (800K steps) of training. Did you get reasonable results with your AR model?

VELL-E (model_train_stage 2) error output "No such file pytorch_model.bin"

> > accelerate version >= 0.25 change the .bin to .safetensors.
> > just replace "pytorch_model.bin" with "model.safetensors"
>
> I replaced "pytorch_model.bin" with "model.safetensors", however it output such errors...

VELL-E (model_train_stage 2) error output "No such file pytorch_model.bin"

> > My AR model still can't accurately predict speech duration after 20 epochs/ 800K steps of training. Did you get reasonable results with your AR model?
>
> I...

Mel model

Using a different backbone network from the one in the Voicebox paper (a Transformer with only convolutional positional encoding) to implement the ODE model, I have achieved a good...

How many steps would be enough if i train this model from start?

Here is the loss curve: ![1721286015260](https://github.com/user-attachments/assets/ea753485-a382-481c-8bf5-c7d728a83782)

How many steps would be enough if i train this model from start?

> The model released was trained for 670k steps, normally 400k would be sufficient for codec, according to descript-audio-codec's practice

Thanks!