
Issues Training VAE

Open Adrian-Makcimus opened this issue 1 year ago • 0 comments

I am trying to train an autoencoder for latent diffusion. I run: python scripts/train_vae.py --dataset_name data/physio22/mel_res_64 --batch_size 1 --gradient_accumulation_steps 1 --hop_length 1024 --max_epochs 5

Training completes, but when converting to the diffusers format I get this error:

Epoch 0: 100%|█| 22696/22696 [30:06<00:00, 12.57it/s, loss=287, v_num=4.5e+6, aeloss_step=560.0, discloss_step=0.000, aeloss_epoc
Traceback (most recent call last):
  File "/home/anf2143/audioldm/open-cardio-diffusion/audiodiffusion/utils.py", line 355, in convert_ldm_to_hf_vae
    vae.load_state_dict(converted_vae_checkpoint)
  File "/home/anf2143/.conda/envs/vae_train/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2581, in load_state_dict
    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for AutoencoderKL:
    Missing key(s) in state_dict: "encoder.mid_block.attentions.0.to_q.weight", "encoder.mid_block.attentions.0.to_q.bias", "encoder.mid_block.attentions.0.to_k.weight", "encoder.mid_block.attentions.0.to_k.bias", "encoder.mid_block.attentions.0.to_v.weight", "encoder.mid_block.attentions.0.to_v.bias", "encoder.mid_block.attentions.0.to_out.0.weight", "encoder.mid_block.attentions.0.to_out.0.bias", "decoder.mid_block.attentions.0.to_q.weight", "decoder.mid_block.attentions.0.to_q.bias", "decoder.mid_block.attentions.0.to_k.weight", "decoder.mid_block.attentions.0.to_k.bias", "decoder.mid_block.attentions.0.to_v.weight", "decoder.mid_block.attentions.0.to_v.bias", "decoder.mid_block.attentions.0.to_out.0.weight", "decoder.mid_block.attentions.0.to_out.0.bias".
    Unexpected key(s) in state_dict: "encoder.mid_block.attentions.0.query.weight", "encoder.mid_block.attentions.0.query.bias", "encoder.mid_block.attentions.0.key.weight", "encoder.mid_block.attentions.0.key.bias", "encoder.mid_block.attentions.0.value.weight", "encoder.mid_block.attentions.0.value.bias", "encoder.mid_block.attentions.0.proj_attn.weight", "encoder.mid_block.attentions.0.proj_attn.bias", "decoder.mid_block.attentions.0.query.weight", "decoder.mid_block.attentions.0.query.bias", "decoder.mid_block.attentions.0.key.weight", "decoder.mid_block.attentions.0.key.bias", "decoder.mid_block.attentions.0.value.weight", "decoder.mid_block.attentions.0.value.bias", "decoder.mid_block.attentions.0.proj_attn.weight", "decoder.mid_block.attentions.0.proj_attn.bias".
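The missing and unexpected keys pair up one-to-one: the checkpoint uses the older diffusers attention naming (query/key/value/proj_attn) while the installed diffusers version expects the newer naming (to_q/to_k/to_v/to_out.0), so this looks like a version mismatch between the conversion script and the installed library. A minimal sketch of a workaround, assuming the tensors themselves are compatible and only the key names changed (a hypothetical helper, not part of the repo):

```python
# Hypothetical workaround sketch: rename the legacy diffusers attention
# key names to the newer scheme before calling load_state_dict.
# Assumes only the names changed, not the tensor shapes.
ATTN_RENAMES = {
    ".query.": ".to_q.",
    ".key.": ".to_k.",
    ".value.": ".to_v.",
    ".proj_attn.": ".to_out.0.",
}

def rename_attention_keys(state_dict):
    """Return a copy of state_dict with legacy attention key names
    mapped to the newer diffusers naming."""
    renamed = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in ATTN_RENAMES.items():
            new_key = new_key.replace(old, new)
        renamed[new_key] = value
    return renamed

# Usage (hypothetical): apply before load_state_dict in convert_ldm_to_hf_vae:
# vae.load_state_dict(rename_attention_keys(converted_vae_checkpoint))
```

Alternatively, pinning diffusers to the version the conversion script was written against should avoid the rename entirely.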

Adrian-Makcimus, Mar 18 '25 19:03