RuntimeError: 'weight' must be 2-D
Describe the bug
When I run the text_to_image example (train_text_to_image.py), I get the error shown in the logs below. I'm fairly sure I configured and ran everything as the README.md requires.
Reproduction
https://github.com/huggingface/diffusers/tree/main/examples/text_to_image/train_text_to_image.py
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export dataset_name="lambdalabs/pokemon-blip-captions"

accelerate launch train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$dataset_name \
  --use_ema \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --mixed_precision="fp16" \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model"
Logs
Traceback (most recent call last):
File "train_text_to_image.py", line 630, in <module>
main()
File "train_text_to_image.py", line 569, in main
print(text_encoder(batch["input_ids"]))
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/models/clip/modeling_clip.py", line 733, in forward
return self.text_model(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/models/clip/modeling_clip.py", line 636, in forward
hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/models/clip/modeling_clip.py", line 165, in forward
inputs_embeds = self.token_embedding(input_ids)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/sparse.py", line 158, in forward
return F.embedding(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 2199, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: 'weight' must be 2-D
System Info
diffusers==0.5.1
torch==1.12.0+cu113
accelerate==0.13.2
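For reference, the message in the logs comes from torch.embedding, which requires the embedding weight to be a 2-D (num_embeddings x embedding_dim) matrix. A minimal standalone sketch (hypothetical shapes, just to illustrate the failure mode) that reproduces the same error:

import torch
import torch.nn.functional as F

ids = torch.tensor([1, 2, 3])
weight = torch.randn(10, 4)            # 2-D weight: (num_embeddings, embedding_dim)
print(F.embedding(ids, weight).shape)  # works: torch.Size([3, 4])

flat = weight.flatten()                # 1-D, like an uninitialized/partitioned placeholder
F.embedding(ids, flat)                 # RuntimeError: 'weight' must be 2-D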
cc @patil-suraj, can you take a look here?
Thanks a lot for reporting, will try this and let you know.
I just tried your command and it works fine for me; I couldn't reproduce it. Could you maybe try again and let us know if the issue persists?
I also created a new container to build the environment from scratch, downloaded the code, and installed the dependencies, but I still encountered this error in the end: RuntimeError: 'weight' must be 2-D
accelerate config
In which compute environment are you running? ([0] This machine, [1] AWS (Amazon SageMaker)): 0
Which type of machine are you using? ([0] No distributed training, [1] multi-CPU, [2] multi-GPU, [3] TPU [4] MPS): 2
How many different machines will you use (use more than 1 for multi-node training)? [1]: 1
Do you want to use DeepSpeed? [yes/NO]: yes
Do you want to specify a json file to a DeepSpeed config? [yes/NO]: no
What should be your DeepSpeed's ZeRO optimization stage (0, 1, 2, 3)? [2]: 3
Where to offload optimizer states? [none/cpu/nvme]: cpu
Where to offload parameters? [none/cpu/nvme]: cpu
How many gradient accumulation steps you're passing in your script? [1]: 4
Do you want to use gradient clipping? [yes/NO]: yes
What is the gradient clipping value? [1.0]: 1
Do you want to save 16-bit model weights when using ZeRO Stage-3? [yes/NO]: yes
Do you want to enable deepspeed.zero.Init when using ZeRO Stage-3 for constructing massive models? [yes/NO]: ye4s
Please enter yes or no.
Do you want to enable deepspeed.zero.Init when using ZeRO Stage-3 for constructing massive models? [yes/NO]: yes
How many GPU(s) should be used for distributed training? [1]:2
Do you wish to use FP16 or BF16 (mixed precision)? [NO/fp16/bf16]: fp16
Ah, you are using DeepSpeed; I tried without DeepSpeed. Please make sure to post the detailed command when opening issues so we can reproduce it quickly :)
I haven't tried this with ZeRO stage 3, but it should work with stage 2. Stage 3 is not really required for Stable Diffusion: it's not a huge model, so it does not need the parameter partitioning that stage 3 offers.
Also note that --train_text_encoder is not supported with DeepSpeed for now.
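Under ZeRO stage 3, DeepSpeed partitions the model parameters across ranks and leaves flattened placeholders behind, which would explain the non-2-D weight the text encoder's token embedding sees here. A rough sketch of the accelerate config answers for stage 2 instead (all other answers as in your transcript; exact prompts may differ slightly between accelerate versions):

Do you want to use DeepSpeed? [yes/NO]: yes
What should be your DeepSpeed's ZeRO optimization stage (0, 1, 2, 3)? [2]: 2
Where to offload optimizer states? [none/cpu/nvme]: cpu

With stage 2 there is no parameter partitioning (and hence no parameter-offload prompt), so the frozen text encoder keeps its full 2-D embedding matrix.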
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.