
Dreambooth finetune FLUX dev CLIPTextModel

Wuyiche opened this issue 1 year ago • 5 comments

Describe the bug

ValueError: Sequence length must be less than max_position_embeddings (got sequence length: 77 and max_position_embeddings: 0

I used four A100s to do full fine-tuning of the FLUX.1-dev model, following https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux.md

I used the toy dog dataset (5 images) for fine-tuning and ran into a problem with max_position_embeddings in CLIPTextModel:

Reproduction

```
[rank1]: Traceback (most recent call last):
[rank1]:   File "/data/AIGC/diffusers/examples/dreambooth/train_dreambooth_flux.py", line 1812, in <module>
[rank1]:     main(args)
[rank1]:   File "/data/AIGC/diffusers/examples/dreambooth/train_dreambooth_flux.py", line 1351, in main
[rank1]:     instance_prompt_hidden_states, instance_pooled_prompt_embeds, instance_text_ids = compute_text_embeddings(
[rank1]:   File "/data/AIGC/diffusers/examples/dreambooth/train_dreambooth_flux.py", line 1339, in compute_text_embeddings
[rank1]:     prompt_embeds, pooled_prompt_embeds, text_ids = encode_prompt(
[rank1]:   File "/data/AIGC/diffusers/examples/dreambooth/train_dreambooth_flux.py", line 963, in encode_prompt
[rank1]:     pooled_prompt_embeds = _encode_prompt_with_clip(
[rank1]:   File "/data/AIGC/diffusers/examples/dreambooth/train_dreambooth_flux.py", line 937, in _encode_prompt_with_clip
[rank1]:     prompt_embeds = text_encoder(text_input_ids.to(device), output_hidden_states=False)
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1056, in forward
[rank1]:     return self.text_model(
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 947, in forward
[rank1]:     hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/root/anaconda3/envs/flux/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 283, in forward
[rank1]:     raise ValueError(
[rank1]: ValueError: Sequence length must be less than max_position_embeddings (got sequence length: 77 and max_position_embeddings: 0
```
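For context: the check that raises this (the modeling_clip.py frame at line 283 above) appears to read the limit from the position-embedding weight's runtime shape rather than from the config, which is how it can report 0 while the config still says 77. Below is a standalone illustration of that check, with a 0-row embedding standing in for an unmaterialized weight; this is a sketch of the behavior, not the exact transformers source:

```python
import torch.nn as nn

seq_length = 77
# Stand-in for a position-embedding weight that was never materialized on
# this rank (e.g. sharded away by a ZeRO-3-style setup): 0 rows instead of 77.
position_embedding = nn.Embedding(0, 768)

# The limit comes from the weight tensor itself, not from the model config,
# so an empty weight reports max_position_embeddings: 0.
max_position_embedding = position_embedding.weight.shape[0]
if seq_length > max_position_embedding:
    raise ValueError(
        f"Sequence length must be less than max_position_embeddings "
        f"(got sequence length: {seq_length} and max_position_embeddings: "
        f"{max_position_embedding}"
    )
```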

I changed max_position_embeddings in CLIPTextModel, but it doesn't work:

```python
text_encoder_one = class_one.from_pretrained(
    args.pretrained_model_name_or_path,
    subfolder="text_encoder",
    revision=args.revision,
    variant=args.variant,
    max_position_embeddings=77,
    ignore_mismatched_sizes=True,
)
```
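If the check really reads the weight's shape, this override cannot help: the config value is already 77, so passing max_position_embeddings=77 (and ignore_mismatched_sizes) changes nothing. A minimal sanity check of what the encoder actually reports after loading, assuming transformers is installed and you have access to the gated FLUX.1-dev checkpoint:

```python
from transformers import CLIPTextModel

# Load the CLIP text encoder the same way the training script does and
# inspect both the config value and the actual weight shape.
text_encoder = CLIPTextModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="text_encoder"
)
print(text_encoder.config.max_position_embeddings)  # 77
print(text_encoder.text_model.embeddings.position_embedding.weight.shape)
# torch.Size([77, 768]) when the weights are fully materialized; a 0 in the
# first dimension at forward time is what produces the error above.
```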

My training script is as follows:

```bash
export MODEL_NAME="black-forest-labs/FLUX.1-dev"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="trained-flux"

accelerate launch train_dreambooth_flux.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="bf16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --guidance_scale=1 \
  --gradient_accumulation_steps=4 \
  --optimizer="prodigy" \
  --learning_rate=1. \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
```

Logs


System Info

  • 🤗 Diffusers version: 0.33.0.dev0
  • Platform: Linux-5.4.0-146-generic-x86_64-with-glibc2.31
  • Running on Google Colab?: No
  • Python version: 3.10.16
  • PyTorch version (GPU?): 2.6.0+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.29.1
  • Transformers version: 4.49.0
  • Accelerate version: 1.4.0
  • PEFT version: 0.14.0
  • Bitsandbytes version: not installed
  • Safetensors version: 0.5.3
  • xFormers version: not installed
  • Accelerator: 4 × NVIDIA A100-SXM4-40GB, 40960 MiB each
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

Wuyiche avatar Feb 28 '25 05:02 Wuyiche

same here, following

mortorit avatar Mar 11 '25 15:03 mortorit

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 05 '25 15:04 github-actions[bot]

same question

ziko-21 avatar May 22 '25 06:05 ziko-21

same question as well

jamesBaker361 avatar May 23 '25 23:05 jamesBaker361

I run into the same problem when going from DeepSpeed stage 2 to stage 3 (by modifying the accelerate config). Stage 2 config (works):

```yaml
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
enable_cpu_affinity: false
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

Stage 3 config modification (fails):

```yaml
zero_stage: 3
```

For reference: maybe it is related to Accelerate or DeepSpeed?

double8fun avatar Jun 17 '25 09:06 double8fun
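If the stage 3 correlation above holds, a plausible mechanism is that ZeRO-3 partitions the frozen text encoders' parameters across ranks, so the position-embedding weight appears as a 0-row tensor when the one-off embedding precomputation runs. A hypothetical workaround sketch under that assumption: deepspeed.zero.GatheredParameters is a real DeepSpeed API, but text_encoder_one, text_encoder_two, text_encoders, tokenizers, and compute_text_embeddings are names taken from train_dreambooth_flux.py, and the exact call signature may differ:

```python
import deepspeed

# Temporarily gather the ZeRO-3-partitioned weights on every rank so the
# frozen text encoders can run a normal forward pass for the embedding
# precomputation; the weights are re-partitioned when the block exits.
params = list(text_encoder_one.parameters()) + list(text_encoder_two.parameters())
with deepspeed.zero.GatheredParameters(params):
    prompt_embeds, pooled_prompt_embeds, text_ids = compute_text_embeddings(
        args.instance_prompt, text_encoders, tokenizers
    )
```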

same here, following

panpan2panpan avatar Jul 02 '25 10:07 panpan2panpan

same here, following

kangyeolk avatar Jul 03 '25 12:07 kangyeolk

same question

waltonfuture avatar Jul 07 '25 08:07 waltonfuture

same here, following

biteNi avatar Jul 13 '25 10:07 biteNi

same here, following

LiaoFJ avatar Jul 16 '25 08:07 LiaoFJ

same here, following

xdqf2128 avatar Sep 03 '25 09:09 xdqf2128

same here, following

ecjojo avatar Sep 06 '25 12:09 ecjojo

same here, following

smh0221 avatar Sep 15 '25 07:09 smh0221

same here, following

zghhui avatar Oct 07 '25 10:10 zghhui

anyone fix this?

infectedresearch avatar Nov 12 '25 21:11 infectedresearch