Failed to load CLIPTextModel
Describe the bug
I trained a model with both OneTrainer and EveryDream2; both fail with the same error. This traceback is from the OneTrainer checkpoint:
Fetching 11 files: 100%|██████████| 11/11 [00:00<00:00, 115922.97it/s]
Loading pipeline components...:  33%|███▎      | 2/6 [00:00<00:00, 178.17it/s]
Traceback (most recent call last):
  File "/home/zznet/.local/lib/python3.10/site-packages/diffusers/loaders/single_file.py", line 491, in from_single_file
    loaded_sub_model = load_single_file_sub_model(
  File "/home/zznet/.local/lib/python3.10/site-packages/diffusers/loaders/single_file.py", line 156, in load_single_file_sub_model
    raise SingleFileComponentError(
diffusers.loaders.single_file_utils.SingleFileComponentError: Failed to load CLIPTextModel. Weights for this component appear to be missing in the checkpoint.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zznet/workspace/ai-pipe/prod-ai-pipe-replace-model/runpod_app.py", line 14, in <module>
    from pipe import text2img, getText2imgPipe, set_sampler, compel
  File "/home/zznet/workspace/ai-pipe/prod-ai-pipe-replace-model/pipe.py", line 24, in <module>
    text2imgPipe = StableDiffusionControlNetPipeline.from_single_file(
  File "/home/zznet/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/zznet/.local/lib/python3.10/site-packages/diffusers/loaders/single_file.py", line 506, in from_single_file
    raise SingleFileComponentError(
diffusers.loaders.single_file_utils.SingleFileComponentError: Failed to load CLIPTextModel. Weights for this component appear to be missing in the checkpoint.
Please load the component before passing it in as an argument to `from_single_file`.

text_encoder = CLIPTextModel.from_pretrained('...')
pipe = StableDiffusionControlNetPipeline.from_single_file(<checkpoint path>, text_encoder=text_encoder)
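As the error message suggests, one workaround is to construct the text encoder yourself and pass it to `from_single_file`. A minimal sketch, assuming the checkpoint is an SD 1.x finetune whose text encoder can be taken from a base repo (the repo id `runwayml/stable-diffusion-v1-5` is an assumption; substitute whatever base model your checkpoint derives from):

```python
def build_pipeline(checkpoint_path, controlnets):
    # Imports kept inside the function so the sketch stays self-contained.
    import torch
    from transformers import CLIPTextModel
    from diffusers import StableDiffusionControlNetPipeline

    # Assumption: the single-file checkpoint is an SD 1.x finetune, so a
    # compatible text encoder can be loaded from the base repo instead of
    # being parsed out of the checkpoint.
    text_encoder = CLIPTextModel.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # hypothetical base; use your own
        subfolder="text_encoder",
        torch_dtype=torch.float16,
    )
    return StableDiffusionControlNetPipeline.from_single_file(
        checkpoint_path,
        text_encoder=text_encoder,
        controlnet=controlnets,
        torch_dtype=torch.float16,
    )
```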
This one uses EveryDream2: https://github.com/huggingface/diffusers/issues/7506#issuecomment-2116719423
Reproduction
text2imgPipe = StableDiffusionControlNetPipeline.from_single_file(
    './base.safetensors',
    # vae=vae,
    # '/home/zznet/workspace/stable-diffusion-webui/models/Stable-diffusion/majicmixRealistic_v7.safetensors',
    # '/home/zznet/workspace/1-ot/save/2024-04-30_11-47-02-save-1650-110-0.safetensors',
    controlnet=[
        depth_control,
        softedge_control,
        inpaint_control,
    ],
    torch_dtype=torch.float16,
    # custom_pipeline='lpw_stable_diffusion',
)
Logs
No response
System Info
main branch
Who can help?
No response
Okay, reverting to 0.27.2 works.
Hi @crapthings, based on the traceback it looks like the CLIP text model weights aren't present in your checkpoint.
@DN6
I've upgraded to 0.28.0 and I still get this error. What is wrong? It works on 0.27.2.
I tried loading a CLIP model explicitly, but it still errors:
clip_model_id = 'laion/CLIP-ViT-B-32-laion2B-s34B-b79K'
feature_extractor = CLIPImageProcessor.from_pretrained(clip_model_id)
clip_model = CLIPModel.from_pretrained(clip_model_id)

text2imgPipe = StableDiffusionControlNetPipeline.from_single_file(
    './base.safetensors',
    clip_model=clip_model,
    custom_pipeline='clip_guided_stable_diffusion',
    controlnet=[
        depth_control,
        softedge_control,
        inpaint_control,
        openpose_control,
    ],
    torch_dtype=torch.float16,
)
Is it possible for you to host the base.safetensors checkpoint you're using on the HF Hub and share it?
https://huggingface.co/crapthings/diffusers-issue/tree/main
Hi @crapthings it appears that the cond_stage_model.transformer.text_model.embeddings.position_ids key is missing from your checkpoint, which is what from_single_file in 0.28.0 uses to identify the CLIP model in the checkpoint.
Did you train your model using the scripts in OneTrainer or the UI? I'm trying to understand why that key isn't present when saving the model given the fact that it appears to be there in other single file checkpoints such as MajicMix.
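A stdlib-only stand-in for the heuristic described above (the actual detection logic inside diffusers may differ; the marker key is the one named in this thread):

```python
# Hypothetical sketch: a single-file loader that identifies the CLIP text
# encoder by probing the checkpoint's key set for a characteristic key.
CLIP_MARKER_KEY = "cond_stage_model.transformer.text_model.embeddings.position_ids"

def has_clip_text_model(checkpoint_keys):
    """Return True if the checkpoint appears to contain a CLIP text model."""
    return CLIP_MARKER_KEY in checkpoint_keys

# Trainers that strip the non-learned position_ids buffer produce key sets like:
trained = {"cond_stage_model.transformer.text_model.embeddings.token_embedding.weight"}
# Stock checkpoints (e.g. MajicMix) also keep the buffer:
stock = trained | {CLIP_MARKER_KEY}

print(has_clip_text_model(trained))  # False -> SingleFileComponentError in 0.28.0
print(has_clip_text_model(stock))    # True  -> loads normally
```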
I think I used both the OneTrainer UI and EveryDream2, and they all failed. Yes, the model is finetuned from majicmix-realistic.
0.27.2 shows this warning, but the model still loads:
Some weights of the model checkpoint were not used when initializing CLIPTextModel:
['text_model.embeddings.position_ids']
Does this affect image quality?
@crapthings No it shouldn't affect the quality. I've opened a PR with a fix as well.
@crapthings could you try installing diffusers from main and see if the issue persists?
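Why the missing (or ignored) key is harmless: `position_ids` is not a learned weight but a fixed index buffer that the text encoder regenerates on initialization, simply the positions 0..n-1 (the context length of 77 below is an assumption based on the standard SD 1.x CLIP text encoder config). A stdlib-only illustration:

```python
# position_ids in a CLIP text encoder is a deterministic buffer, not a
# trained parameter: just the token positions [0, 1, ..., n-1].
MAX_POSITIONS = 77  # assumed context length of the SD 1.x CLIP text encoder

def default_position_ids(n=MAX_POSITIONS):
    """Recreate the buffer exactly as the model would on initialization."""
    return list(range(n))

# A checkpoint that still stores the key holds the same values, so a loader
# that drops or ignores it cannot change the model's outputs.
stored = list(range(MAX_POSITIONS))
assert stored == default_position_ids()
```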
@DN6
The main branch is working now~
Closing as completed.
Hi, I synced up to diffusers 0.28.2 and still see the warning. Does the checkpoint file also need to be updated if it's carrying the previous key? @DN6
Repro code:

pipe = StableDiffusionXLPipeline.from_single_file(
    "local_path_to/sd_xl_base_1.0_0.9vae.safetensors",
    add_watermarker=False,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)

The checkpoint is the one here.