Type mismatch for LEDITS++
Describe the bug
When running the code snippet from https://huggingface.co/docs/diffusers/api/pipelines/ledits_pp#diffusers.LEditsPPPipelineStableDiffusionXL, I get a RuntimeError due to a mismatch between the dtype of the input image and the dtype of the VAE used to encode it:

```
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
```
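For context, PyTorch raises this exact message from its convolution checks whenever the bias dtype disagrees with the input/weight dtype. A contrived sketch of my own (not the pipeline code) that reproduces the same message on CUDA:

```python
import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3).cuda().half()  # weight and bias cast to fp16
conv.bias.data = conv.bias.data.float()                    # deliberately leave the bias in fp32

x = torch.randn(1, 3, 64, 64, device="cuda", dtype=torch.float16)
conv(x)  # RuntimeError: Input type (c10::Half) and bias type (float) should be the same
```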
Reproduction
The code is in a script at /home/vdelale/code/test_leditpp.py and is copied verbatim from https://huggingface.co/docs/diffusers/api/pipelines/ledits_pp#diffusers.LEditsPPPipelineStableDiffusionXL.
I created a virtual environment named test_leditspp with python venv and installed diffusers, torch, and transformers; the versions are listed under System Info below.
```python
import torch
import PIL
import requests
from io import BytesIO

from diffusers import LEditsPPPipelineStableDiffusionXL

pipe = LEditsPPPipelineStableDiffusionXL.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe = pipe.to("cuda:3")  # only thing I modified from the original code


def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")


img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/tennis.jpg"
image = download_image(img_url)

_ = pipe.invert(
    image=image,
    num_inversion_steps=50,
    skip=0.2,
)

edited_image = pipe(
    editing_prompt=["tennis ball", "tomato"],
    reverse_editing_direction=[True, False],
    edit_guidance_scale=[5.0, 10.0],
    edit_threshold=[0.9, 0.85],
).images[0]
```
Logs
```
Cannot initialize model with low cpu memory usage because `accelerate` was not found in the environment. Defaulting to `low_cpu_mem_usage=False`. It is strongly recommended to install `accelerate` for faster and less memory-intense model loading. You can do so with:

pip install accelerate

/home/vdelale/code/test_leditspp/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Loading pipeline components...: 100%|██████████████████████████████████████| 7/7 [00:13<00:00, 1.98s/it]
This pipeline only supports DDIMScheduler and DPMSolverMultistepScheduler. The scheduler has been changed to DPMSolverMultistepScheduler.
Your input images far exceed the default resolution of the underlying diffusion model. The output images may contain severe artifacts! Consider down-sampling the input using the `height` and `width` parameters
Traceback (most recent call last):
  File "/home/vdelale/code/test_leditpp.py", line 20, in <module>
    _ = pipe.invert(
  File "/home/vdelale/code/test_leditspp/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/vdelale/code/test_leditspp/lib/python3.10/site-packages/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion_xl.py", line 1516, in invert
    x0, resized = self.encode_image(image, dtype=self.text_encoder_2.dtype)
  File "/home/vdelale/code/test_leditspp/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/vdelale/code/test_leditspp/lib/python3.10/site-packages/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion_xl.py", line 1422, in encode_image
    x0 = self.vae.encode(image).latent_dist.mode()
  File "/home/vdelale/code/test_leditspp/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl.py", line 260, in encode
    h = self.encoder(x)
  File "/home/vdelale/code/test_leditspp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/vdelale/code/test_leditspp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/vdelale/code/test_leditspp/lib/python3.10/site-packages/diffusers/models/autoencoders/vae.py", line 143, in forward
    sample = self.conv_in(sample)
  File "/home/vdelale/code/test_leditspp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/vdelale/code/test_leditspp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/vdelale/code/test_leditspp/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/vdelale/code/test_leditspp/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
```
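Until a fix is released, a possible workaround (a sketch based on my reading of the traceback, not a verified fix) is to keep every component in a single dtype, e.g. by loading the pipeline in full float32 so the VAE's fp32 bias and the image tensor agree:

```python
import torch
from diffusers import LEditsPPPipelineStableDiffusionXL

# Untested workaround sketch: load everything in the default float32 instead of
# torch_dtype=torch.float16; this costs roughly twice the GPU memory of fp16.
pipe = LEditsPPPipelineStableDiffusionXL.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0"
)
pipe = pipe.to("cuda:3")

# Alternatively, one could try casting the VAE down to match the half-precision
# inputs with pipe.vae.to(torch.float16); the SDXL VAE is known to be numerically
# fragile in fp16, so this is an assumption rather than a recommendation.
```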
System Info
- OS: Ubuntu 22.04.3 LTS
- Python version: 3.10.12
- Python packages:
  - diffusers==0.27.2
  - torch==2.3.0
  - transformers==4.40.2
Who can help?
No response
Thank you for reporting this. This issue was resolved recently. Could you install diffusers from source?
Tried again:

- created and activated a virtual env: `python -m venv .leditspp_env && source .leditspp_env/bin/activate`
- installed accelerate and transformers: `pip install accelerate transformers`
- installed diffusers from source, as mentioned in the Hugging Face docs (installation - install from source): `pip install git+https://github.com/huggingface/diffusers`
- ran my Python script shared above, unchanged (version check below)
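To confirm the source install was actually picked up, I checked the version from Python (assuming the usual convention that a `.dev0` suffix indicates a source build; the exact string may differ):

```python
import diffusers

print(diffusers.__version__)  # e.g. "0.28.0.dev0" for a source install
```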
The data type mismatch seems resolved; however, I now run into a CUDA out-of-memory error. I'm quite surprised that it tried to allocate 136.51 GiB, which seems like a lot (I'm working on an H100 GPU with approximately 80 GiB of memory).
```
$ python test_leditspp.py
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:03<00:00, 2.02it/s]
This pipeline only supports DDIMScheduler and DPMSolverMultistepScheduler. The scheduler has been changed to DPMSolverMultistepScheduler.
Your input images far exceed the default resolution of the underlying diffusion model. The output images may contain severe artifacts! Consider down-sampling the input using the `height` and `width` parameters
Traceback (most recent call last):
  File "/home/vdelale/code/test_leditspp.py", line 20, in <module>
    _ = pipe.invert(
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion_xl.py", line 1576, in invert
    image_rec = self.vae.decode(
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl.py", line 303, in decode
    decoded = self._decode(z, return_dict=False)[0]
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl.py", line 276, in _decode
    dec = self.decoder(z)
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/diffusers/models/autoencoders/vae.py", line 337, in forward
    sample = up_block(sample, latent_embeds)
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/diffusers/models/unets/unet_2d_blocks.py", line 2750, in forward
    hidden_states = upsampler(hidden_states)
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/diffusers/models/upsampling.py", line 180, in forward
    hidden_states = self.conv(hidden_states)
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/vdelale/code/.leditspp_env/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 136.51 GiB. GPU has a total capacity of 79.11 GiB of which 21.06 GiB is free. Process 33554 has 1.25 GiB memory in use. Process 3812073 has 2.32 GiB memory in use. Process 3812837 has 1.04 GiB memory in use. Including non-PyTorch memory, this process has 53.40 GiB memory in use. Of the allocated memory 48.46 GiB is allocated by PyTorch, and 4.21 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
I guess another issue should be opened for this.
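For what it's worth, the resolution warning above suggests a plausible cause: the VAE decodes the reconstruction at the original resolution of tennis.jpg, which far exceeds SDXL's native 1024x1024, so a single fp32 activation in the decoder's upsampling path can plausibly reach the 136.51 GiB reported (136.51 GiB / 4 bytes per element ≈ 3.7e10 elements). A sketch of possible mitigations (assumptions on my part, untested here), reusing `download_image` and `img_url` from the script above:

```python
# Untested mitigation sketch: shrink the spatial size the VAE has to decode.
image = download_image(img_url).resize((1024, 1024))  # SDXL's native resolution

pipe.vae.enable_tiling()   # decode the latents in overlapping tiles
pipe.vae.enable_slicing()  # decode one batch element at a time

_ = pipe.invert(image=image, num_inversion_steps=50, skip=0.2)
```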