Depth pipeline gives error at fp16
Describe the bug
When using the StableDiffusionDepth2ImgPipeline with fp16 on MPS (M2), I get RuntimeError: Input type (MPSFloatType) and weight type (MPSHalfType) should be the same.
I've already figured out a fix: https://github.com/jonahclarsen/diffusers-bugfixes/commit/d738833a9ddbd954268a5b19f9d8ef8d6db8dde3
Should I make this a PR?
Note that on MPS, even after this fix, PyTorch still falls back to the CPU for the upsample_bicubic2d op, which is why I enable the MPS fallback in the reproduction below. Even with the fallback, fp16 is still ~3x faster than running the pipeline at fp32.
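The mismatch happens because the depth estimator's weights are loaded in fp16 while the preprocessed pixel values remain fp32. The sketch below illustrates the shape of the fix with plain-Python stand-ins (these classes are illustrative only, not the actual diffusers or torch code, and the linked commit may do it differently): cast the input to the module's weight dtype before the forward pass.

```python
# Illustrative stand-ins for torch.Tensor and F.conv2d, to show the
# dtype-mismatch failure and the cast that resolves it.

class Tensor:
    def __init__(self, dtype):
        self.dtype = dtype

    def to(self, dtype):
        # Mimics tensor.to(dtype): returns a tensor with the new dtype
        return Tensor(dtype)


def conv2d(inp, weight):
    # torch raises RuntimeError when input and weight dtypes differ
    if inp.dtype != weight.dtype:
        raise RuntimeError(
            f"Input type ({inp.dtype}) and weight type ({weight.dtype}) "
            "should be the same"
        )
    return Tensor(inp.dtype)


weight = Tensor("float16")        # depth estimator loaded with torch_dtype=torch.float16
pixel_values = Tensor("float32")  # image preprocessing produced fp32

# The fix: cast the input to the module's weight dtype before the forward pass
pixel_values = pixel_values.to(weight.dtype)
out = conv2d(pixel_values, weight)
print(out.dtype)  # float16
```

Without the cast, the `conv2d` call above raises the same "should be the same" RuntimeError seen in the logs.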
Reproduction
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = '1'  # Must be done before importing diffusers
import diffusers, torch

device = "mps"
model_id = "stabilityai/stable-diffusion-2-depth"
cache_dir = "path/to/models_cache"

pipe = diffusers.StableDiffusionDepth2ImgPipeline.from_pretrained(
    model_id,
    cache_dir=cache_dir,
    local_files_only=True,  # My local depth model was downloaded at fp32, I imagine this doesn't make a difference though
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = pipe.to(device)

images = pipe(
    prompt="Two golden retriever puppies.",
    image=diffusers.utils.load_image("http://images.cocodataset.org/val2017/000000039769.jpg"),
    num_inference_steps=15,
).images
images[0].show()
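As a debugging aid, a small helper (hypothetical, not part of the diffusers API) can print each pipeline component's parameter dtype after `pipe.to(device)`, which makes a mismatch like this one visible before calling the pipeline. It only relies on `pipe.components`, which DiffusionPipeline does expose:

```python
def report_param_dtypes(pipe):
    # Walk the pipeline's components dict and print the dtype of the
    # first parameter of each module-like component.
    for name, component in pipe.components.items():
        parameters = getattr(component, "parameters", None)
        if not callable(parameters):
            continue  # e.g. tokenizers and schedulers have no parameters()
        first = next(iter(parameters()), None)
        if first is not None:
            print(f"{name}: {first.dtype}")
```

Calling `report_param_dtypes(pipe)` before inference would show the depth_estimator's dtype alongside the other components.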
Logs
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00, 5.50it/s]
Traceback (most recent call last):
File "/project/temp.py", line 19, in <module>
images = pipe(
File "/env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/env/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_depth2img.py", line 779, in __call__
depth_mask = self.prepare_depth_map(
File "/env/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_depth2img.py", line 553, in prepare_depth_map
depth_map = self.depth_estimator(pixel_values).predicted_depth
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/env/lib/python3.10/site-packages/transformers/models/dpt/modeling_dpt.py", line 1159, in forward
outputs = self.dpt(
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/env/lib/python3.10/site-packages/transformers/models/dpt/modeling_dpt.py", line 932, in forward
embedding_output = self.embeddings(pixel_values, return_dict=return_dict)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/env/lib/python3.10/site-packages/transformers/models/dpt/modeling_dpt.py", line 192, in forward
backbone_output = self.backbone(pixel_values)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/env/lib/python3.10/site-packages/transformers/models/bit/modeling_bit.py", line 881, in forward
outputs = self.bit(pixel_values, output_hidden_states=True, return_dict=True)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/env/lib/python3.10/site-packages/transformers/models/bit/modeling_bit.py", line 735, in forward
embedding_output = self.embedder(pixel_values)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/env/lib/python3.10/site-packages/transformers/models/bit/modeling_bit.py", line 291, in forward
embedding = self.convolution(pixel_values)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/env/lib/python3.10/site-packages/transformers/models/bit/modeling_bit.py", line 148, in forward
hidden_state = nn.functional.conv2d(
RuntimeError: Input type (MPSFloatType) and weight type (MPSHalfType) should be the same
System Info
- diffusers version: 0.27.2
- Platform: macOS-14.2-arm64-arm-64bit
- Python version: 3.10.11
- PyTorch version (GPU?): 2.3.0 (False)
- Huggingface_hub version: 0.22.0
- Transformers version: 4.39.1
- Accelerate version: 0.28.0
- xFormers version: not installed
- Using GPU in script?: yes (M2 GPU via MPS)
- Using distributed or parallel set-up in script?: no
Who can help?
@yiyixuxu @DN6 @sayakpaul