Depth pipeline gives error at fp16
Describe the bug
When using the StableDiffusionDepth2ImgPipeline with fp16 on MPS (M2), I get RuntimeError: Input type (MPSFloatType) and weight type (MPSHalfType) should be the same.
I've already figured out a fix: https://github.com/jonahclarsen/diffusers-bugfixes/commit/d738833a9ddbd954268a5b19f9d8ef8d6db8dde3
Should I make this a PR?
Note that on MPS, even after this fix, PyTorch still falls back to the CPU for the upsample_bicubic2d op, which is why I enable the MPS fallback in the reproduction below. Even with the fallback, fp16 is still ~3x faster than running the pipeline at fp32.
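The mismatch happens because the depth estimator's weights are loaded in fp16 while the preprocessed pixel values remain fp32. The sketch below illustrates the shape of the fix with plain-Python stand-ins (these classes are illustrative only, not the actual diffusers or torch code, and the linked commit may do it differently): cast the input to the module's weight dtype before the forward pass.

```python
# Illustrative stand-ins for torch.Tensor and F.conv2d, to show the
# dtype-mismatch failure and the cast that resolves it.

class Tensor:
    def __init__(self, dtype):
        self.dtype = dtype

    def to(self, dtype):
        # Mimics tensor.to(dtype): returns a tensor with the new dtype
        return Tensor(dtype)


def conv2d(inp, weight):
    # torch raises RuntimeError when input and weight dtypes differ
    if inp.dtype != weight.dtype:
        raise RuntimeError(
            f"Input type ({inp.dtype}) and weight type ({weight.dtype}) "
            "should be the same"
        )
    return Tensor(inp.dtype)


weight = Tensor("float16")        # depth estimator loaded with torch_dtype=torch.float16
pixel_values = Tensor("float32")  # image preprocessing produced fp32

# The fix: cast the input to the module's weight dtype before the forward pass
pixel_values = pixel_values.to(weight.dtype)
out = conv2d(pixel_values, weight)
print(out.dtype)  # float16
```

Without the cast, the `conv2d` call above raises the same "should be the same" RuntimeError seen in the logs.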
Reproduction
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = '1'  # Must be done before importing diffusers
import diffusers, torch

device = "mps"
model_id = "stabilityai/stable-diffusion-2-depth"
cache_dir = "path/to/models_cache"

pipe = diffusers.StableDiffusionDepth2ImgPipeline.from_pretrained(
    model_id,
    cache_dir=cache_dir,
    local_files_only=True,  # My local depth model was downloaded at fp32, I imagine this doesn't make a difference though
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = pipe.to(device)

images = pipe(
    prompt="Two golden retriever puppies.",
    image=diffusers.utils.load_image("http://images.cocodataset.org/val2017/000000039769.jpg"),
    num_inference_steps=15,
).images
images[0].show()
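As a debugging aid, a small helper (hypothetical, not part of the diffusers API) can print each pipeline component's parameter dtype after `pipe.to(device)`, which makes a mismatch like this one visible before calling the pipeline. It only relies on `pipe.components`, which DiffusionPipeline does expose:

```python
def report_param_dtypes(pipe):
    # Walk the pipeline's components dict and print the dtype of the
    # first parameter of each module-like component.
    for name, component in pipe.components.items():
        parameters = getattr(component, "parameters", None)
        if not callable(parameters):
            continue  # e.g. tokenizers and schedulers have no parameters()
        first = next(iter(parameters()), None)
        if first is not None:
            print(f"{name}: {first.dtype}")
```

Calling `report_param_dtypes(pipe)` before inference would show the depth_estimator's dtype alongside the other components.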
Logs
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00, 5.50it/s]
Traceback (most recent call last):
File "/project/temp.py", line 19, in <module>
images = pipe(
File "/env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/env/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_depth2img.py", line 779, in __call__
depth_mask = self.prepare_depth_map(
File "/env/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_depth2img.py", line 553, in prepare_depth_map
depth_map = self.depth_estimator(pixel_values).predicted_depth
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/env/lib/python3.10/site-packages/transformers/models/dpt/modeling_dpt.py", line 1159, in forward
outputs = self.dpt(
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/env/lib/python3.10/site-packages/transformers/models/dpt/modeling_dpt.py", line 932, in forward
embedding_output = self.embeddings(pixel_values, return_dict=return_dict)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/env/lib/python3.10/site-packages/transformers/models/dpt/modeling_dpt.py", line 192, in forward
backbone_output = self.backbone(pixel_values)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/env/lib/python3.10/site-packages/transformers/models/bit/modeling_bit.py", line 881, in forward
outputs = self.bit(pixel_values, output_hidden_states=True, return_dict=True)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/env/lib/python3.10/site-packages/transformers/models/bit/modeling_bit.py", line 735, in forward
embedding_output = self.embedder(pixel_values)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/env/lib/python3.10/site-packages/transformers/models/bit/modeling_bit.py", line 291, in forward
embedding = self.convolution(pixel_values)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/env/lib/python3.10/site-packages/transformers/models/bit/modeling_bit.py", line 148, in forward
hidden_state = nn.functional.conv2d(
RuntimeError: Input type (MPSFloatType) and weight type (MPSHalfType) should be the same
System Info
- diffusers version: 0.27.2
- Platform: macOS-14.2-arm64-arm-64bit
- Python version: 3.10.11
- PyTorch version (GPU?): 2.3.0 (False)
- Huggingface_hub version: 0.22.0
- Transformers version: 4.39.1
- Accelerate version: 0.28.0
- xFormers version: not installed
- Using GPU in script?: yes (M2 GPU via MPS)
- Using distributed or parallel set-up in script?: no
Who can help?
@yiyixuxu @DN6 @sayakpaul