generative-models Stable Diffusion XL - M1 mac doesn't work with fp16 on tutorial script - LLVM ERROR: Failed to infer result type(s)

I'm still getting this issue when trying the basic tutorial for SDXL inference (16GB MacBook Pro M1).

This mostly works (if I strip out the tutorial's recommendation for fp16) - but takes forever (iteration time 66 seconds), and then dies on the 50th iteration due to "MPS backend out of memory":

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", use_safetensors=True)
pipe = pipe.to("mps")
pipe.enable_attention_slicing()
prompt = "An astronaut riding a green horse"
images = pipe(prompt=prompt).images[0]

The recommended call:

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")

results in the error previously mentioned:

loc("varianceEps"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x77x1xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
Abort trap: 6

/Users/mike/miniconda3/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Running torch 2.0.1, installed from the requirements.txt as per the README on this repo.

Anything I can do? I've got it working successfully on a 1080 Ti and a T4 (just following tutorial with no modifications), but I'm stuck on the M1.

Aug 07 '23 12:08 mbewley

Same issue here on MacBook Pro M2 Max in a REPL (using pyenv and pyenv-virtualenv):

>>> from diffusers import DiffusionPipeline
>>> import torch
>>> pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
Loading pipeline components...: 100%|█████████████| 7/7 [00:00<00:00,  7.90it/s]
>>> images = pipe(prompt="An astronaut riding a horse").images[0]
loc("varianceEps"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/d9889869-120b-11ee-b796-7a03568b17ac/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x77x1xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort      python
/Users/user/.pyenv/versions/3.11.4/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

After all of this is printed to the console, the REPL exits completely, and I am returned to the shell.

Aug 08 '23 21:08 ZelnickB

@mbewley, could you please add to the title of this issue that the problem is with Stable Diffusion XL? I believe that this repository is for several generative models and not just SDXL.

Aug 08 '23 21:08 ZelnickB

Done - sorry - that's all I've tested it on, not sure about whether it impacts more broadly.

Aug 09 '23 11:08 mbewley

Can also confirm on MacBook Pro M2 Max running in a conda env. Changing torch_dtype=torch.float16 to torch_dtype=torch.float32 fixed the issue for me.
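A minimal sketch of that change, assuming the same model ID as the tutorial. The helper names `pipeline_kwargs` and `load_pipeline` are hypothetical (not part of diffusers), and the rule "float32 on mps, fp16 elsewhere" is only what this thread reports, not a documented requirement:

```python
MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"


def pipeline_kwargs(device: str) -> dict:
    """Build from_pretrained keyword arguments for the given device.

    Assumption taken from this thread: fp16 aborts on "mps" with torch 2.0.1,
    so request float32 there and the fp16 weights everywhere else.
    The dtype is kept as a name here and resolved to a torch dtype later.
    """
    kwargs = {"use_safetensors": True}
    if device == "mps":
        kwargs["torch_dtype"] = "float32"
    else:
        kwargs["torch_dtype"] = "float16"
        kwargs["variant"] = "fp16"  # download the half-precision weight files
    return kwargs


def load_pipeline(device: str):
    """Load the SDXL base pipeline on `device` (hypothetical helper)."""
    # Imported lazily so pipeline_kwargs can be used without torch installed.
    import torch
    from diffusers import DiffusionPipeline

    kwargs = pipeline_kwargs(device)
    kwargs["torch_dtype"] = getattr(torch, kwargs["torch_dtype"])
    pipe = DiffusionPipeline.from_pretrained(MODEL_ID, **kwargs)
    pipe = pipe.to(device)
    if device == "mps":
        # Trades speed for lower peak memory, as in the original script.
        pipe.enable_attention_slicing()
    return pipe
```

Calling `load_pipeline("mps")` should then behave like the float32 variant described above; it will still be slow and may still run out of memory on a 16GB machine.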

Aug 13 '23 22:08 grahamcracker1234

This remains a problem on M2 MacBooks with PyTorch@latest on macOS Sonoma. Using the torch.float32 dtype (or the --no-half CLI arg for AUTOMATIC1111 users) works, albeit at a glacial pace.

Sep 10 '23 19:09 WildDanDan

If you're on Sonoma, try pip install -U torch torchvision torchdata torchaudio. Make sure the version of torch it installs is 2.1.
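One way to check the installed version before attempting fp16 is sketched below. The function name `torch_at_least` is hypothetical, and the 2.1 threshold is simply the version mentioned above, not something verified against the PyTorch release notes:

```python
import re


def torch_at_least(version: str, minimum: tuple = (2, 1)) -> bool:
    """Return True if a version string such as "2.1.0" is >= `minimum`.

    Only the leading "major.minor" part is compared, so suffixes like
    ".dev20231010" or "+cpu" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    found = (int(match.group(1)), int(match.group(2)))
    return found >= tuple(minimum)
```

With torch installed, `torch_at_least(torch.__version__)` gives the answer for the current environment.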

If you're not on Sonoma, there are a load of fp16 fixes that need applying to torch. I've been running with fp16 for ages, and have a git repo showing how to get it working on an 8GB M1: https://github.com/Vargol/8GB_M1_Diffusers_Scripts/tree/main/sdxl

@WildDanDan I'd look into other SD apps if I were you. Auto1111 and Apple Silicon have never mixed that well. I use InvokeAI when not using my own Diffusers scripts, but there are others.

Nov 22 '23 11:11 Vargol

Try the special pipeline: pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0"). Works for me.

Dec 25 '23 11:12 hangerrits