Issue loading and running inpainting pipeline
Describe the bug
Following this tutorial: https://huggingface.co/runwayml/stable-diffusion-inpainting
I tried to run inference with a JPG image of size 283x530 (and a PNG mask of the same size), but I get an error:
Sizes of tensors must match except in dimension 1. Expected size 64 but got size 66 for tensor number 2 in the list.
Do the image and mask need to be resized?
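A possible explanation, offered as a guess rather than a confirmed diagnosis: if `height` and `width` are not passed, the pipeline presumably builds its latents for the default 512x512 resolution (512 / 8 = 64), while the mask and masked-image latents follow the actual input height (530 / 8 ≈ 66), which would match the 64 vs. 66 in the error. The sketch below resizes the image and mask to a shared size divisible by 8 so that both can be passed along with explicit `height` and `width`. The helper names `snap_to_multiple` and `prepare_inpaint_pair` are hypothetical and not part of diffusers; `image` and `mask` are assumed to be PIL images.

```python
def snap_to_multiple(value, multiple=8):
    """Round value down to the nearest positive multiple of `multiple`."""
    return max(multiple, (value // multiple) * multiple)


def prepare_inpaint_pair(image, mask, multiple=8):
    """Resize a PIL image and its mask to one shared size divisible by `multiple`.

    Hypothetical helper: relies only on the `.size` attribute and
    `.resize((width, height))` method that PIL images provide.
    """
    width, height = image.size  # PIL reports size as (width, height)
    new_size = (snap_to_multiple(width, multiple), snap_to_multiple(height, multiple))
    return image.resize(new_size), mask.resize(new_size), new_size


# Assumed usage with the pipeline from the reproduction (not executed here):
# img, mask, (width, height) = prepare_inpaint_pair(img, mask)
# image = pipe(prompt=prompt, image=img, mask_image=mask,
#              height=height, width=width).images[0]
```

For the 283x530 input this would give 280x528. Resizing both inputs to 512x512 is another option if the aspect ratio does not matter.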
Reproduction
from diffusers import StableDiffusionInpaintPipeline
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
# image and mask_image should be PIL images.
# The mask structure is white for inpainting and black for keeping as is
image = pipe(prompt=prompt, image=img, mask_image=mask).images[0]
Logs
│ │
│ [Errno 2] No such file or directory: '/tmp/ipykernel_1398/2371196233.py' │
│ │
│ /opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py:27 in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ ❱ 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ /opt/conda/lib/python3.7/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diff │
│ usion_inpaint.py:846 in __call__ │
│ │
│ 843 │ │ │ │ │
│ 844 │ │ │ │ # concat latents, mask, masked_image_latents in the channel dimension │
│ 845 │ │ │ │ latent_model_input = self.scheduler.scale_model_input(latent_model_input │
│ ❱ 846 │ │ │ │ latent_model_input = torch.cat([latent_model_input, mask, masked_image_l │
│ 847 │ │ │ │ │
│ 848 │ │ │ │ # predict the noise residual │
│ 849 │ │ │ │ noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=prom │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 64 but got size 66 for tensor number 2 in the list.
System Info
- diffusers version: 0.14.0.dev0
- Platform: Linux-5.15.65+-x86_64-with-debian-bullseye-sid
- Python version: 3.7.12
- PyTorch version (GPU?): 1.13.1+cu116 (True)
- Huggingface_hub version: 0.12.1
- Transformers version: 4.26.1
- Accelerate version: 0.16.0
- xFormers version: 0.0.16
- Using GPU in script?: YES
- Using distributed or parallel set-up in script?: NO