Issue loading and running inpainting pipeline

Open EnricoBeltramo opened this issue 3 years ago • 0 comments

Describe the bug

Following this tutorial: https://huggingface.co/runwayml/stable-diffusion-inpainting

I tried to run inference with a JPG image of size 283x530 (and a PNG mask of the same size), but I get an error:

Sizes of tensors must match except in dimension 1. Expected size 64 but got size 66 for tensor number 2 in the list.

Do the image and mask need to be resized before being passed to the pipeline?
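One thing that might be worth trying, sketched here under the assumption that the pipeline wants image sides divisible by 8 and that the image, the mask, and the `height`/`width` arguments of the call all have to agree: snap both sides down to a multiple of 8, resize the image and mask to that size, and pass the same values to the pipeline. `snap_to_multiple` and `inpaint_size` are hypothetical helper names, not part of diffusers.

```python
def snap_to_multiple(value: int, multiple: int = 8) -> int:
    """Round `value` down to the nearest positive multiple of `multiple`."""
    if value < multiple:
        raise ValueError(f"{value} is smaller than the required multiple {multiple}")
    return (value // multiple) * multiple


def inpaint_size(width: int, height: int, multiple: int = 8) -> tuple:
    """Return a (width, height) pair where both sides are multiples of `multiple`."""
    return snap_to_multiple(width, multiple), snap_to_multiple(height, multiple)


# The image from this report is 283 pixels wide and 530 pixels high.
width, height = inpaint_size(283, 530)
print(width, height)

# Hypothetical usage, assuming `img` and `mask` are PIL images and `pipe` and
# `prompt` are defined as in the reproduction:
# img = img.resize((width, height))
# mask = mask.resize((width, height))
# image = pipe(prompt=prompt, image=img, mask_image=mask,
#              height=height, width=width).images[0]
```

Rounding down rather than up avoids stretching the image beyond its original size; resizing both inputs to a fixed 512x512 would be the simpler alternative if the aspect ratio does not matter.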

Reproduction

from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
# image and mask_image should be PIL images.
# The mask structure is white for inpainting and black for keeping as is
image = pipe(prompt=prompt, image=img, mask_image=mask).images[0]

Logs

│                                                                                                  │
│ [Errno 2] No such file or directory: '/tmp/ipykernel_1398/2371196233.py'                         │
│                                                                                                  │
│ /opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py:27 in decorate_context        │
│                                                                                                  │
│    24 │   │   @functools.wraps(func)                                                             │
│    25 │   │   def decorate_context(*args, **kwargs):                                             │
│    26 │   │   │   with self.clone():                                                             │
│ ❱  27 │   │   │   │   return func(*args, **kwargs)                                               │
│    28 │   │   return cast(F, decorate_context)                                                   │
│    29 │                                                                                          │
│    30 │   def _wrap_generator(self, func):                                                       │
│                                                                                                  │
│ /opt/conda/lib/python3.7/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diff │
│ usion_inpaint.py:846 in __call__                                                                 │
│                                                                                                  │
│   843 │   │   │   │                                                                              │
│   844 │   │   │   │   # concat latents, mask, masked_image_latents in the channel dimension      │
│   845 │   │   │   │   latent_model_input = self.scheduler.scale_model_input(latent_model_input   │
│ ❱ 846 │   │   │   │   latent_model_input = torch.cat([latent_model_input, mask, masked_image_l   │
│   847 │   │   │   │                                                                              │
│   848 │   │   │   │   # predict the noise residual                                               │
│   849 │   │   │   │   noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=prom   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 64 but got size 66 for tensor number 2 in the list.
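The two sizes in the error would be consistent with a factor of 8 between pixel and latent resolution. That factor is an assumption about the Stable Diffusion VAE, not something the log states. Under it, a 512-pixel side (assumed here to be the pipeline default when no `height` is passed) maps to 64, while the 530-pixel image height maps to 66. A quick check of that arithmetic, with `latent_side` as a hypothetical helper:

```python
VAE_SCALE_FACTOR = 8  # assumed pixel-to-latent downscale factor


def latent_side(pixels: int, scale: int = VAE_SCALE_FACTOR) -> int:
    """Latent length for a given pixel length, assuming floor division by `scale`."""
    return pixels // scale


default_latent = latent_side(512)  # assumed default height of the pipeline
image_latent = latent_side(530)    # height of the image from this report
print(default_latent, image_latent)
```

If that reading is right, the noise latents and the mask follow the default size while the masked-image latents follow the real image size, so the concatenation at line 846 cannot line them up.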

System Info

diffusers version: 0.14.0.dev0
Platform: Linux-5.15.65+-x86_64-with-debian-bullseye-sid
Python version: 3.7.12
PyTorch version (GPU?): 1.13.1+cu116 (True)
Huggingface_hub version: 0.12.1
Transformers version: 4.26.1
Accelerate version: 0.16.0
xFormers version: 0.0.16
Using GPU in script?: YES
Using distributed or parallel set-up in script?: NO

Feb 20 '23 23:02 EnricoBeltramo