diffusers help: can I run img2img on multiple images at the same time?

In my case, I need to run img2img on multiple images. I saw that the "image" parameter can be a list of PIL.Image.Image, but when I passed a list of images as input, I got the error below.

  File "/mnt/disk1/1/code/ip-adapter/infer-face-diffusers-lcm-upscaleM1.py", line 107, in ip
    image_hires = pipes_img2img[i](ip_adapter_image=image, image=imgs_resize, prompt=prompt, negative_prompt=n_prompt_default, num_inference_steps=steps, guidance_scale=gs_scale, strength=strength).images
  File "/home/1/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/1/.local/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py", line 1083, in __call__
    noise_pred = self.unet(
  File "/home/1/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/1/.local/lib/python3.10/site-packages/diffusers/models/unets/unet_2d_condition.py", line 1121, in forward
    sample, res_samples = downsample_block(
  File "/home/1/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/1/.local/lib/python3.10/site-packages/diffusers/models/unets/unet_2d_blocks.py", line 1199, in forward
    hidden_states = attn(
  File "/home/1/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/1/.local/lib/python3.10/site-packages/diffusers/models/transformers/transformer_2d.py", line 391, in forward
    hidden_states = block(
  File "/home/1/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/1/.local/lib/python3.10/site-packages/diffusers/models/attention.py", line 378, in forward
    hidden_states = attn_output + hidden_states
RuntimeError: The size of tensor a (43520) must match the size of tensor b (21760) at non-singleton dimension 1

How should I do this correctly? It would be great if I could run img2img on several images in a single call. Thank you!

Feb 23 '24 04:02 blx0102

Hi @blx0102, thanks for the issue! Can you provide a reproducible script?

YiYi

Feb 23 '24 21:02 yiyixuxu

@yiyixuxu Sorry for the late reply. Here are the relevant parts of my code:

import torch
from PIL import Image
from diffusers import LCMScheduler, StableDiffusionImg2ImgPipeline

# base_model_path, vae, lcm_lora, scale, prompt, n_prompt_default, steps, img1 and img2 are defined elsewhere in my script
pipe_img2img = StableDiffusionImg2ImgPipeline.from_pretrained(base_model_path, torch_dtype=torch.float16, vae=vae)
pipe_img2img.scheduler = LCMScheduler.from_config(pipe_img2img.scheduler.config)
pipe_img2img.load_ip_adapter("./resource/IP-Adapter/", subfolder="models", weight_name="ip-adapter-plus-face_sd15.bin")
pipe_img2img.load_lora_weights(lcm_lora, adapter_name="lcm")
pipe_img2img.set_adapters(["lcm"], adapter_weights=[1.0])
pipe_img2img.set_ip_adapter_scale(scale)
pipe_img2img.to("cuda")

image = Image.open("face.png")  # local image for IP-Adapter
imgs_resize = [img1, img2]      # list of images I need to run img2img on
images_hires = pipe_img2img(ip_adapter_image=image, image=imgs_resize, prompt=prompt, negative_prompt=n_prompt_default, num_inference_steps=steps, guidance_scale=0.0, strength=0.7).images

Feb 26 '24 03:02 blx0102
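
A note on the error itself: the two sizes differ by exactly a factor of two (43520 = 2 × 21760), which matches the two images in imgs_resize while prompt and ip_adapter_image are each a single item. That suggests, though nothing in this thread confirms it, that the image batch and the conditioning batch have different sizes. Below is a minimal sketch of two possible workarounds, assuming only that the pipeline is a callable accepting image= and returning an object with an .images list, as in the snippet above. The helper names img2img_one_by_one and repeat_prompt_per_image are hypothetical, not part of diffusers. The first helper trades batching for safety by calling the pipeline once per image. The second only prepares a prompt list with one entry per image; whether that alone resolves the mismatch when an IP-Adapter is loaded is an untested assumption.

```python
from typing import Any, Callable, List, Sequence, Union


def img2img_one_by_one(pipe: Callable[..., Any], images: Sequence[Any], **kwargs: Any) -> List[Any]:
    """Call `pipe` once per input image and collect every output image.

    `pipe` is assumed to accept `image=` plus arbitrary keyword arguments
    and to return an object exposing an `.images` list.
    """
    outputs: List[Any] = []
    for img in images:
        result = pipe(image=img, **kwargs)
        outputs.extend(result.images)
    return outputs


def repeat_prompt_per_image(prompt: Union[str, Sequence[str]], images: Sequence[Any]) -> List[str]:
    """Return one prompt per image, repeating a single string if needed."""
    if isinstance(prompt, str):
        return [prompt] * len(images)
    if len(prompt) != len(images):
        raise ValueError(f"got {len(prompt)} prompts for {len(images)} images")
    return list(prompt)


# Hypothetical usage with the names from the snippet above:
# images_hires = img2img_one_by_one(
#     pipe_img2img, imgs_resize,
#     ip_adapter_image=image, prompt=prompt, negative_prompt=n_prompt_default,
#     num_inference_steps=steps, guidance_scale=0.0, strength=0.7,
# )
```

The per-image loop gives up any speedup from batching, but it keeps every call identical to the single-image case that already works.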