
Stable Cascade Image to Image Pipeline

Open ekurtulus opened this issue 1 year ago • 5 comments

Are there any plans for an Image2Image pipeline for the Stable Cascade model?

ekurtulus avatar Apr 03 '24 21:04 ekurtulus

In theory, it should be doable with StableCascadeCombinedPipeline. It accepts an `images` argument that can be a PIL image, a torch tensor, or a list of either. Unfortunately, I can't get it to accept a bfloat16 tensor for the image: the CLIP image processor raises a TypeError when it tries to convert the tensor to NumPy. I tried float32, but HF's A10G Large runs out of memory.

Although I'm an experienced coder, I don't think I can justify the time it would take me to dig deeply enough to come up with a fix. Hope someone else knows enough to concoct a solution.

Here's the error I'm seeing when I try to pass an image encoded as torch.bfloat16:

File "/home/user/app/app.py", line 50, in generate_image
    results =  pipe(
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/diffusers/pipelines/stable_cascade/pipeline_stable_cascade_combined.py", line 268, in __call__
    prior_outputs = self.prior_pipe(
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/diffusers/pipelines/stable_cascade/pipeline_stable_cascade_prior.py", line 504, in __call__
    image_embeds_pooled, uncond_image_embeds_pooled = self.encode_image(
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/diffusers/pipelines/stable_cascade/pipeline_stable_cascade_prior.py", line 254, in encode_image
    image = self.feature_extractor(image, return_tensors="pt").pixel_values
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/transformers/image_processing_utils.py", line 551, in __call__
    return self.preprocess(images, **kwargs)
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/transformers/models/clip/image_processing_clip.py", line 306, in preprocess
    images = [to_numpy_array(image) for image in images]
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/transformers/models/clip/image_processing_clip.py", line 306, in <listcomp>
    images = [to_numpy_array(image) for image in images]
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/transformers/image_utils.py", line 174, in to_numpy_array
    return to_numpy(img)
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/transformers/utils/generic.py", line 308, in to_numpy
    return framework_to_numpy[framework](obj)
  File "/home/user/.pyenv/versions/3.10.14/lib/python3.10/site-packages/transformers/utils/generic.py", line 293, in <lambda>
    "pt": lambda obj: obj.detach().cpu().numpy(),
TypeError: Got unsupported ScalarType BFloat16

FWIW, here are relevant snippets from the code that produced the above error:

# Define a transform to convert a PIL image to a tensor (method given by Claude 3 Sonnet)
def transform(image):
    # Convert the image to a PyTorch tensor
    input_tensor = torch.from_numpy(np.array(image)).permute(2, 0, 1).unsqueeze(0)
    # Convert the tensor to 'bfloat16' dtype
    input_tensor = input_tensor.to(torch.bfloat16)
    return input_tensor

# Ensure model and scheduler are initialized in GPU-enabled function
if torch.cuda.is_available():
    pipe = StableCascadeCombinedPipeline.from_pretrained(repo, torch_dtype=torch.bfloat16)
    pipe.to("cuda")

# The generate function
@spaces.GPU(enable_queue=True)
def generate_image(prompt, image):  
    if image is not None:
        # Convert the PIL image to Torch tensor
        # and move it to GPU
        img_tensor = transform(image)
        img_tensor = [img_tensor.to("cuda")]
    else:
        img_tensor=None

    seed  =  random.randint(-100000,100000)

    results =  pipe(
                prompt=prompt,
                images=img_tensor,
                height=1024,
                width=1024,
                num_inference_steps=20, 
                generator=torch.Generator(device="cuda").manual_seed(seed)
            )
    return results.images[0]

Michael-F-Ellis avatar Apr 05 '24 21:04 Michael-F-Ellis

See https://github.com/huggingface/diffusers/issues/7598#issuecomment-2042897916, which contains a minimal working app.py for img2img using StableCascadeCombinedPipeline.

The issue turned out to be that pipe.to('cuda') does not move the prior image encoder to the GPU. An extra line is needed to move it manually.
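For anyone who doesn't want to click through, here is a rough sketch of what "an extra line" can look like. The helper `move_submodule` is hypothetical (it is not part of diffusers), and the attribute path `prior_pipe.image_encoder` is an assumption based on the `self.prior_pipe` and `encode_image` frames in the traceback above; check it against your installed version and the linked comment.

```python
from functools import reduce


def move_submodule(pipe, attr_path, device):
    """Resolve a dotted attribute path on `pipe` and move that component to `device`.

    Hypothetical helper for illustration; not part of the diffusers API.
    """
    # Walk the path one attribute at a time, e.g. pipe.prior_pipe.image_encoder
    component = reduce(getattr, attr_path.split("."), pipe)
    # torch.nn.Module.to() moves the module's parameters in place
    component.to(device)
    return component


# Usage sketch (not executed here; needs a loaded pipeline and a GPU):
#   pipe = StableCascadeCombinedPipeline.from_pretrained(repo, torch_dtype=torch.bfloat16)
#   pipe.to("cuda")
#   # Assumed attribute path; verify it for your diffusers version.
#   move_submodule(pipe, "prior_pipe.image_encoder", "cuda")
```

Without the helper, this is just a direct `.to("cuda")` call on the image encoder after `pipe.to("cuda")`.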

Michael-F-Ellis avatar Apr 08 '24 14:04 Michael-F-Ellis

cc @kashif here. Does it make sense to make an img2img pipeline for Stable Cascade? From what I understand, the image argument in Stable Cascade has a similar role to the prompt, so it does not work the same way as an img2img pipeline.

yiyixuxu avatar Apr 08 '24 18:04 yiyixuxu

> cc @kashif here. Does it make sense to make an img2img pipeline for Stable Cascade? From what I understand, the image argument in Stable Cascade has a similar role to the prompt, so it does not work the same way as an img2img pipeline.

I'd like to understand that in more depth. Empirically, passing an image and a prompt is doing exactly what I expect. I get a result that's clearly based on the input image and influenced by the prompt. Here are three images.

  1. Stable Cascade's output when prompted with "Barad Dur".
  2. A photo I took of the Sir Walter Scott monument in Edinburgh.
  3. The output of prompting with "Barad Dur" and supplying my photo as an image input.

To me, the third image, while not very exciting, is clearly derived from the photo but with the monument enlarged and restyled in a way that's consistent with Stable Cascade's concept of Barad Dur.

Prompt only: [image: BaradDurPromptOnly]

Photo: [image: ScottMonument]

Prompt + Photo: [image: ScottImageBaradDurPrompt]

Michael-F-Ellis avatar Apr 09 '24 02:04 Michael-F-Ellis

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar May 04 '24 15:05 github-actions[bot]

Closing this issue because of inactivity. Feel free to reopen.

sayakpaul avatar Jun 29 '24 13:06 sayakpaul