
Error when using Stable Cascade with a long prompt

Open saeedkhanehgir opened this issue 1 year ago • 4 comments

Hi,

When I use the Stable Cascade model with a long prompt, I get the error below:

Token indices sequence length is longer than the specified maximum sequence length for this model (165 > 77). Running this sequence through the model will result in indexing errors

I tried using the compel library to fix this problem, but it doesn't work.

Thanks

saeedkhanehgir avatar Apr 14 '24 13:04 saeedkhanehgir

@saeedkhanehgir Can you share a code example that produces this error, as well as the full traceback? Currently the maximum supported prompt length in SD Cascade is 77 tokens, but the prompt should be getting truncated with a warning.

DN6 avatar Apr 15 '24 02:04 DN6

@DN6 Thanks for your answer.

Here is my inference code:

import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "Portrait of an Asian woman, facing the audience, looking at the viewers, long black hair, facing the camera, wearing a t-shirt with the inscription 'SmiLe editing', denim jacket and short curvy fat body, standing at the edge of the river, with waterfalls and mountains in the forest as background, bright blue cloudy sky, close-up, realistic, 32k, HDR"
negative_prompt = ""

# Stable Cascade is a two-stage pipeline: the prior produces image embeddings,
# which the decoder then turns into the final image.
prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", variant="bf16", torch_dtype=torch.bfloat16).to('cuda')
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", variant="bf16", torch_dtype=torch.float16).to('cuda')

prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=4.0,
    num_images_per_prompt=1,
    num_inference_steps=20,
)

decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings.to(torch.float16),  # prior ran in bf16; cast to match the decoder's dtype
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=0.0,
    output_type="pil",
    num_inference_steps=10,
).images[0]
decoder_output.save("cascade.png")

and this is the message:

Token indices sequence length is longer than the specified maximum sequence length for this model (79 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: [', hdr']
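For context, the truncation the warning describes can be sketched roughly like this (a hypothetical standalone helper for illustration, not the actual diffusers/CLIP implementation):

```python
def truncate_token_ids(token_ids, max_length=77):
    """Keep at most max_length token ids, returning the kept and dropped parts.

    This mimics what CLIP-style tokenizers do with an over-long prompt:
    everything past position max_length is cut off.
    """
    kept = token_ids[:max_length]
    dropped = token_ids[max_length:]
    return kept, dropped

# With 79 tokens and a 77-token limit, the last 2 tokens are dropped,
# matching the "79 > 77" warning above.
ids = list(range(79))
kept, dropped = truncate_token_ids(ids)
print(len(kept), dropped)  # 77 [77, 78]
```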

saeedkhanehgir avatar Apr 15 '24 06:04 saeedkhanehgir

that's actually not an error, it's a warning. and the "part of your input was truncated" message indicates it works as expected.

the message still shows up with Compel, but not the part about truncating the prompt.

the way the long prompt handling is implemented isn't great, but there are hardly any other options. it lobotomises the positional embed, and it's especially an issue with models with pooled embeds, where things get hairy.
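for the curious, the usual long-prompt workaround splits the token ids into 77-token windows and runs each window through the text encoder separately, which is why the positional embedding restarts at 0 for every chunk (a rough sketch of the chunking step only, not Compel's actual code):

```python
def chunk_token_ids(token_ids, window=77):
    # split into consecutive 77-token windows; each window is encoded
    # independently, so positions 0..76 repeat for every chunk, which
    # is the "lobotomised" positional embed problem
    return [token_ids[i:i + window] for i in range(0, len(token_ids), window)]

# a 165-token prompt (as in the original report) becomes three chunks
chunks = chunk_token_ids(list(range(165)))
print([len(c) for c in chunks])  # [77, 77, 11]
```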

bghira avatar Apr 15 '24 14:04 bghira

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar May 14 '24 15:05 github-actions[bot]

@saeedkhanehgir Closing this issue for now since the pipeline isn't throwing an error. For help with dealing with long prompts, it might be better to open a thread in the Discussions section.

DN6 avatar May 21 '24 11:05 DN6

Hi @saeedkhanehgir, can you share the source code using compel for Stable Cascade? Thank you

duonglegiang avatar Aug 08 '24 04:08 duonglegiang