
move long-prompt weighting code to utils

Open noskill opened this issue 2 years ago • 42 comments

The long-prompt weighting (LPW) pipeline can't be used with other pipelines, e.g. StableDiffusionKDiffusionPipeline.

This PR moves the long-prompt weighting code to utils so that it can be used with any pipeline:

from diffusers import StableDiffusionKDiffusionPipeline
from diffusers import utils
import types
import torch

pipe = StableDiffusionKDiffusionPipeline.from_pretrained(
    "frankjoshua/icbinpICantBelieveIts_v8",
    torch_dtype=torch.float16,
)
pipe.set_scheduler("sample_dpmpp_2m")
pipe = pipe.to("cuda")
# Replace encode_prompt on this pipe instance with the long-prompt-weighting version
pipe.encode_prompt = types.MethodType(utils.encode_prompt, pipe)
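
With the patched encode_prompt in place, A1111-style weighting syntax should work directly in the pipeline call. A minimal usage sketch (the prompt and step count are illustrative, not from the PR):

# Parenthesized weights below use the LPW syntax handled by the patched encoder.
image = pipe(
    "a (sunlit:1.3) mountain lake, ((highly detailed)), (blurry:0.7)",
    num_inference_steps=25,
).images[0]
image.save("weighted_prompt.png")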

noskill avatar Sep 19 '23 08:09 noskill

Also, there is code by @takuma104: https://gist.github.com/takuma104/43552b8ec70b63323c57dc9c6fcb9b90. Perhaps there should be a community "utils" or "contrib" module; some code doesn't need its own pipeline.

noskill avatar Sep 19 '23 09:09 noskill

This removes the dependency on compel.

adhikjoshi avatar Sep 19 '23 09:09 adhikjoshi

I am okay with that, but I'm not super sure about the maintenance part. I think it should be a bit community-driven to start with. WDYT?

sayakpaul avatar Sep 19 '23 10:09 sayakpaul

It seems like a very generic name for something that might only apply to Stable Diffusion pipelines. Could this work for, say, DeepFloyd or Kandinsky?

bghira avatar Sep 19 '23 20:09 bghira

@bghira The generated image changes a little depending on the weights, but much less than I see with Stable Diffusion models. I'll test it with DeepFloyd. Here is Kandinsky 2.1:

from diffusers import DiffusionPipeline
from diffusers import utils
import types
import torch

generator = torch.Generator("cuda").manual_seed(3842793274)

# Prior: text -> image embeddings
pipe_prior = DiffusionPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16
)
pipe_prior.to("cuda")
pipe_prior.encode_prompt = types.MethodType(utils.encode_prompt, pipe_prior)

# Decoder: image embeddings -> image
t2i_pipe = DiffusionPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
)
t2i_pipe.to("cuda")
t2i_pipe.encode_prompt = types.MethodType(utils.encode_prompt, t2i_pipe)

prompt = "A alien cheeseburger creature eating itself, claymation, (cinematic:0.7), (moody lighting:0.7)"
negative_prompt = "low quality, bad quality"

image_embeds, negative_image_embeds = pipe_prior(
    prompt, negative_prompt, guidance_scale=6.0, generator=generator
).to_tuple()
image = t2i_pipe(
    prompt,
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    height=768,
    width=768,
    generator=generator,
).images[0]
image.save("cheeseburger_monster0.png")

noskill avatar Sep 20 '23 18:09 noskill

Different versions of Kandinsky have different text encoders:

  • 2.2 uses OpenCLIP bigG, the same as the SDXL refiner and one of the SDXL base text encoders,
  • 2.1 used XLM-Roberta-Large-Vit-L-14,
  • and 2.0 used two small text encoders.

I can imagine each behaving differently or breaking with this.
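
A quick, hedged sanity check before assuming the LPW parser transfers: print which text encoder and tokenizer a loaded pipeline actually carries. This assumes the pipe_prior/t2i_pipe objects from the snippet above; the attribute names follow the standard diffusers pipeline layout:

# Inspect the text stack of each pipeline component.
for p in (pipe_prior, t2i_pipe):
    print(type(p.text_encoder).__name__,
          type(p.tokenizer).__name__,
          p.tokenizer.model_max_length)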

bghira avatar Sep 20 '23 19:09 bghira

Kandinsky 2.2 works with LPW similarly to 2.1; the image changes a little.

from diffusers import DiffusionPipeline
from diffusers import utils
import functools
import types
import torch

generator = torch.Generator("cuda").manual_seed(3842793274)

# Prior: text -> image embeddings
pipe_prior = DiffusionPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16
)
pipe_prior.to("cuda")
pipe_prior.encode_prompt = types.MethodType(
    functools.partial(utils.encode_prompt, max_embeddings_multiples=10), pipe_prior
)

# Decoder: image embeddings -> image (takes no text prompt in 2.2)
t2i_pipe = DiffusionPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
t2i_pipe.to("cuda")
t2i_pipe.encode_prompt = types.MethodType(
    functools.partial(utils.encode_prompt, max_embeddings_multiples=10), t2i_pipe
)

prompt = "A alien cheeseburger creature eating itself, claymation, (cinematic:2.7), (moody lighting:2.7)"
negative_prompt = "low quality, bad quality"

image_embeds, negative_image_embeds = pipe_prior(
    prompt, negative_prompt, guidance_scale=6.0, generator=generator
).to_tuple()
image = t2i_pipe(
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    height=768,
    width=768,
    generator=generator,
).images[0]
image.save("cheeseburger_monster5.png")

noskill avatar Sep 21 '23 10:09 noskill

The SDXL pipeline needs pooled prompt embeddings (pooled_prompt_embeds); does this support that?

yijinsheng avatar Sep 22 '23 01:09 yijinsheng

@yijinsheng No, not currently.
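
For context on why not: SDXL's encode_prompt returns pooled embeddings alongside the per-token ones, so a drop-in replacement would have to satisfy a four-tensor contract like the stub below (a sketch only; the shapes are illustrative assumptions, and this is not the utils implementation):

import torch

def encode_prompt_sdxl_stub(self, prompt, num_images_per_prompt=1,
                            do_classifier_free_guidance=True,
                            negative_prompt=None):
    # SDXL concatenates the per-token states of both text encoders
    # (768 + 1280 = 2048) and also needs the pooled output of the
    # second encoder for its added conditioning.
    batch = num_images_per_prompt
    prompt_embeds = torch.zeros(batch, 77, 2048)
    negative_prompt_embeds = torch.zeros(batch, 77, 2048)
    pooled_prompt_embeds = torch.zeros(batch, 1280)
    negative_pooled_prompt_embeds = torch.zeros(batch, 1280)
    return (prompt_embeds, negative_prompt_embeds,
            pooled_prompt_embeds, negative_pooled_prompt_embeds)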

noskill avatar Sep 25 '23 16:09 noskill

@bghira It is possible to pass embeddings from the encode_prompt function to the DeepFloyd pipeline, but the picture quality got much worse.

So from these tests it looks like only the SD pipelines benefit from this code, so I moved the LPW code to lpw_stable_diffusion.py.

I'll take a look at whether it is possible to extract the common code from the Stable Diffusion and Stable Diffusion XL LPW pipelines.

noskill avatar Sep 25 '23 16:09 noskill

Almost undoubtedly it CAN work on the other models, but it will likely require investigation into how.

bghira avatar Sep 25 '23 16:09 bghira

@yiyixuxu @sayakpaul does anybody have time to take over this PR?

patrickvonplaten avatar Sep 25 '23 17:09 patrickvonplaten

@bghira does LPW support kohya-style LoRA prompts like <lora:yyyyy:1.2>?

sarmientoj24 avatar Sep 28 '23 15:09 sarmientoj24

Prompt weighting is supported via compel: https://huggingface.co/docs/diffusers/using-diffusers/weighted_prompts

sayakpaul avatar Sep 28 '23 15:09 sayakpaul

> Prompt weighting is supported via compel: https://huggingface.co/docs/diffusers/using-diffusers/weighted_prompts

We can nevertheless support LPW more natively in the diffusers core codebase.

patrickvonplaten avatar Sep 29 '23 07:09 patrickvonplaten

@sarmientoj24 It doesn't support LoRA prompts. I would rather add this and other features in a separate PR.

noskill avatar Oct 02 '23 13:10 noskill

> @sarmientoj24 It doesn't support LoRA prompts. I would rather add this and other features in a separate PR.

import re

def process_lora_prompt(prompt):
    # Match kohya-style tags such as <lora:name:1.2>
    lora = re.compile(r'<lora:([^:]+):([\d\.\-]+)>')
    lora_matches = lora.findall(prompt)
    # Strip the <lora:...> tags from the prompt text
    filtered_prompt = lora.sub('', prompt)
    # Re-append each match as a weighted term, e.g. "(name:1.2)"
    filtered_prompt += ' ' + ' '.join(
        f'({name}:{weight})' for name, weight in lora_matches)
    return filtered_prompt

Here is a function to extract LoRA tags from a prompt.
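
A quick usage example for the function above:

print(process_lora_prompt("a castle on a hill <lora:fantasy_style:0.8>, detailed"))
# -> "a castle on a hill , detailed (fantasy_style:0.8)"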

adhikjoshi avatar Oct 02 '23 15:10 adhikjoshi

I don't think that's how you handle LoRA prompting; that merely removes the LoRAs from the prompt.

bghira avatar Oct 02 '23 15:10 bghira

@adhikjoshi By the way, there is already a skip_weighting flag in get_weighted_text_embeddings, so we can reuse the LPW implementation in StableDiffusionPipeline.
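
For reference, the helper in question lives in the community lpw_stable_diffusion pipeline, which can already be loaded via custom_pipeline (a minimal sketch; the checkpoint name is just an example):

import torch
from diffusers import DiffusionPipeline

# Load the community long-prompt-weighting pipeline;
# get_weighted_text_embeddings (with its skip_weighting flag)
# is defined in that module.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="lpw_stable_diffusion",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a (sunlit:1.2) forest, highly detailed").images[0]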

noskill avatar Oct 02 '23 18:10 noskill

@DN6 @yijinsheng do I need to do anything in this PR? Adding LoRA and hypernetwork support as suggested above is a nice feature, but that can be done in follow-up PRs.

noskill avatar Oct 31 '23 08:10 noskill

@DN6 @yiyixuxu can you give this a look?

patrickvonplaten avatar Nov 01 '23 20:11 patrickvonplaten

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Nov 26 '23 15:11 github-actions[bot]

Not stale.

sayakpaul avatar Nov 27 '23 02:11 sayakpaul

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Dec 26 '23 15:12 github-actions[bot]

Should this not be stale?

Scorpinaus avatar Feb 25 '24 23:02 Scorpinaus

@yiyixuxu a gentle bump here.

sayakpaul avatar Feb 26 '24 02:02 sayakpaul

@sayakpaul Maybe we could introduce a PromptEncoder class to pass into the pipeline constructor, or maybe a mixin class?

Using types.MethodType is not the best possible way to change prompt encoding, but I believe this PR is a step in the right direction.
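
A purely hypothetical sketch of that idea (every name here is invented for illustration; nothing like this exists in diffusers yet):

from typing import Optional, Protocol

import torch

class PromptEncoder(Protocol):
    # Swappable prompt-encoding strategy a pipeline could accept
    # in its constructor instead of being monkey-patched.
    def __call__(self, pipe, prompt: str,
                 negative_prompt: Optional[str] = None) -> torch.Tensor:
        ...

class LPWPromptEncoder:
    # Would wrap the long-prompt-weighting logic behind the same interface.
    def __call__(self, pipe, prompt, negative_prompt=None):
        raise NotImplementedError  # delegate to the LPW helper here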

noskill avatar Mar 14 '24 07:03 noskill

Requesting some inputs from @yiyixuxu here.

sayakpaul avatar Mar 14 '24 07:03 sayakpaul

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 07 '24 15:04 github-actions[bot]

Hi @noskill, how did you envision using this in the pipelines? By just replacing the call to encode_prompt with the LPW version of encode_prompt?

DN6 avatar Apr 08 '24 04:04 DN6