
move long-prompt weighting code to utils

Open noskill opened this issue 2 years ago • 42 comments

The long-prompt weighting (LPW) pipeline can't be used with other pipelines, e.g. StableDiffusionKDiffusionPipeline.

This PR moves the long-prompt weighting code to utils so that it can be used with any pipeline:

from diffusers import StableDiffusionKDiffusionPipeline
from diffusers import utils
import types
import torch

pipe = StableDiffusionKDiffusionPipeline.from_pretrained(
    "frankjoshua/icbinpICantBelieveIts_v8",
    torch_dtype=torch.float16,
)
pipe.set_scheduler("sample_dpmpp_2m")
pipe = pipe.to("cuda")
# Replace encode_prompt on this pipe instance with the long-prompt-weighting version
pipe.encode_prompt = types.MethodType(utils.encode_prompt, pipe)
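
With the patched encode_prompt in place, A1111-style weighting syntax should work directly in the pipeline call. A minimal usage sketch (the prompt and step count are illustrative, not from the PR):

# Parenthesized weights below use the LPW syntax handled by the patched encoder.
image = pipe(
    "a (sunlit:1.3) mountain lake, ((highly detailed)), (blurry:0.7)",
    num_inference_steps=25,
).images[0]
image.save("weighted_prompt.png")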

noskill avatar Sep 19 '23 08:09 noskill

Also, there is code by @takuma104: https://gist.github.com/takuma104/43552b8ec70b63323c57dc9c6fcb9b90. Perhaps there should be a community "utils" or "contrib" module; some code doesn't need its own pipeline.

noskill avatar Sep 19 '23 09:09 noskill

This removes the dependency on compel.

adhikjoshi avatar Sep 19 '23 09:09 adhikjoshi

I am okay with that, but I'm not super sure about the maintenance part. I think it should be a bit community-driven to start with. WDYT?

sayakpaul avatar Sep 19 '23 10:09 sayakpaul

It seems like a very generic name for something that might only apply to Stable Diffusion pipelines. Could this work for, say, DeepFloyd or Kandinsky?

bghira avatar Sep 19 '23 20:09 bghira

@bghira The generated image changes a little depending on the weights, but much less than I see with Stable Diffusion models. I'll test it with DeepFloyd. Here is Kandinsky 2.1:

from diffusers import DiffusionPipeline
from diffusers import utils
import types
import torch

generator = torch.Generator("cuda").manual_seed(3842793274)

# Prior: text -> image embeddings
pipe_prior = DiffusionPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16
)
pipe_prior.to("cuda")
pipe_prior.encode_prompt = types.MethodType(utils.encode_prompt, pipe_prior)

# Decoder: image embeddings -> image
t2i_pipe = DiffusionPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
)
t2i_pipe.to("cuda")
t2i_pipe.encode_prompt = types.MethodType(utils.encode_prompt, t2i_pipe)

prompt = "A alien cheeseburger creature eating itself, claymation, (cinematic:0.7), (moody lighting:0.7)"
negative_prompt = "low quality, bad quality"

image_embeds, negative_image_embeds = pipe_prior(
    prompt, negative_prompt, guidance_scale=6.0, generator=generator
).to_tuple()
image = t2i_pipe(
    prompt,
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    height=768,
    width=768,
    generator=generator,
).images[0]
image.save("cheeseburger_monster0.png")

noskill avatar Sep 20 '23 18:09 noskill

Different versions of Kandinsky have different text encoders:

  • 2.2 uses OpenCLIP bigG, the same as the SDXL refiner and one of the SDXL base text encoders,
  • 2.1 used XLM-Roberta-Large-Vit-L-14,
  • and 2.0 used two small text encoders.

I can imagine each behaving differently or breaking with this.
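
A quick, hedged sanity check before assuming the LPW parser transfers: print which text encoder and tokenizer a loaded pipeline actually carries. This assumes the pipe_prior/t2i_pipe objects from the snippet above; the attribute names follow the standard diffusers pipeline layout:

# Inspect the text stack of each pipeline component.
for p in (pipe_prior, t2i_pipe):
    print(type(p.text_encoder).__name__,
          type(p.tokenizer).__name__,
          p.tokenizer.model_max_length)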

bghira avatar Sep 20 '23 19:09 bghira

Kandinsky 2.2 works with LPW similarly to 2.1; the image changes a little.

from diffusers import DiffusionPipeline
from diffusers import utils
import functools
import types
import torch

generator = torch.Generator("cuda").manual_seed(3842793274)

# Prior: text -> image embeddings
pipe_prior = DiffusionPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16
)
pipe_prior.to("cuda")
pipe_prior.encode_prompt = types.MethodType(
    functools.partial(utils.encode_prompt, max_embeddings_multiples=10), pipe_prior
)

# Decoder: image embeddings -> image (takes no text prompt in 2.2)
t2i_pipe = DiffusionPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
t2i_pipe.to("cuda")
t2i_pipe.encode_prompt = types.MethodType(
    functools.partial(utils.encode_prompt, max_embeddings_multiples=10), t2i_pipe
)

prompt = "A alien cheeseburger creature eating itself, claymation, (cinematic:2.7), (moody lighting:2.7)"
negative_prompt = "low quality, bad quality"

image_embeds, negative_image_embeds = pipe_prior(
    prompt, negative_prompt, guidance_scale=6.0, generator=generator
).to_tuple()
image = t2i_pipe(
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    height=768,
    width=768,
    generator=generator,
).images[0]
image.save("cheeseburger_monster5.png")

noskill avatar Sep 21 '23 10:09 noskill

The SDXL pipeline needs pooled prompt embeddings (pooled_prompt_embeds); does this support that?

yijinsheng avatar Sep 22 '23 01:09 yijinsheng

@yijinsheng No, not currently.
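
For context on why not: SDXL's encode_prompt returns pooled embeddings alongside the per-token ones, so a drop-in replacement would have to satisfy a four-tensor contract like the stub below (a sketch only; the shapes are illustrative assumptions, and this is not the utils implementation):

import torch

def encode_prompt_sdxl_stub(self, prompt, num_images_per_prompt=1,
                            do_classifier_free_guidance=True,
                            negative_prompt=None):
    # SDXL concatenates the per-token states of both text encoders
    # (768 + 1280 = 2048) and also needs the pooled output of the
    # second encoder for its added conditioning.
    batch = num_images_per_prompt
    prompt_embeds = torch.zeros(batch, 77, 2048)
    negative_prompt_embeds = torch.zeros(batch, 77, 2048)
    pooled_prompt_embeds = torch.zeros(batch, 1280)
    negative_pooled_prompt_embeds = torch.zeros(batch, 1280)
    return (prompt_embeds, negative_prompt_embeds,
            pooled_prompt_embeds, negative_pooled_prompt_embeds)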

noskill avatar Sep 25 '23 16:09 noskill

@bghira It is possible to pass embeddings from the encode_prompt function to the DeepFloyd pipeline, but the picture quality got much worse.

So from these tests it looks like only the SD pipelines benefit from this code, so I moved the LPW code to lpw_stable_diffusion.py.

I'll take a look at whether it is possible to extract the common code from the Stable Diffusion and Stable Diffusion XL LPW pipelines.

noskill avatar Sep 25 '23 16:09 noskill

Almost undoubtedly it CAN work on the other models, but it will likely require investigation into how.

bghira avatar Sep 25 '23 16:09 bghira

@yiyixuxu @sayakpaul does anybody have time to take over this PR?

patrickvonplaten avatar Sep 25 '23 17:09 patrickvonplaten

@bghira does LPW support kohya-style LoRA prompts like <lora:yyyyy:1.2>?

sarmientoj24 avatar Sep 28 '23 15:09 sarmientoj24

Prompt weighting is supported via compel: https://huggingface.co/docs/diffusers/using-diffusers/weighted_prompts

sayakpaul avatar Sep 28 '23 15:09 sayakpaul

> Prompt weighting is supported via compel: https://huggingface.co/docs/diffusers/using-diffusers/weighted_prompts

We can nevertheless support LPW more natively in the diffusers core codebase.

patrickvonplaten avatar Sep 29 '23 07:09 patrickvonplaten

@sarmientoj24 It doesn't support LoRA prompts. I would rather add this and other features in a separate PR.

noskill avatar Oct 02 '23 13:10 noskill

> @sarmientoj24 It doesn't support LoRA prompts. I would rather add this and other features in a separate PR.

import re

def process_lora_prompt(prompt):
    # Match kohya-style tags such as <lora:name:1.2>
    lora = re.compile(r'<lora:([^:]+):([\d\.\-]+)>')
    lora_matches = lora.findall(prompt)
    # Strip the <lora:...> tags from the prompt text
    filtered_prompt = lora.sub('', prompt)
    # Re-append each match as a weighted term, e.g. "(name:1.2)"
    filtered_prompt += ' ' + ' '.join(
        f'({name}:{weight})' for name, weight in lora_matches)
    return filtered_prompt

Here is a function to extract LoRA tags from a prompt.
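
A quick usage example for the function above:

print(process_lora_prompt("a castle on a hill <lora:fantasy_style:0.8>, detailed"))
# -> "a castle on a hill , detailed (fantasy_style:0.8)"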

adhikjoshi avatar Oct 02 '23 15:10 adhikjoshi

I don't think that's how you handle LoRA prompting; that merely removes the LoRAs from the prompt.

bghira avatar Oct 02 '23 15:10 bghira

@adhikjoshi By the way, there is already a skip_weighting flag in get_weighted_text_embeddings, so we can reuse the LPW implementation in StableDiffusionPipeline.
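
For reference, the helper in question lives in the community lpw_stable_diffusion pipeline, which can already be loaded via custom_pipeline (a minimal sketch; the checkpoint name is just an example):

import torch
from diffusers import DiffusionPipeline

# Load the community long-prompt-weighting pipeline;
# get_weighted_text_embeddings (with its skip_weighting flag)
# is defined in that module.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="lpw_stable_diffusion",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a (sunlit:1.2) forest, highly detailed").images[0]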

noskill avatar Oct 02 '23 18:10 noskill

@DN6 @yijinsheng do I need to do anything in this PR? Adding LoRA and hypernetwork support as suggested above is a nice feature, but that can be done in follow-up PRs.

noskill avatar Oct 31 '23 08:10 noskill

@DN6 @yiyixuxu can you give this a look?

patrickvonplaten avatar Nov 01 '23 20:11 patrickvonplaten

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Nov 26 '23 15:11 github-actions[bot]

Not stale.

sayakpaul avatar Nov 27 '23 02:11 sayakpaul

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Dec 26 '23 15:12 github-actions[bot]

Should this not be stale?

Scorpinaus avatar Feb 25 '24 23:02 Scorpinaus

@yiyixuxu a gentle bump here.

sayakpaul avatar Feb 26 '24 02:02 sayakpaul

@sayakpaul Maybe we could introduce a PromptEncoder class to pass into the pipeline constructor, or maybe a mixin class?

Using types.MethodType is not the best possible way to change prompt encoding, but I believe this PR is a step in the right direction.
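
A purely hypothetical sketch of that idea (every name here is invented for illustration; nothing like this exists in diffusers yet):

from typing import Optional, Protocol

import torch

class PromptEncoder(Protocol):
    # Swappable prompt-encoding strategy a pipeline could accept
    # in its constructor instead of being monkey-patched.
    def __call__(self, pipe, prompt: str,
                 negative_prompt: Optional[str] = None) -> torch.Tensor:
        ...

class LPWPromptEncoder:
    # Would wrap the long-prompt-weighting logic behind the same interface.
    def __call__(self, pipe, prompt, negative_prompt=None):
        raise NotImplementedError  # delegate to the LPW helper here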

noskill avatar Mar 14 '24 07:03 noskill

Requesting some inputs from @yiyixuxu here.

sayakpaul avatar Mar 14 '24 07:03 sayakpaul

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 07 '24 15:04 github-actions[bot]

Hi @noskill, how did you envision using this in the pipelines? By just replacing the call to encode_prompt with the LPW version of encode_prompt?

DN6 avatar Apr 08 '24 04:04 DN6