move long-prompt weighting code to utils
The long-prompt weighting pipeline can't be used with other pipelines, e.g. StableDiffusionKDiffusionPipeline.
This PR moves the long-prompt weighting code to utils so that long-prompt weighting can be used with any pipeline:
from diffusers import StableDiffusionKDiffusionPipeline
from diffusers import utils
import types
import torch
pipe = StableDiffusionKDiffusionPipeline.from_pretrained(
"frankjoshua/icbinpICantBelieveIts_v8",
torch_dtype=torch.float16
)
pipe.set_scheduler('sample_dpmpp_2m')
pipe = pipe.to("cuda")
# replace encode_prompt in the pipe with long-prompt weighting
pipe.encode_prompt = types.MethodType(utils.encode_prompt, pipe)
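After patching, weighted and over-length prompts should work directly; a minimal usage sketch (the prompt and step count here are just an illustration):
prompt = "a photo of an astronaut riding a horse on mars, (cinematic lighting:1.2), highly detailed"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("astronaut.png")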
Also, there is code by @takuma104: https://gist.github.com/takuma104/43552b8ec70b63323c57dc9c6fcb9b90. Perhaps there should be a community "utils" or "contrib" module; some code doesn't need its own pipeline.
This removes the dependency on compel.
I am okay with that. But not super sure on the maintenance part. I think it should be a bit community-driven to start with. WDYT?
It seems like a very generic name for something that might only apply to Stable Diffusion pipelines. Could this work for, say, DeepFloyd or Kandinsky?
@bghira the generated image changes a little depending on the weights, but much less than I see with Stable Diffusion models. I'll test it with DeepFloyd
from diffusers import DiffusionPipeline
from diffusers import utils
import types
import torch
generator = torch.Generator("cuda").manual_seed(3842793274)
pipe_prior = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16)
pipe_prior.to("cuda")
pipe_prior.encode_prompt = types.MethodType(utils.encode_prompt, pipe_prior)
t2i_pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
t2i_pipe.to("cuda")
t2i_pipe.encode_prompt = types.MethodType(utils.encode_prompt, t2i_pipe)
prompt = "A alien cheeseburger creature eating itself, claymation, (cinematic:0.7), (moody lighting:0.7)"
negative_prompt = "low quality, bad quality"
image_embeds, negative_image_embeds = pipe_prior(prompt, negative_prompt=negative_prompt, guidance_scale=6.0, generator=generator).to_tuple()
image = t2i_pipe(
    prompt, image_embeds=image_embeds, negative_image_embeds=negative_image_embeds,
    height=768, width=768, generator=generator,
).images[0]
image.save("cheeseburger_monster0.png")
Different versions of Kandinsky have different text encoders:
- 2.2 uses OpenCLIP bigG, the same as the SDXL refiner and one of the SDXL base text encoders,
- 2.1 uses XLM-Roberta-Large-Vit-L-14,
- and 2.0 uses two small text encoders.
I can imagine each behaving differently or breaking with this.
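A quick way to see those differences is to print each pipeline's text encoder and tokenizer limit; a sketch (repo ids are the Kandinsky one used in this thread plus a standard SD 1.5 checkpoint, which is an assumption):
from diffusers import DiffusionPipeline
import torch

for repo in ("runwayml/stable-diffusion-v1-5", "kandinsky-community/kandinsky-2-1"):
    pipe = DiffusionPipeline.from_pretrained(repo, torch_dtype=torch.float16)
    # LPW chunks prompts into windows of tokenizer.model_max_length, so both values matter
    print(repo, type(pipe.text_encoder).__name__, pipe.tokenizer.model_max_length)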
Kandinsky 2.2 works with LPW similarly to 2.1; the image changes a little bit:
from diffusers import DiffusionPipeline
from diffusers import utils
import functools
import types
import torch
generator = torch.Generator("cuda").manual_seed(3842793274)
pipe_prior = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16)
pipe_prior.to("cuda")
pipe_prior.encode_prompt = types.MethodType(functools.partial(utils.encode_prompt, max_embeddings_multiples=10), pipe_prior)
t2i_pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16)
t2i_pipe.to("cuda")
t2i_pipe.encode_prompt = types.MethodType(functools.partial(utils.encode_prompt, max_embeddings_multiples=10), t2i_pipe)
prompt = "A alien cheeseburger creature eating itself, claymation, (cinematic:2.7), (moody lighting:2.7)"
negative_prompt = "low quality, bad quality"
image_embeds, negative_image_embeds = pipe_prior(prompt, negative_prompt=negative_prompt, guidance_scale=6.0, generator=generator).to_tuple()
image = t2i_pipe(image_embeds=image_embeds, negative_image_embeds=negative_image_embeds, height=768, width=768, generator=generator).images[0]
image.save("cheeseburger_monster5.png")
The SDXL pipeline needs pooled prompt embeddings (pooled_prompt_embeds); does this support them?
@yijinsheng no
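For context: SDXL's own encode_prompt returns pooled embeddings in addition to the per-token embeddings, so the utils version can't be swapped in as-is. A sketch of the mismatch (standard SDXL base checkpoint assumed):
from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# SDXL's encode_prompt returns four tensors, including the pooled embeddings
# that the UNet consumes; an LPW replacement would need to match this signature.
(prompt_embeds, negative_prompt_embeds,
 pooled_prompt_embeds, negative_pooled_prompt_embeds) = pipe.encode_prompt(
    prompt="a photo of a cat", do_classifier_free_guidance=True
)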
@bghira it is possible to pass embeddings from the encode_prompt function to the DeepFloyd pipeline, but picture quality gets much worse:
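Roughly the setup (a sketch; it assumes the utils encode_prompt returns the same (prompt_embeds, negative_prompt_embeds) pair as IFPipeline's own encode_prompt):
from diffusers import DiffusionPipeline
from diffusers import utils
import types
import torch

pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", torch_dtype=torch.float16)
pipe.to("cuda")
pipe.encode_prompt = types.MethodType(utils.encode_prompt, pipe)

prompt_embeds, negative_embeds = pipe.encode_prompt("(cinematic:1.2) photo of a red fox")
image = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds).images[0]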
So from these tests it looks like only SD pipelines can benefit from this code, so I moved LPW to lpw_stable_diffusion.py.
I'll take a look at whether it is possible to extract common code from the Stable Diffusion and Stable Diffusion XL LPW pipelines.
Almost undoubtedly it CAN work on the other models, but it will likely require investigation into how to do it.
@yiyixuxu @sayakpaul anybody has time to take over this PR?
@bghira does LPW support Kohya-style LoRA prompts, <lora:yyyyy:1.2>?
Prompt weighting is supported via compel:
https://huggingface.co/docs/diffusers/using-diffusers/weighted_prompts
We can nevertheless support LPW more natively in the diffusers core codebase.
@sarmientoj24 it doesn't support lora prompts. I would rather add this and other features in a separate PR.
import re

def process_lora_prompt(prompt):
    # extract <lora:name:weight> tags from the prompt
    lora = re.compile(r'<lora:([^:]+):([\d\.\-]+)>')
    lora_matches = lora.findall(prompt)
    # strip the tags, then re-append each one as (name:weight) prompt-weighting syntax
    filtered_prompt = lora.sub('', prompt).strip()
    filtered_prompt += ' ' + ' '.join(f'({name}:{weight})' for name, weight in lora_matches)
    return filtered_prompt
Here is a function to extract LoRA tags from the prompt.
I don't think that's how you handle LoRA prompting; that merely removes the LoRAs from the prompt.
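For reference, honoring the tag would mean actually loading the LoRA weights rather than rewriting the prompt; a rough sketch, assuming the named .safetensors files exist in a local directory and a diffusers version with load_lora_weights/fuse_lora:
import re

def apply_lora_tags(pipe, prompt, lora_dir="./loras"):
    # hypothetical helper: strip <lora:name:weight> tags and load each LoRA at its weight
    for name, weight in re.findall(r"<lora:([^:>]+):([\d\.\-]+)>", prompt):
        pipe.load_lora_weights(f"{lora_dir}/{name}.safetensors")
        pipe.fuse_lora(lora_scale=float(weight))  # bake it in at the requested scale
    return re.sub(r"<lora:[^>]+>", "", prompt).strip()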
@adhikjoshi btw, there is already a skip_weighting flag in get_weighted_text_embeddings, so we can reuse the LPW implementation in StableDiffusionPipeline
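i.e. something like the following could back the plain pipeline too (a sketch; it assumes the community lpw_stable_diffusion.py module is importable and uses its argument names):
# with examples/community/lpw_stable_diffusion.py on the import path
from lpw_stable_diffusion import get_weighted_text_embeddings

# skip_weighting=True keeps the 77-token chunking but skips the (token:weight) scaling,
# which reproduces plain encoding, so StableDiffusionPipeline could reuse this code path
prompt_embeds, negative_embeds = get_weighted_text_embeddings(
    pipe, prompt="a photo of a cat", uncond_prompt="low quality", skip_weighting=True
)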
@DN6 @yijinsheng do I need to do anything else in this PR? Adding LoRA and hypernetwork support as suggested above is a nice feature, but this can be done in follow-up PRs.
@DN6 @yiyixuxu can you give this a look?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Not stale.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Should not be stale?
@yiyixuxu a gentle bump here.
@sayakpaul maybe we could introduce a PromptEncoder class to pass into the pipeline constructor, or maybe a mixin class?
Using types.MethodType is not the best possible way to change prompt encoding, but I believe this PR is a step in the right direction.
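To make that concrete, a rough sketch of what a pluggable encoder might look like (all names here are hypothetical; nothing like this exists in diffusers yet):
from typing import Optional, Protocol, Tuple
import torch

class PromptEncoder(Protocol):
    # hypothetical interface: pipelines would delegate to this instead of their built-in encode_prompt
    def __call__(self, pipe, prompt: str, negative_prompt: Optional[str] = None) -> Tuple[torch.Tensor, torch.Tensor]:
        ...

# a pipeline constructor could then accept prompt_encoder: Optional[PromptEncoder] = None
# and fall back to its default encode_prompt when none is given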
Requesting some inputs from @yiyixuxu here.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi @noskill, how did you envision using this in the pipelines? By just replacing the call to encode_prompt with the LPW version of encode_prompt?