
[WIP] [LoRA] support omi hidream lora.

Open sayakpaul opened this issue 8 months ago • 5 comments

What does this PR do?

Check https://github.com/huggingface/diffusers/issues/11653.

This PR isn't at all ready, but I'm opening it up to discuss some doubts. Currently, this PR only aims to support the transformer components of the LoRA state dict (other components will be iterated on in this PR itself).

I tried with the following code on top of this PR:

import torch
from transformers import AutoTokenizer, LlamaForCausalLM
from diffusers import HiDreamImagePipeline


text_encoder_4 = LlamaForCausalLM.from_pretrained(
    "terminusresearch/hidream-i1-llama-3.1-8b-instruct",
    subfolder="text_encoder_4",
    output_hidden_states=True,
    output_attentions=True,
    torch_dtype=torch.bfloat16,
).to("cuda", dtype=torch.bfloat16)
tokenizer_4 = AutoTokenizer.from_pretrained(
    "terminusresearch/hidream-i1-llama-3.1-8b-instruct",
    subfolder="tokenizer_4",
)
pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Dev",
    text_encoder_4=text_encoder_4,
    tokenizer_4=tokenizer_4,
    torch_dtype=torch.bfloat16,
).to("cuda")
pipe.load_lora_weights("RhaegarKhan/OMI_LORA")
image = pipe(
    'A cat holding a sign that says "Hi-Dreams.ai".',
    height=1024,
    width=1024,
    guidance_scale=5.0,
    num_inference_steps=50,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
image.save("output.png")

However, it currently leads to this problem, and I am not sure what those params correspond to or how they should be handled in the first place.

Additionally, the LoRA has: https://pastebin.com/diwEwtsS

image

Could you shed some details @ali-afridi26?

sayakpaul avatar Jun 05 '25 02:06 sayakpaul

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Hi @sayakpaul, thank you for this PR. "Judy Hopps" seems to be a pivotal tuning embed for the trigger word "Judy Hopps" itself. For now you may ignore the pivotal embeds, but maybe a future addition could have a helper to extract them and properly add them as an embed with weightings.
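To illustrate what "ignoring the pivotal embeds" could look like before the state dict reaches the loader, here is a minimal sketch. It is not the diffusers implementation: the helper name `drop_ignored_keys` and the `"embeddings."` prefix are assumptions made for illustration, and the real key names in the OMI file may differ.

```python
def drop_ignored_keys(state_dict, ignored_prefixes):
    """Split a state dict into entries to keep and entries to skip.

    `ignored_prefixes` is a hypothetical list of key prefixes (for example,
    the prefix under which pivotal tuning embeds are stored).
    """
    prefixes = tuple(ignored_prefixes)
    kept, dropped = {}, {}
    for key, value in state_dict.items():
        # str.startswith accepts a tuple and matches any of its entries.
        target = dropped if key.startswith(prefixes) else kept
        target[key] = value
    return kept, dropped


# Hypothetical key names; values stand in for tensors.
example = {
    "embeddings.judy_hopps": "embed tensor",
    "transformer.blocks.0.attn.to_q.lora_A.weight": "A tensor",
    "transformer.blocks.0.attn.to_q.lora_B.weight": "B tensor",
}
kept, dropped = drop_ignored_keys(example, ["embeddings."])
```

Returning the dropped entries rather than discarding them keeps them available for a later helper that registers the embed.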

ali-afridi26 avatar Jun 05 '25 18:06 ali-afridi26

@ali-afridi26 thanks. I had guessed that to be the case but what about the others as shown in https://pastebin.com/diwEwtsS?

sayakpaul avatar Jun 06 '25 03:06 sayakpaul

@sayakpaul, thanks for asking. Upon further investigation, the weights seem to be the ones used in E1: https://huggingface.co/HiDream-ai/HiDream-E1-Full/blob/43599f36872e2b02384abd041a489[…]84/transformer/diffusion_pytorch_model.safetensors.index.json. HiDream E1 has a refiner LoRA; maybe it's this one.

ali-afridi26 avatar Jun 10 '25 15:06 ali-afridi26

LoRA weights usually come as lora_A and lora_B pairs (the down and up matrices), but those weights don't follow that pattern. It would be helpful to know how they should be treated during LoRA loading, so any pointers are appreciated.
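A rough sketch of the check described above, for anyone who wants to reproduce it on a list of key names: it groups keys by module and reports which modules have a complete lora_A/lora_B pair and which keys match neither. The `.lora_A.weight` / `.lora_B.weight` suffixes and the example key names are assumptions for illustration; the actual checkpoint may use a different naming scheme.

```python
from collections import defaultdict

# Assumed suffix convention for the down (A) and up (B) matrices.
LORA_SUFFIXES = ("lora_A.weight", "lora_B.weight")


def partition_lora_keys(keys):
    """Return (complete_modules, incomplete_modules, other_keys)."""
    found = defaultdict(set)
    other = []
    for key in keys:
        for suffix in LORA_SUFFIXES:
            if key.endswith("." + suffix):
                # Strip ".<suffix>" to recover the module path.
                module = key[: -len(suffix) - 1]
                found[module].add(suffix)
                break
        else:
            # No LoRA suffix matched: needs separate handling.
            other.append(key)
    complete = sorted(m for m, s in found.items() if len(s) == 2)
    incomplete = sorted(m for m, s in found.items() if len(s) < 2)
    return complete, incomplete, other


# Hypothetical key names.
keys = [
    "transformer.blocks.0.attn.to_q.lora_A.weight",
    "transformer.blocks.0.attn.to_q.lora_B.weight",
    "transformer.blocks.0.attn.to_k.lora_A.weight",
    "transformer.some_extra.weight",
]
complete, incomplete, other = partition_lora_keys(keys)
```

With real weights, the keys could be read from the loaded state dict (for example via `state_dict.keys()`); everything that lands in `other` is what the question above is about.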

HiDream E1 has a refiner LoRA maybe it's this one

Could you point me to the reference that indicates it?

Also, OTOH, diffusers doesn't support E1 yet.

sayakpaul avatar Jun 10 '25 16:06 sayakpaul