
adds the pipeline for pixart alpha controlnet

Open raulc0399 opened this issue 1 year ago • 24 comments

this PR adds the controlnet pipeline for the pixart alpha diffusion model

the following example uses a HED edge map to control the generation.

import torch
import torchvision.transforms as T
import torchvision.transforms.functional as TF

from diffusers.models import PixArtControlNetAdapterModel
from diffusers.pipelines import PixArtAlphaControlnetPipeline, get_closest_hw
import PIL.Image as Image

from controlnet_aux import HEDdetector

input_image_path = "asset/images/controlnet/car.jpg"
given_image = Image.open(input_image_path)

path_to_controlnet = "raulc0399/pixart-alpha-hed-controlnet"
prompt = "modern car, city in background, clear sky, sunny day"

weight_dtype = torch.float16
image_size = 1024

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

controlnet = PixArtControlNetAdapterModel.from_pretrained(
    path_to_controlnet,
    torch_dtype=weight_dtype,
    use_safetensors=True,
).to(device)

pipe = PixArtAlphaControlnetPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS",
    controlnet=controlnet,
    torch_dtype=weight_dtype,
    use_safetensors=True,
).to(device)

# preprocess image, generate HED edge
hed = HEDdetector.from_pretrained("lllyasviel/Annotators")

width, height = get_closest_hw(given_image.size[0], given_image.size[1], image_size)

condition_transform = T.Compose([
    T.Lambda(lambda img: img.convert('RGB')),
    T.Resize(int(min(height, width))),
    T.CenterCrop([int(height), int(width)]),
    # no T.ToTensor() here: the controlnet_aux detector expects a PIL image or numpy array
])

# apply the transform to the loaded input image
control_image = condition_transform(given_image)
hed_edge = hed(control_image, detect_resolution=image_size, image_resolution=image_size)

with torch.no_grad():
    out = pipe(
        prompt=prompt,
        image=hed_edge,
        num_inference_steps=14,
        guidance_scale=4.5,
        height=image_size,
        width=image_size,
    )

    out.images[0].save("./output.jpg")
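
The example calls get_closest_hw to pick the size that the control image is resized and cropped to. A minimal sketch of that idea follows, assuming it snaps the input to the nearest aspect-ratio bucket. The function name closest_hw and the bucket table are hypothetical and illustrative only; this is not the helper shipped with the PR.

```python
# Hypothetical bucket table: aspect ratio (height / width) -> (height, width).
# The values are illustrative only, not the table used by the pipeline.
BUCKETS_1024 = {
    0.5: (704, 1408),
    0.75: (864, 1152),
    1.0: (1024, 1024),
    1.33: (1152, 864),
    2.0: (1408, 704),
}


def closest_hw(width, height, buckets=BUCKETS_1024):
    """Return (width, height) of the bucket whose aspect ratio is nearest."""
    ratio = height / width
    # pick the bucket key with the smallest distance to the input ratio
    nearest = min(buckets, key=lambda r: abs(r - ratio))
    bucket_height, bucket_width = buckets[nearest]
    return bucket_width, bucket_height


if __name__ == "__main__":
    # a 16:9 landscape input falls into the widest bucket of this table
    print(closest_hw(1920, 1080))
```

The return order (width, height) mirrors how the result is unpacked in the example above.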

here are some images: the original image, the control image and the generated image

Who can review?

@yiyixuxu @lawrence-cj

raulc0399 avatar Jul 12 '24 20:07 raulc0399

is this the checkpoint? https://huggingface.co/PixArt-alpha/PixArt-ControlNet I don't see any downloads, not sure if it's tracking correctly

is this pixart alpha controlnet used a lot in the community? if not, maybe we can make a community pipeline to start with?

also cc @asomoza

yiyixuxu avatar Jul 17 '24 01:07 yiyixuxu

@yiyixuxu that is the pixart controlnet model for HED conditioning as uploaded by the authors of pixart. for this pipeline i have converted the controlnet layers to safetensors, uploaded here https://huggingface.co/raulc0399/pixart-alpha-hed-controlnet

they can be used with this pipeline

raulc0399 avatar Jul 17 '24 06:07 raulc0399

why does it have its own implementation of the HED detector? It doesn't work with the regular one that everyone uses? Have you tested it with the one from the controlnet_aux library?

asomoza avatar Jul 17 '24 10:07 asomoza

@asomoza

> why does it have its own implementation of the HED detector? It doesn't work with the regular one that everyone uses? Have you tested it with the one from the controlnet_aux library?

the sample above just used the HED class that the authors had in their repository, and that was used to train their HED controlnet.

but i just checked, and it seems to be the same as, or better said adapted from, the one in controlnet_aux

raulc0399 avatar Jul 17 '24 13:07 raulc0399

thanks, I'll give it a test later. I was asking because if it was trained with a custom HED detector which produces different results than the default one it will be really hard for people to use it.

It would be nice if you could post some results (images) in the PR description.

asomoza avatar Jul 17 '24 14:07 asomoza

> thanks, I'll give it a test later. I was asking because if it was trained with a custom HED detector which produces different results than the default one it will be really hard for people to use it.

using the HED detector from controlnet_aux it loses some quality. i will run some more tests with that one.

but i also have a training script that i am testing before creating a PR: https://github.com/raulc0399/PixArt-alpha/blob/master_train_controlnet_diffusers/controlnet/train_pixart_controlnet_hf.py

that can be used to train further models.

> It would be nice if you could post some results (images) in the PR description.

will do.

raulc0399 avatar Jul 17 '24 16:07 raulc0399

i have to correct my previous comment. i was using the default params for HED, which resized the image to 512. if i use 1024 instead, it works as it should.
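
One way to avoid falling back on the 512 defaults is to pin both resolution arguments in a small wrapper. This is a minimal sketch, not code from the PR: detect_edges is a hypothetical helper, and detector is assumed to be any callable that accepts the detect_resolution and image_resolution keyword arguments used in the example in the PR description.

```python
def detect_edges(detector, image, size=1024):
    """Run an edge detector with both resolutions pinned to the generation size.

    `detector` is any callable accepting `detect_resolution` and
    `image_resolution` keyword arguments (an assumption of this sketch).
    """
    return detector(image, detect_resolution=size, image_resolution=size)
```

With the detector from the example this would be called as detect_edges(hed, control_image, size=image_size).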

raulc0399 avatar Jul 17 '24 16:07 raulc0399

Thanks, the results look nice. Since we only have one controlnet, maybe do what @yiyixuxu suggested: let's start with a community pipeline first and then, as it gets traction and we have more controlnets, move it to core.

asomoza avatar Jul 18 '24 04:07 asomoza

@asomoza

> Thanks, the results look nice. Since we only have one controlnet, maybe do what @yiyixuxu suggested: let's start with a community pipeline first and then, as it gets traction and we have more controlnets, move it to core.

ok, i will move it to the examples folder and put there the training script as well. i have done some initial tests on the "fusing/fill50k" dataset to validate it works

raulc0399 avatar Jul 18 '24 06:07 raulc0399

@yiyixuxu @asomoza i have moved everything to the examples folder and also added the training script, together with sh files for starting the training and for running the pipeline

raulc0399 avatar Jul 22 '24 17:07 raulc0399

@yiyixuxu the last commit moves the pipeline and the example on how to run it to examples/community

raulc0399 avatar Jul 23 '24 09:07 raulc0399

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Sep 14 '24 15:09 github-actions[bot]

@sayakpaul can you take a look to see if we can merge this now?

yiyixuxu avatar Sep 17 '24 20:09 yiyixuxu

@raulc0399 can you run make style?

yiyixuxu avatar Sep 17 '24 20:09 yiyixuxu


@raulc0399 sorry about the delay here. But I think we need to have the ControlNet model and pipeline implementations under example/community/research_projects as suggested in https://github.com/huggingface/diffusers/pull/8857#discussion_r1687219351

sayakpaul avatar Sep 18 '24 03:09 sayakpaul

> @raulc0399 sorry about the delay here. But I think we need to have the ControlNet model and pipeline implementations under example/community/research_projects as suggested in #8857 (comment)

@sayakpaul currently the pipeline is in examples/community/pipeline_pixart_alpha_controlnet.py with the controlnet blocks and training script in examples/community/pixart - i thought this was requested in the link you quote.

so all is under examples/community

is it ok like this? or should i move it under research_projects?

raulc0399 avatar Sep 18 '24 18:09 raulc0399

I think putting everything under research_projects could make more sense here. We could name the directory "pixart_controlnet". This way the ControlNet model, pipeline, and the training script could live under one directory.

sayakpaul avatar Sep 19 '24 02:09 sayakpaul

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Oct 13 '24 15:10 github-actions[bot]

gentle ping @raulc0399 - are we interested in moving this to the research folder?

yiyixuxu avatar Oct 15 '24 03:10 yiyixuxu

@yiyixuxu yes will do, sorry for the delay.

raulc0399 avatar Oct 15 '24 14:10 raulc0399

@yiyixuxu i have moved the pipeline and training script under research_projects/pixart

raulc0399 avatar Oct 16 '24 17:10 raulc0399

@sayakpaul does this look good to you now?

yiyixuxu avatar Oct 17 '24 22:10 yiyixuxu

May I request permission to push commits to raulc0399:main_pixart_alpha_controlnet, so that I can help run make style and make quality? @raulc0399

lawrence-cj avatar Oct 25 '24 09:10 lawrence-cj

> May I request permission to push commits to raulc0399:main_pixart_alpha_controlnet, so that I can help run make style and make quality? @raulc0399

@lawrence-cj sure

raulc0399 avatar Oct 27 '24 07:10 raulc0399

ERROR: Permission to raulc0399/diffusers.git denied to lawrence-cj. Could not read from remote repository. Please make sure you have the correct access rights and the repository exists.

It seems I still cannot push commits to your branch.

lawrence-cj avatar Oct 27 '24 08:10 lawrence-cj

@lawrence-cj i have just invited you as a collaborator

raulc0399 avatar Oct 27 '24 10:10 raulc0399

Cool, @raulc0399. I have already run make style && make quality. Gentle ping @yiyixuxu.

lawrence-cj avatar Oct 28 '24 06:10 lawrence-cj

thank you @lawrence-cj @raulc0399

yiyixuxu avatar Oct 28 '24 18:10 yiyixuxu

Thank you so much. Respect. @raulc0399 @sayakpaul @yiyixuxu

lawrence-cj avatar Oct 28 '24 18:10 lawrence-cj