Train GLIGEN in diffusers?

Open · Strike1999 opened this issue 1 year ago · 10 comments

Thanks for your great work!

I know how to run GLIGEN inference with diffusers (https://github.com/gligen/diffusers/tree/gligen/examples/gligen). But how can I train GLIGEN with diffusers, similar to the ControlNet training script (https://github.com/huggingface/diffusers/blob/main/examples/controlnet/train_controlnet.py)?

Thanks again.

Strike1999 (Mar 18 '24)

We don't have the bandwidth to work on a training script for GLIGEN, but we can open it up to the community.

sayakpaul (Mar 18 '24)

Hi @sayakpaul, I have tried the code in the linked example, which sets:

boxes = [[0.4, 0.2, 1.0, 0.8], [0.0, 1.0, 0.0, 1.0]]  # Set `[0.0, 1.0, 0.0, 1.0]` for the style

The bounding boxes are supposed to be in [xmin, ymin, xmax, ymax] format, so this value confuses me. I think the correct box for the style may be [0, 0, 1, 1].
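Read literally in [xmin, ymin, xmax, ymax] order, the two candidate style boxes behave very differently. The small helper below (a hypothetical `box_area`, not part of diffusers) shows that [0.0, 1.0, 0.0, 1.0] has zero width and zero height, so it cannot describe a real image region, while [0, 0, 1, 1] covers the whole image. Treating the former as a placeholder value rather than a location is an interpretation, not something confirmed here.

```python
def box_area(box):
    """Area of a normalized box given in [xmin, ymin, xmax, ymax] order.

    Hypothetical helper for illustration. Returns 0.0 for degenerate boxes
    where xmax <= xmin or ymax <= ymin.
    """
    xmin, ymin, xmax, ymax = box
    width = max(0.0, xmax - xmin)
    height = max(0.0, ymax - ymin)
    return width * height


# The style value from the example vs. the full-image box proposed above.
style_placeholder = [0.0, 1.0, 0.0, 1.0]
full_image = [0.0, 0.0, 1.0, 1.0]

print(box_area(style_placeholder))  # 0.0: zero width and zero height
print(box_area(full_image))         # 1.0: covers the whole image
```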

Hzzone (Mar 27 '24)

Cc: @tuanh123789 could you help?

sayakpaul (Mar 27 '24)

OK, I'll check.

tuanh123789 (Mar 27 '24)

@Hzzone In the original GLIGEN repo, the authors use [xmin, ymin, xmax, ymax], i.e. [x0, y0, x1, y1]. When conditioning on a style image, they pass [0.0, 1.0, 0.0, 1.0] as the reference image location. The GLIGEN implementation in Diffusers does the same.

tuanh123789 (Mar 27 '24)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] (Apr 20 '24)

Hi @sayakpaul, I have successfully trained GLIGEN in the same way as ControlNet. Could I make a contribution for this issue?

Hzzone (Apr 30 '24)

Thanks for your interest and for working on this. Out of curiosity, could we see some results you're getting with your trained model?

In any case, feel free to open a PR adding a training script to https://github.com/huggingface/diffusers/tree/main/examples/research_projects/.

sayakpaul (Apr 30 '24)

I trained the model on the COCO dataset for 100k iterations with a batch size of 64, using GroundingDINO and BLIP2 to label instances. The prompt and boxes I used (the commented-out entries are alternative prompts):

import numpy as np

prompt = 'A realistic image of landscape scene depicting a green car parking on the left of a blue truck, with a red air balloon and a bird in the sky'
# Each entry is (phrase, [x, y, width, height]) in pixels on a 512x512 canvas.
gen_boxes = [('a green car', [21, 281, 211, 159]), ('a blue truck', [269, 283, 209, 160]), ('a red air balloon', [66, 8, 145, 135]), ('a bird', [296, 42, 143, 100])]

# prompt = 'A realistic top-down view of a wooden table with two apples on it'
# gen_boxes = [('a wooden table', [20, 148, 472, 216]), ('an apple', [150, 226, 100, 100]), ('an apple', [280, 226, 100, 100])]

# prompt = 'A realistic scene of three skiers standing in a line on the snow near a palm tree'
# gen_boxes = [('a skier', [5, 152, 139, 168]), ('a skier', [278, 192, 121, 158]), ('a skier', [148, 173, 124, 155]), ('a palm tree', [404, 105, 103, 251])]

# prompt = 'An oil painting of a pink dolphin jumping on the left of a steam boat on the sea'
# gen_boxes = [('a steam boat', [232, 225, 257, 149]), ('a jumping pink dolphin', [21, 249, 189, 123])]

# Normalize pixel coordinates to [0, 1], then convert [x, y, w, h]
# into the [xmin, ymin, xmax, ymax] format GLIGEN expects.
boxes = np.array([box for _, box in gen_boxes], dtype=float) / 512
boxes[:, 2] = boxes[:, 0] + boxes[:, 2]  # xmax = xmin + width
boxes[:, 3] = boxes[:, 1] + boxes[:, 3]  # ymax = ymin + height
boxes = boxes.tolist()
gligen_phrases = [phrase for phrase, _ in gen_boxes]
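The conversion above divides by 512, so it assumes a square 512x512 canvas. A minimal pure-Python sketch of the same [x, y, w, h] to normalized [xmin, ymin, xmax, ymax] conversion, with the canvas width and height as parameters, could look like this. The function name `xywh_to_normalized_xyxy` is hypothetical, and clamping coordinates to [0, 1] is an added assumption that keeps boxes spilling past the canvas edge valid.

```python
def xywh_to_normalized_xyxy(boxes, width=512, height=512):
    """Convert pixel [x, y, w, h] boxes to normalized [xmin, ymin, xmax, ymax].

    Hypothetical helper. `width` and `height` are the canvas size the pixel
    boxes were drawn on; 512x512 matches the snippet above. Coordinates are
    clamped to [0, 1] (an assumption, not part of the original snippet).
    """
    def clamp(v):
        return min(1.0, max(0.0, v))

    out = []
    for x, y, w, h in boxes:
        out.append([
            clamp(x / width),         # xmin
            clamp(y / height),        # ymin
            clamp((x + w) / width),   # xmax = x + width
            clamp((y + h) / height),  # ymax = y + height
        ])
    return out


# Usage with the "oil painting" example from the comment above.
gen_boxes = [('a steam boat', [232, 225, 257, 149]), ('a jumping pink dolphin', [21, 249, 189, 123])]
gligen_phrases = [phrase for phrase, _ in gen_boxes]
gligen_boxes = xywh_to_normalized_xyxy([box for _, box in gen_boxes])
print(gligen_phrases)
print(gligen_boxes)
```

Keeping phrases and boxes in one list of pairs, as the original snippet does, guarantees that each phrase stays aligned with its box after conversion.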

Here are the results: [image]

And here are the results for the same prompt from the pretrained GLIGEN model: [image]

I also tried the training data provided by GLIGEN and got similar results after 500k iterations, although that model seems inferior to the one trained on COCO. Unfortunately, I have not evaluated the models quantitatively.

Hzzone (May 01 '24)

Wow, those are very good results. Please feel free to start the contribution.

sayakpaul (May 01 '24)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] (Sep 14 '24)