
A failure case occurred when MIGC generated "2 cats and 3 dogs".

Open • saikouMonika opened this issue 1 year ago • 3 comments

When I used MIGC to generate "2 cats and 3 dogs", I found that the first dog below would still look like a "cat". Is there any way to improve this result?

I am using realisticVisionV51_v51VAE.safetensors, here are my parameters:

prompt_final = [['4k, best quality, masterpiece, ultra high res, ultra detailed,a cat,a cat,a dog,a dog,a dog,grass', 'a cat', 'a cat', 'a dog', 'a dog', 'a dog', 'grass']]
bboxes = [[[0.078125, 0.09375, 0.390625, 0.359375], [0.515625, 0.09375, 0.859375, 0.359375], [0.078125, 0.515625, 0.34375, 0.90625], [0.421875, 0.515625, 0.671875, 0.921875], [0.71875, 0.484375, 0.953125, 0.921875], [0.015625, 0.015625, 0.984375, 0.96875]]]
negative_prompt = 'worst quality, low quality, watermark, text, blurry'
seed = 12573842233801288171
seed_everything(seed)
image = pipe(prompt_final, bboxes, num_inference_steps=50, guidance_scale=8, MIGCsteps=25, NaiveFuserSteps=25, aug_phase_with_and=False, negative_prompt=negative_prompt).images[0]
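Before looking at the model itself, it can help to rule out a layout mistake in the inputs. The sketch below is not part of MIGC; `check_migc_inputs` is a hypothetical helper. It assumes, based only on the parameters in this issue, that each entry of `prompt_final` is a global prompt followed by one phrase per box, and that each box is `[x_min, y_min, x_max, y_max]` normalized to the range 0 to 1.

```python
def check_migc_inputs(prompt_final, bboxes):
    """Return a list of human-readable problems (empty if none were found).

    Hypothetical helper, not part of MIGC. Assumed layout:
      prompt_final[i] = [global_prompt, phrase_1, ..., phrase_n]
      bboxes[i]       = [box_1, ..., box_n], box = [x_min, y_min, x_max, y_max]
    with all coordinates normalized to [0, 1].
    """
    problems = []
    if len(prompt_final) != len(bboxes):
        problems.append(
            f"batch size mismatch: {len(prompt_final)} prompt lists, "
            f"{len(bboxes)} box lists"
        )
    for i, (phrases, boxes) in enumerate(zip(prompt_final, bboxes)):
        # One global prompt plus one phrase per box.
        if len(phrases) != len(boxes) + 1:
            problems.append(
                f"sample {i}: {len(phrases) - 1} instance phrases "
                f"but {len(boxes)} boxes"
            )
        for j, box in enumerate(boxes):
            if len(box) != 4:
                problems.append(f"sample {i}, box {j}: expected 4 values")
                continue
            x0, y0, x1, y1 = box
            if not (0 <= x0 < x1 <= 1 and 0 <= y0 < y1 <= 1):
                problems.append(f"sample {i}, box {j}: invalid coordinates {box}")
    return problems


# The inputs from this issue.
prompt_final = [['4k, best quality, masterpiece, ultra high res, ultra detailed,'
                 'a cat,a cat,a dog,a dog,a dog,grass',
                 'a cat', 'a cat', 'a dog', 'a dog', 'a dog', 'grass']]
bboxes = [[[0.078125, 0.09375, 0.390625, 0.359375],
           [0.515625, 0.09375, 0.859375, 0.359375],
           [0.078125, 0.515625, 0.34375, 0.90625],
           [0.421875, 0.515625, 0.671875, 0.921875],
           [0.71875, 0.484375, 0.953125, 0.921875],
           [0.015625, 0.015625, 0.984375, 0.96875]]]

print(check_migc_inputs(prompt_final, bboxes))  # prints [] for these inputs
```

Under those assumptions the inputs here are consistent (six phrases, six boxes), so the layout itself does not explain the cat-like dog.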

And here are the images generated by MIGC:

[image: be2a7472c1fa3ffcb38012ef9edc00c]

[image: 89818c8daca02bf7ea39fefe4ce2c3b]

saikouMonika • Mar 16 '24 09:03

"cat" and "dog" are two very similar tokens, which can easily lead to attribute leakage during cross-attention. You can increase NaiveFuserSteps to 50 (i.e., consistent with num_inference_steps=50) to avoid attribute leakage in the last 25 steps of sampling.

image = pipe(prompt_final, bboxes, num_inference_steps=50, guidance_scale=8, MIGCsteps=25, NaiveFuserSteps=50, aug_phase_with_and=False, negative_prompt=negative_prompt).images[0]

Here are the results: [images: anno_outputv2, outputv2]
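To make the suggestion above harder to get wrong, the step-related arguments can be built in one place so that `NaiveFuserSteps` always follows `num_inference_steps`. This is only a sketch: `migc_step_kwargs` is a hypothetical helper, not part of MIGC, and the keyword names are taken from the `pipe(...)` call in this thread.

```python
def migc_step_kwargs(num_inference_steps=50, migc_steps=25):
    """Build the step-related keyword arguments for the pipe(...) call.

    Hypothetical helper, not part of MIGC. It ties NaiveFuserSteps to
    num_inference_steps, as suggested in this thread, so the two values
    cannot drift apart when the number of sampling steps is changed.
    """
    if num_inference_steps <= 0:
        raise ValueError("num_inference_steps must be positive")
    return {
        "num_inference_steps": num_inference_steps,
        "MIGCsteps": migc_steps,
        "NaiveFuserSteps": num_inference_steps,
    }


step_kwargs = migc_step_kwargs(num_inference_steps=50, migc_steps=25)
print(step_kwargs)

# Intended use with the call from this thread (requires a loaded MIGC pipeline):
# image = pipe(prompt_final, bboxes, guidance_scale=8, aug_phase_with_and=False,
#              negative_prompt=negative_prompt, **step_kwargs).images[0]
```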

limuloo • Mar 16 '24 09:03

I am facing the same problem with SD1.4:

prompt_final = [['masterpiece, best quality, gray colored cat, white colored fox', 'gray colored cat', 'white colored fox']]
bboxes = [[[0.5625, 0.101875, 0.984375, 0.5275], [0.171875, 0.109375, 0.46875, 0.515625]]]

[image]

yuntaodu • May 06 '24 15:05

@yuntaodu Thank you for your interest in our work. Have you set NaiveFuserSteps equal to num_inference_steps? That avoids attribute leakage to the greatest extent.

limuloo • May 07 '24 13:05