A failure case when MIGC generates "2 cats and 3 dogs"
When I used MIGC to generate "2 cats and 3 dogs", I found that the first dog in the bottom row still looks like a cat. Is there any way to improve this result?
I am using realisticVisionV51_v51VAE.safetensors. Here are my parameters:
prompt_final = [['4k, best quality, masterpiece, ultra high res, ultra detailed,a cat,a cat,a dog,a dog,a dog,grass', 'a cat', 'a cat', 'a dog', 'a dog', 'a dog', 'grass']]
bboxes = [[[0.078125, 0.09375, 0.390625, 0.359375],
           [0.515625, 0.09375, 0.859375, 0.359375],
           [0.078125, 0.515625, 0.34375, 0.90625],
           [0.421875, 0.515625, 0.671875, 0.921875],
           [0.71875, 0.484375, 0.953125, 0.921875],
           [0.015625, 0.015625, 0.984375, 0.96875]]]
negative_prompt = 'worst quality, low quality, watermark, text, blurry'
seed = 12573842233801288171
seed_everything(seed)
image = pipe(prompt_final, bboxes, num_inference_steps=50, guidance_scale=8, MIGCsteps=25, NaiveFuserSteps=25, aug_phase_with_and=False, negative_prompt=negative_prompt).images[0]
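In this call, the first string in `prompt_final[0]` is the global prompt and the remaining entries appear to be instance descriptions paired, in order, with the boxes in `bboxes[0]` (six phrases, six boxes). A minimal sketch of a sanity check for that pairing follows, assuming boxes are normalized `[x0, y0, x1, y1]` with coordinates in [0, 1]. `check_migc_inputs` is a hypothetical helper written for illustration, not part of MIGC.

```python
def check_migc_inputs(prompt_final, bboxes):
    """Hypothetical helper: pair each instance prompt with its normalized box.

    Assumes prompt_final[i][0] is the global prompt, prompt_final[i][1:] are the
    instance prompts, and bboxes[i] lists one [x0, y0, x1, y1] box per instance
    in the same order, with coordinates in [0, 1].
    """
    if len(prompt_final) != len(bboxes):
        raise ValueError("prompt_final and bboxes must have the same batch size")
    pairs = []
    for prompts, boxes in zip(prompt_final, bboxes):
        instances = prompts[1:]  # skip the global prompt
        if len(instances) != len(boxes):
            raise ValueError(f"{len(instances)} instance prompts but {len(boxes)} boxes")
        for phrase, box in zip(instances, boxes):
            x0, y0, x1, y1 = box
            if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
                raise ValueError(f"box for {phrase!r} is not a valid normalized box: {box}")
            pairs.append((phrase, box))
    return pairs


prompt_final = [['4k, best quality, masterpiece, ultra high res, ultra detailed,a cat,a cat,a dog,a dog,a dog,grass',
                 'a cat', 'a cat', 'a dog', 'a dog', 'a dog', 'grass']]
bboxes = [[[0.078125, 0.09375, 0.390625, 0.359375], [0.515625, 0.09375, 0.859375, 0.359375],
           [0.078125, 0.515625, 0.34375, 0.90625], [0.421875, 0.515625, 0.671875, 0.921875],
           [0.71875, 0.484375, 0.953125, 0.921875], [0.015625, 0.015625, 0.984375, 0.96875]]]
pairs = check_migc_inputs(prompt_final, bboxes)
for phrase, box in pairs:
    print(phrase, box)
```

Under this reading, the first dog is the third pair, whose box `[0.078125, 0.515625, 0.34375, 0.90625]` sits in the lower-left corner, directly below the first cat's box.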
And here are the images generated by MIGC:
"cat" and "dog" are semantically very similar tokens, so during cross-attention one instance's attributes can easily leak into the other's region (attribute leakage). With NaiveFuserSteps=25, the last 25 of the 50 sampling steps are left unprotected. Increase NaiveFuserSteps to 50 (i.e., equal to num_inference_steps=50) to avoid attribute leakage in those last 25 steps as well:
image = pipe(prompt_final, bboxes, num_inference_steps=50, guidance_scale=8, MIGCsteps=25, NaiveFuserSteps=50, aug_phase_with_and=False, negative_prompt=negative_prompt).images[0]
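The arithmetic behind this suggestion can be sketched as a per-step schedule. This sketch assumes, without checking the MIGC source, that `MIGCsteps` and `NaiveFuserSteps` each count from the first denoising step and act independently, so a step where neither is active sees only ordinary cross-attention. `fusion_schedule` and `uncovered_steps` are hypothetical helpers for illustration only.

```python
def fusion_schedule(num_inference_steps, migc_steps, naive_fuser_steps):
    """Hypothetical: list the mechanisms assumed active at each denoising step.

    Assumes both MIGCsteps and NaiveFuserSteps count from step 0, so a step
    with an empty list is assumed to use only ordinary cross-attention.
    """
    schedule = []
    for step in range(num_inference_steps):
        active = []
        if step < migc_steps:
            active.append("MIGC")
        if step < naive_fuser_steps:
            active.append("NaiveFuser")
        schedule.append(active)
    return schedule


def uncovered_steps(schedule):
    """Return the step indices where no instance-level fusion is assumed to run."""
    return [step for step, active in enumerate(schedule) if not active]


before = fusion_schedule(50, migc_steps=25, naive_fuser_steps=25)
after = fusion_schedule(50, migc_steps=25, naive_fuser_steps=50)
print(len(uncovered_steps(before)))  # 25: steps 25..49 are unprotected
print(len(uncovered_steps(after)))   # 0
```

Under these assumptions, setting `NaiveFuserSteps` equal to `num_inference_steps` is the only way to leave no step uncovered, whatever `MIGCsteps` is.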
Here are the results:
I am facing the same problem with SD1.4, using:
prompt_final = [['masterpiece, best quality, gray colored cat, white colored fox', 'gray colored cat',
'white colored fox']]
bboxes = [[[0.5625, 0.101875, 0.984375, 0.5275],
[0.171875, 0.109375, 0.46875, 0.515625]]]
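These boxes are normalized coordinates. As a hedged illustration of how they map onto the spatial grid where attention operates, the sketch below rasterizes each box onto a 64×64 grid, assuming a 512×512 image (SD1.x latents are 8× smaller). `box_to_mask` and `overlap_cells` are hypothetical helpers and not MIGC's own masking code; the check shows the cat and fox boxes are disjoint, so the problem is not simply overlapping boxes.

```python
def box_to_mask(box, size=64):
    """Hypothetical: rasterize a normalized [x0, y0, x1, y1] box onto a size x size grid.

    A cell (row, col) is set to 1 when its center lies inside the box.
    """
    x0, y0, x1, y1 = box
    mask = []
    for row in range(size):
        cy = (row + 0.5) / size  # vertical center of this row of cells
        mask.append([1 if (x0 <= (col + 0.5) / size < x1 and y0 <= cy < y1) else 0
                     for col in range(size)])
    return mask


def overlap_cells(mask_a, mask_b):
    """Count grid cells covered by both masks."""
    return sum(a & b for row_a, row_b in zip(mask_a, mask_b) for a, b in zip(row_a, row_b))


bboxes = [[[0.5625, 0.101875, 0.984375, 0.5275],
           [0.171875, 0.109375, 0.46875, 0.515625]]]
cat_mask = box_to_mask(bboxes[0][0])  # 'gray colored cat'
fox_mask = box_to_mask(bboxes[0][1])  # 'white colored fox'
print(overlap_cells(cat_mask, fox_mask))  # 0: the boxes are disjoint
```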
@yuntaodu Thank you for your interest in our work. For your case, have you set NaiveFuserSteps equal to num_inference_steps to minimize attribute leakage?