Results of T2I-Compbench
Hello, I'm interested in your excellent work. I evaluated your method on T2I-Compbench, and the results are far from those reported in the paper. Could something have gone wrong on my end?
Here are the implementation details:
- First, I obtained the layouts from GPT-4.
- I evaluated on color_val.txt of T2I-Compbench, which contains 300 prompts (using the BLIP-VQA method and `--np_num 8` by default).
- I only got 39.84 on the attribute score, but the result is 93 in your paper.
Could you please offer the layout file you use for T2I-Compbench? Or could you please tell me if something is wrong?
Thank you for your interest. Could you please upload the generated images to cloud storage and share them with us, so that we can test them again?
Thanks very much! Please wait a moment.
Here are the generated images: https://drive.google.com/file/d/1IEi-aQ_WkpP7SQcLeQiuOgxHE68hzh1A/view?usp=drive_link Could you check whether it downloads and opens successfully? The zip file contains the following subfolders:
- `raw_image/`: images evaluated on the color split of T2I-Compbench
- `vis_layout/`: images with bounding-box layouts drawn for visualization
- `annotation_blip/`: evaluation details for the color split of T2I-Compbench
The link indicates that I need to have access permission. I have already requested permission from you using my email. If you are unable to grant access, you can send me a compressed file to my email: [email protected]
I have granted the access. Thanks!
Thank you for your question. Due to the nature of this benchmark, different random seeds have a significant impact on the evaluation results. In the original T2I-Compbench code, ten images are generated per prompt for complex prompts; we believe the same protocol should also be applied to simple prompts. Therefore, for the reported results, we ran 10 repeated experiments for each prompt. You can vary the seed for each prompt to run multiple experiments and average the results.
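The averaging protocol described above can be sketched as follows. This is only an illustration of the aggregation step: the scores would come from the BLIP-VQA evaluation, and the dummy numbers here are placeholders.

```python
from statistics import mean

def aggregate(scores_per_seed):
    """Average the benchmark score over repeated runs with different seeds.

    scores_per_seed[i][j] is the score of prompt j in the run with seed i.
    """
    run_means = [mean(run) for run in scores_per_seed]  # one score per seed
    return mean(run_means)                              # final reported number

# toy example: 3 seeds x 2 prompts (dummy scores, for illustration only)
dummy = [[0.4, 0.6], [0.5, 0.5], [0.3, 0.7]]
print(aggregate(dummy))  # 0.5
```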
Thanks for your answer! Do you mean I need to generate 10 images for each prompt using RealComp with different random seeds?
Yes, that's right.
Hi @Cominclip ,
I was testing the model on the benchmark and found the same discrepancy as @AdventureStory. I tried different seeds based on your last comment, and the results are still the same. The number quoted in the paper for the Color category is 0.774.
My results are:
Used GPT-4 to generate layouts.
| Seed | Score |
|---|---|
| 0 | 0.451 |
| 117 | 0.383 |
| 393 | 0.348 |
| 423 | 0.434 |
| 486 | 0.391 |
| 700 | 0.404 |
| 717 | 0.360 |
The average of the 7 runs is 0.395, which is far from the reported number.
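For reference, the average can be recomputed directly from the table above:

```python
from statistics import mean

# per-seed Color scores from the table above
scores = [0.451, 0.383, 0.348, 0.434, 0.391, 0.404, 0.360]
print(round(mean(scores), 4))  # 0.3959 (quoted above as ~0.395)
```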
Could the authors share their generated images, or the exact procedure they used to obtain these numbers?