Official AI2D scores for GPT and Claude 3.5 are very high

Open Violettttee opened this issue 1 year ago • 3 comments

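Hello~ I'd like to ask whether you have any suggestions or thoughts on the very high AI2D scores reported by OpenAI and for Claude 3.5. I changed the evaluation setup and the prompt (adding CoT) and evaluated GPT several times, but I could not reproduce the very high score of 0.942. (The best score, with CoT added, was only 0.83.) What do you think explains this gap? (I see that none of your own AI2D evaluation scores are above 0.9 either, so I'm curious how Claude and GPT were evaluated to reach nearly perfect scores.)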

Violettttee avatar Nov 05 '24 01:11 Violettttee

Hi @Violettttee, you can try the AI2D_TEST_NO_MASK dataset we provide, which generally yields better performance than AI2D_TEST because of its different setting. However, we still cannot reproduce the numbers reported by OpenAI or Anthropic.

kennymckormick avatar Nov 05 '24 12:11 kennymckormick

Hi @kennymckormick, could you please provide the link to download the AI2D_TEST_NO_MASK dataset? I cannot find it. Thanks a lot!

Super-Shen avatar Dec 18 '24 01:12 Super-Shen

The download link is https://opencompass.openxlab.space/utils/VLMEval/AI2D_TEST_NO_MASK.tsv. It is defined in image_mcq.py.

kennymckormick avatar Dec 18 '24 13:12 kennymckormick
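
For anyone who wants to check the file before running an evaluation, here is a minimal sketch (not part of VLMEvalKit) that downloads the TSV from the link above and reports its column names and row count. It uses only the Python standard library. The function names `download`, `summarize_tsv`, and `fetch_and_summarize` and the local filename are hypothetical, chosen for illustration, and the sketch assumes only that the file is tab-separated with a header row.

```python
import csv
import sys
import urllib.request

# URL quoted in the reply above.
URL = "https://opencompass.openxlab.space/utils/VLMEval/AI2D_TEST_NO_MASK.tsv"


def download(url: str, dest: str) -> str:
    """Fetch the file at `url` to the local path `dest` (needs network access)."""
    urllib.request.urlretrieve(url, dest)
    return dest


def summarize_tsv(path: str) -> dict:
    """Return the header columns and the number of data rows of a TSV file."""
    # Assumption: rows may contain very long fields (e.g. encoded images),
    # so raise the csv field size limit to a value that is safe on all platforms.
    csv.field_size_limit(min(sys.maxsize, 2**31 - 1))
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)           # first line holds the column names
        rows = sum(1 for _ in reader)   # count the remaining data rows
    return {"columns": header, "rows": rows}


def fetch_and_summarize(dest: str = "AI2D_TEST_NO_MASK.tsv") -> dict:
    """Hypothetical convenience wrapper: download the file, then summarize it."""
    return summarize_tsv(download(URL, dest))
```

Calling `fetch_and_summarize()` downloads the file into the current directory and returns a dictionary with the column list and row count, which is a quick way to confirm that the download completed and the file parses.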