The code for evaluating open questions in MMMU is completely wrong

Open xwwu2015 opened this issue 1 year ago • 1 comments

It turns every open question in MMMU into a multiple-choice question whose answer is "A", and maps every answer that is not an upper-case letter to "A". As a result, the model gets a higher score if it happens to output "A" by accident.

Mar 11 '24 06:03 xwwu2015

Hi @xwwu2015, the current implementation is a quick one, and the problem you mentioned does occur, leading to less accurate results for open-ended questions. It will not lead to completely wrong results, though, since it affects fewer than 5% of the questions in MMMU. We would like to refactor the implementation in the future; you are also welcome to create a PR if you need this feature in VLMEvalKit urgently.

Mar 12 '24 10:03 kennymckormick
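
For anyone picking this up, here is a minimal sketch of scoring open-ended questions on their own path instead of coercing them to "A". It is not VLMEvalKit code: `normalize`, `is_correct`, and the `question_type` values `"multiple-choice"` and `"open"` are assumptions for illustration, and exact match after light normalization is only one possible rule.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing periods."""
    text = re.sub(r"\s+", " ", text.strip().lower())
    return text.rstrip(".")


def is_correct(prediction: str, answer: str, question_type: str) -> bool:
    """Score one item; all names here are hypothetical, not the library's API."""
    if question_type == "multiple-choice":
        # Option letters are compared as-is.
        return prediction.strip() == answer
    # Open-ended: compare normalized free text, never fall back to "A".
    return normalize(prediction) == normalize(answer)
```

With this split, a stray "A" no longer scores on an open question whose reference answer is, say, "42", while a prediction such as " 42. " still matches.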