VLMEvalKit
The code for evaluating open questions in MMMU is completely wrong
It turns every open question in MMMU into an "A" choice and sets every answer that is not an uppercase letter to "A", which leads to a higher score if the model happens to output "A".
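To make the reported behavior concrete, here is a minimal sketch of the failure mode. It is not the actual VLMEvalKit code: the function names `coerce_to_choice` and `buggy_score` are hypothetical, and it assumes the coercion is applied to the ground-truth answer only.

```python
def coerce_to_choice(answer: str) -> str:
    """Hypothetical stand-in for the reported behavior: any answer that is
    not a single uppercase letter is replaced with "A"."""
    if len(answer) == 1 and answer.isupper():
        return answer
    return "A"


def buggy_score(ground_truth: str, prediction: str) -> bool:
    # Assumption: only the ground truth is coerced. An open-ended answer
    # such as "42" becomes "A", so a prediction of "A" counts as correct
    # while the genuinely correct prediction "42" does not.
    return coerce_to_choice(ground_truth) == prediction


print(buggy_score("42", "A"))   # open question, model outputs "A"
print(buggy_score("42", "42"))  # open question, model outputs the real answer
print(buggy_score("B", "B"))    # ordinary multiple-choice question
```

Under these assumptions, a model that outputs "A" on an open question is rewarded, and a model that gives the real answer is penalized.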
Hi @xwwu2015, the current implementation is a quick one, and the problem you mention does occur, which makes the reported results for open-ended questions less accurate. It does not make the results completely wrong, though, since it affects fewer than 5% of the questions in MMMU. We would like to refactor the implementation in the future; you are also welcome to create a PR if you need this feature in VLMEvalKit urgently.
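One possible direction for such a refactor, as a rough sketch only: score open questions separately from multiple-choice ones instead of coercing them to an option letter. The `question_type` values, the function names, and the normalized exact-match rule are all assumptions made for illustration, not VLMEvalKit's API or MMMU's official open-ended metric.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse whitespace for a lenient comparison."""
    return re.sub(r"\s+", " ", text.strip().lower())


def score(question_type: str, ground_truth: str, prediction: str) -> bool:
    # Assumption: each record carries a question_type of "open" or
    # "multiple-choice"; the real field name and values may differ.
    if question_type == "open":
        # Compare free-form answers directly, never as option letters.
        return normalize(ground_truth) == normalize(prediction)
    # Multiple-choice: compare option letters case-insensitively.
    return ground_truth.strip().upper() == prediction.strip().upper()


print(score("open", "42", "A"))
print(score("open", "Paris", "  paris "))
print(score("multiple-choice", "B", "b"))
```

Exact match after normalization is deliberately strict; a real refactor would likely need numeric tolerance or answer extraction from longer model outputs.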