feat: add MathVista benchmark

Open omkar-334 opened this issue 2 months ago • 1 comments

MathVista is a multimodal math benchmark with 2 types of questions: free-form and multiple-choice (MCQ).
I've separated each type into its own subset.
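
A minimal sketch of how such a split could look, assuming each record carries a `question_type` field with the values `multi_choice` and `free_form`. The records below are made up for illustration, and `split_by_question_type` is a hypothetical helper, not part of this PR:

```python
from collections import defaultdict

# Made-up example records. The field names ("pid", "question_type",
# "choices", "answer") are assumptions about the dataset schema.
records = [
    {"pid": "1", "question_type": "multi_choice", "choices": ["1", "2", "3"], "answer": "2"},
    {"pid": "2", "question_type": "free_form", "choices": None, "answer": "3.5"},
    {"pid": "3", "question_type": "multi_choice", "choices": ["Yes", "No"], "answer": "No"},
]


def split_by_question_type(rows):
    """Group rows into one list per question type (hypothetical helper)."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row["question_type"]].append(row)
    return dict(subsets)


subsets = split_by_question_type(records)
for name, rows in subsets.items():
    print(name, len(rows))
```

Keeping the two types apart means each subset can get its own prompt format and scoring rule.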

The benchmark can be evaluated in 2 ways: by providing either the problem solution or the problem code.
For now I'm implementing only the solution method.

I need to figure out the proper metric for this - I've tried Metrics.expr_gold_metric and Metrics.exact_match but these are not working. Working on this right now.
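
For reference, a rough standalone sketch of the kind of answer matching each subset might need. It does not use the `Metrics` entries mentioned above; `normalize`, `score_mcq`, and `score_free_form` are hypothetical helpers, and the matching rules (option letters mapped to choice text, numeric comparison within a tolerance) are assumptions rather than the benchmark's official scoring:

```python
import re


def normalize(text):
    """Lowercase, trim whitespace, and drop a trailing period."""
    return text.strip().lower().rstrip(".")


def score_mcq(prediction, choices, gold):
    """Return 1 if the prediction matches the gold answer text, else 0.

    Accepts either the option text itself or a letter such as "B" or "(B)",
    which is mapped to the corresponding entry in `choices`.
    """
    pred = prediction.strip()
    letter = re.fullmatch(r"\(?([A-Za-z])\)?\.?", pred)
    if letter:
        index = ord(letter.group(1).upper()) - ord("A")
        if 0 <= index < len(choices):
            pred = choices[index]
    return int(normalize(pred) == normalize(gold))


def score_free_form(prediction, gold, tol=1e-6):
    """Compare numerically when both sides parse as floats, else as text."""
    try:
        return int(abs(float(prediction) - float(gold)) <= tol)
    except ValueError:
        return int(normalize(prediction) == normalize(gold))


print(score_mcq("B", ["1", "2", "3"], "2"))
print(score_free_form("3.50", "3.5"))
```

Plain exact match tends to fail on cases like `"B"` versus `"2"` or `"3.50"` versus `"3.5"`, which is why this sketch normalizes before comparing.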

Nov 22 '25 09:11 omkar-334

hey @omkar-334 !

> I need to figure out the proper metric for this - I've tried Metrics.expr_gold_metric and Metrics.exact_match but these are not working. Working on this right now.

Don't worry about this, what's important for new evals like this is the inspect-ai implementation :)

There are examples here and documentation on how to use it here; the inspect-ai documentation is here.

Nov 24 '25 10:11 NathanHB