feat: add MathVista benchmark
MathVista is a multimodal math benchmark with 2 types of questions: free-form and multiple-choice (MCQ).
I've separated each type into a different subset.
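For reference, here is a rough sketch of the split, not the actual task code in this PR. It assumes each record carries a `question_type` field with the values `free_form` and `multi_choice`; the helper name and the sample records are made up for illustration.

```python
from collections import defaultdict


def split_by_question_type(records):
    """Group records into subsets keyed by their question type."""
    subsets = defaultdict(list)
    for record in records:
        # "question_type" is an assumed field name.
        subsets[record["question_type"]].append(record)
    return dict(subsets)


# Hypothetical records, only for illustration.
records = [
    {"pid": "1", "question_type": "multi_choice", "choices": ["2", "4"], "answer": "4"},
    {"pid": "2", "question_type": "free_form", "choices": None, "answer": "3.5"},
    {"pid": "3", "question_type": "free_form", "choices": None, "answer": "12"},
]

subsets = split_by_question_type(records)
```

Each key of `subsets` can then be registered as its own subset, so the two question types get their own prompt and scoring.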
The benchmark can be evaluated in 2 ways: by providing either the problem solution or the problem code.
For now I'm implementing the solution method.
I need to figure out the proper metric for this - I've tried Metrics.expr_gold_metric and Metrics.exact_match but these are not working. Working on this right now.
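To make the scoring problem concrete, here is a standalone sketch of the comparison I have in mind. It is not one of the built-in metrics and does not use any library API; the function names, the numeric tolerance, and the letter-to-choice mapping are all assumptions.

```python
def normalize(text):
    """Lowercase, trim whitespace, and drop a trailing period."""
    return text.strip().lower().rstrip(".")


def score_free_form(prediction, gold):
    """Compare numerically when both sides parse as floats, else as normalized strings."""
    try:
        # The 1e-6 tolerance is an arbitrary choice for this sketch.
        return abs(float(prediction) - float(gold)) < 1e-6
    except ValueError:
        return normalize(prediction) == normalize(gold)


def score_mcq(prediction, gold, choices):
    """Accept either the choice text itself or its letter (A, B, ...)."""
    pred = normalize(prediction)
    if pred == normalize(gold):
        return True
    letters = "abcdefghijklmnopqrstuvwxyz"
    if len(pred) == 1 and pred in letters:
        idx = letters.index(pred)
        return idx < len(choices) and normalize(choices[idx]) == normalize(gold)
    return False
```

For example, `score_free_form("3.50", "3.5")` and `score_mcq("B", "4", ["2", "4"])` both count as correct here, which is the kind of leniency a strict exact match does not give.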
hey @omkar-334 !
> I need to figure out the proper metric for this - I've tried Metrics.expr_gold_metric and Metrics.exact_match but these are not working. Working on this right now.
Don't worry about this, what's important for new evals like this is the inspect-ai implementation :)
There are examples here and usage documentation here; the inspect-ai documentation is here.