Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
How can randomness be mitigated during testing on Video-MME? Are there any specific hyperparameter settings for generating responses?
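A minimal sketch of one common way to remove sampling randomness, assuming a Hugging Face transformers-style model; this is not the authors' official setting, only an illustration of greedy decoding:

```python
# A sketch (not the official Video-MME config): greedy decoding removes sampling
# randomness, so repeated runs on the same inputs produce identical answers.
from transformers import GenerationConfig

deterministic_cfg = GenerationConfig(
    do_sample=False,    # disable temperature/top-p sampling
    num_beams=1,        # plain greedy search
    max_new_tokens=16,  # multiple-choice answers only need a short output
)

# outputs = model.generate(**inputs, generation_config=deterministic_cfg)
```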
May I ask what question template was used for Video-MME during testing? For example, is it something like "Q: {question} \nOptions: A: ... \nB: ... \nC: ... \nD: ... \nAnswer:"?
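A minimal sketch of one plausible template along those lines; the exact wording the authors used may differ, and the sample fields below are made up purely for illustration:

```python
# A hedged sketch of a multiple-choice prompt builder; not the authors' official template.
def build_prompt(sample: dict) -> str:
    lines = [
        "Select the best answer to the following multiple-choice question "
        "based on the video. Respond with only the letter (A, B, C, or D).",
        f"Question: {sample['question']}",
    ]
    lines.extend(sample["options"])   # e.g. ["A. ...", "B. ...", "C. ...", "D. ..."]
    lines.append("The best answer is:")
    return "\n".join(lines)

# Illustrative input only, not a real benchmark item.
example = {
    "question": "What is the man doing at the beginning of the video?",
    "options": ["A. Cooking", "B. Running", "C. Reading", "D. Singing"],
}
print(build_prompt(example))
```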
Can I get the metadata to check which data example is under which question sub_type (e.g., object counting)?
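A sketch of how the sub-type breakdown could be inspected from the Hugging Face copy of the benchmark, assuming the dataset ID lmms-lab/Video-MME, a test split, and field names task_type / question_id (all of these are assumptions, not guaranteed to match the released metadata exactly):

```python
# A sketch: group Video-MME questions by task sub-type using the metadata only
# (no videos needed). Dataset ID, split, and field names are assumptions.
from collections import Counter
from datasets import load_dataset

ds = load_dataset("lmms-lab/Video-MME", split="test")
print(Counter(ds["task_type"]))   # per sub-type question counts

# Find which examples fall under a given sub-type, e.g. counting questions.
counting = ds.filter(lambda ex: "Counting" in ex["task_type"])
print(counting["question_id"][:5])
```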
In 119-1, options A and B are the same, both being 'The shortest man in the world.', but the answer is B.
Hi, will you open-source or share the raw evaluations for the proprietary models? Best, Orr
Hello, I see in the paper that default MLLM configs were largely used, but frame counts were increased where applicable. Certain models such as LongVA appear to support video contexts...
https://huggingface.co/allenai/Molmo-7B-D-0924
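The two items above concern raising frame counts for models with long video contexts; a generic sketch of uniform frame sampling follows. This is not the authors' evaluation code, and the decord reader and the default frame count are illustrative choices only:

```python
# A generic sketch of uniform frame sampling; the frame count is the knob that
# would be raised for long-context models. decord is used only as an example reader.
import numpy as np
from decord import VideoReader, cpu

def sample_frames(video_path: str, num_frames: int = 64) -> np.ndarray:
    """Return `num_frames` frames spread evenly across the whole video."""
    vr = VideoReader(video_path, ctx=cpu(0))
    indices = np.linspace(0, len(vr) - 1, num=num_frames).astype(int)
    return vr.get_batch(indices).asnumpy()   # shape: (num_frames, H, W, 3)
```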
Can you share the code for drawing the radar chart in your paper?
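The paper's plotting script is not reproduced here, but a hedged matplotlib sketch of a per-category radar chart looks like this; the category names and scores below are placeholders, not reported results:

```python
# A re-implementation sketch, not the authors' plotting script: matplotlib polar
# axes drawing a per-category accuracy radar chart. All values are placeholders.
import numpy as np
import matplotlib.pyplot as plt

categories = ["Counting", "OCR", "Action Recognition", "Spatial", "Temporal", "Reasoning"]
scores = [62.0, 70.5, 75.0, 58.0, 64.0, 68.5]   # placeholder accuracies (%)

angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
angles += angles[:1]                             # close the polygon
scores_closed = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"polar": True})
ax.plot(angles, scores_closed, linewidth=2)
ax.fill(angles, scores_closed, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
ax.set_ylim(0, 100)
plt.savefig("radar_chart.png", bbox_inches="tight")
```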
You mentioned that GPT-5 and Gemini 2.5 used Video-MME in their release reports to show SOTA performance. Can you add them to the leaderboard, please?
Dear Authors, thank you for your excellent work! I want to submit our best result to the Video-MME leaderboard and sent an email to [email protected] four days ago but got no...