Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
How can randomness be mitigated during testing on Video-MME? Are there any specific hyperparameter settings for generating responses?
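A minimal sketch of one common way to remove sampling randomness, assuming a Hugging Face transformers-style model; this is not the authors' official setting, only an illustration of greedy decoding:

```python
# A sketch (not the official Video-MME config): greedy decoding removes sampling
# randomness, so repeated runs on the same inputs produce identical answers.
from transformers import GenerationConfig

deterministic_cfg = GenerationConfig(
    do_sample=False,    # disable temperature/top-p sampling
    num_beams=1,        # plain greedy search
    max_new_tokens=16,  # multiple-choice answers only need a short output
)

# outputs = model.generate(**inputs, generation_config=deterministic_cfg)
```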
May I ask what question template was used for Video-MME during testing? For example, is it something like "Q: {question} \nOptions: A: ... \nB: ... \nC: ... \nD: ... \nAnswer:"?
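A minimal sketch of one plausible template along those lines; the exact wording the authors used may differ, and the sample fields below are made up purely for illustration:

```python
# A hedged sketch of a multiple-choice prompt builder; not the authors' official template.
def build_prompt(sample: dict) -> str:
    lines = [
        "Select the best answer to the following multiple-choice question "
        "based on the video. Respond with only the letter (A, B, C, or D).",
        f"Question: {sample['question']}",
    ]
    lines.extend(sample["options"])   # e.g. ["A. ...", "B. ...", "C. ...", "D. ..."]
    lines.append("The best answer is:")
    return "\n".join(lines)

# Illustrative input only, not a real benchmark item.
example = {
    "question": "What is the man doing at the beginning of the video?",
    "options": ["A. Cooking", "B. Running", "C. Reading", "D. Singing"],
}
print(build_prompt(example))
```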
Can I get the metadata to check which data example is under which question sub_type (e.g., object counting)?
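A sketch of how the sub-type breakdown could be inspected from the Hugging Face copy of the benchmark, assuming the dataset ID lmms-lab/Video-MME, a test split, and field names task_type / question_id (all of these are assumptions, not guaranteed to match the released metadata exactly):

```python
# A sketch: group Video-MME questions by task sub-type using the metadata only
# (no videos needed). Dataset ID, split, and field names are assumptions.
from collections import Counter
from datasets import load_dataset

ds = load_dataset("lmms-lab/Video-MME", split="test")
print(Counter(ds["task_type"]))   # per sub-type question counts

# Find which examples fall under a given sub-type, e.g. counting questions.
counting = ds.filter(lambda ex: "Counting" in ex["task_type"])
print(counting["question_id"][:5])
```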
In 119-1, options A and B are the same, both being 'The shortest man in the world.', but the answer is B.
Hi, will you open-source or share the raw evaluations for the proprietary models? Best, Orr
Hello, I see in the paper that default MLLM configs were largely used, but frame counts were increased where applicable. Certain models such as LongVA appear to support video contexts...
https://huggingface.co/allenai/Molmo-7B-D-0924
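The two items above concern raising frame counts for models with long video contexts; a generic sketch of uniform frame sampling follows. This is not the authors' evaluation code, and the decord reader and the default frame count are illustrative choices only:

```python
# A generic sketch of uniform frame sampling; the frame count is the knob that
# would be raised for long-context models. decord is used only as an example reader.
import numpy as np
from decord import VideoReader, cpu

def sample_frames(video_path: str, num_frames: int = 64) -> np.ndarray:
    """Return `num_frames` frames spread evenly across the whole video."""
    vr = VideoReader(video_path, ctx=cpu(0))
    indices = np.linspace(0, len(vr) - 1, num=num_frames).astype(int)
    return vr.get_batch(indices).asnumpy()   # shape: (num_frames, H, W, 3)
```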
Can you share the code for drawing the radar chart in your paper?
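The paper's plotting script is not reproduced here, but a hedged matplotlib sketch of a per-category radar chart looks like this; the category names and scores below are placeholders, not reported results:

```python
# A re-implementation sketch, not the authors' plotting script: matplotlib polar
# axes drawing a per-category accuracy radar chart. All values are placeholders.
import numpy as np
import matplotlib.pyplot as plt

categories = ["Counting", "OCR", "Action Recognition", "Spatial", "Temporal", "Reasoning"]
scores = [62.0, 70.5, 75.0, 58.0, 64.0, 68.5]   # placeholder accuracies (%)

angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
angles += angles[:1]                             # close the polygon
scores_closed = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"polar": True})
ax.plot(angles, scores_closed, linewidth=2)
ax.fill(angles, scores_closed, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
ax.set_ylim(0, 100)
plt.savefig("radar_chart.png", bbox_inches="tight")
```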
You mentioned that GPT-5 and Gemini 2.5 used Video-MME in their release reports to show SOTA performance. Can you add them to the leaderboard, please?
Dear Authors, thank you for your excellent work! I want to submit our best result to the Video-MME leaderboard and sent an email to [email protected] four days ago but got no...