
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark

Results 11 MLVU issues

Are data/8_sub_scene.json and data/9_summary.json in this repo the dev set or the test set used in the leaderboard?

Hello, thank you for sharing. I have a question. Why is the dataset given in this repository [MLVU-Dev](https://huggingface.co/datasets/MLVU/MVLU) different from the dataset used by lmms-eval ([sy1998/MLVU_dev](https://huggingface.co/datasets/sy1998/MLVU_dev/tree/main))? Is there any difference...

Thank you for your outstanding work. We noticed that you haven't added our model [VideoChat-Flash](https://github.com/OpenGVLab/VideoChat-Flash) to the leaderboard, which achieved a performance of 74.7 with a 7B scale. We sincerely...

Hi, will you open-source, or can you share, the raw evaluations for proprietary models? Best, Orr

Hello, in the evaluation leaderboard, the value of the 'Input' column is usually 'n frm', for example '16 frm'. How should I understand this value? Is 16 frames sampled from the entire...

When I open http://analysis.a1.luyouxia.net:23226/, it says: "When using TCP mapping for HTTP access, please use the assigned domain name and port; access via other domain names is not supported. (Invalid host header: analysis.a1.luyouxia.net)"

Hello, in the experiment section, can GPT-4o handle uploading 120 frames at once? Why can I only upload up to 50 frames when I call the API?

Thank you for your outstanding work. We noticed that you haven't added our agent-based method [LVAgent](https://github.com/64327069/LVAgent) to the leaderboard, which achieved a performance of 83.9 with two 72B models and...

I want to evaluate G-AVG on the MLVU test set. How can I do that?

Hi author, thank you for sharing such great work! I have just one simple question. **Can you tell me which model was used to generate the test_res.json in the official...