mllm-evaluation topic
EASI
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
EgoThink
[CVPR'24 Highlight] The official code and data for the paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models"
FinMME
[ACL 2025] FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
EgoTextVQA
[CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
GAGE
General AI evaluation and Gauge Engine. A unified evaluation engine for LLMs, MLLMs, audio, and diffusion models.
EIBench
Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
core-knowledge
Official codebase for the ICML 2025 paper "Core Knowledge Deficits in Multi-Modal Language Models"
General-Level
On Path to Multimodal Generalist: General-Level and General-Bench
VidEgoThink
The official code and data for the paper "VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI"