mllms topic

List mllms repositories

GlobalCom2

37
Stars
1
Forks
37
Watchers

[AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models

TVC

144
Stars
0
Forks
144
Watchers

[ACL 2025] The code repository for "Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning" in PyTorch.

Awesome-Reasoning-MLLM

61
Stars
4
Forks
61
Watchers

Awesome Reasoning in MLLMs: Papers and Projects about learning to reason with MLLMs, including Chain-of-Thought (CoT), OpenAI o1, and DeepSeek-R1

OS-Agent-Survey

376
Stars
19
Forks
376
Watchers

This is the repo for the paper "OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use" (ACL 2025 Oral).

Active-o3

76
Stars
1
Forks
76
Watchers

ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO

G2VLM

242
Stars
4
Forks
242
Watchers

G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning

General-Level

19
Stars
3
Forks
19
Watchers

On Path to Multimodal Generalist: General-Level and General-Bench

InternNav

595
Stars
67
Forks
595
Watchers

InternRobotics' open platform for building generalized navigation foundation models.

Olympus

428
Stars
72
Forks
428
Watchers

[CVPR 2025 Highlight] Official code for "Olympus: A Universal Task Router for Computer Vision Tasks"

MedTrinity-25M

394
Stars
28
Forks
394
Watchers

[ICLR 2025] This is the official repository of our paper "MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine"