vlms topic


HallusionBench

228 Stars · 5 Forks

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

openai-scala-client

240 Stars · 36 Forks · 240 Watchers

Scala client for OpenAI API and other major LLM providers

ViTamin

210 Stars · 6 Forks · 210 Watchers

[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"

CAL

59 Stars · 2 Forks · 59 Watchers

[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment

AWT

70 Stars · 1 Fork

[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation

docext

1.8k Stars · 137 Forks · 1.8k Watchers

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

OmniCaptioner

168 Stars · 14 Forks · 168 Watchers

Official repository of OmniCaptioner

InternNav

595 Stars · 67 Forks · 595 Watchers

InternRobotics' open platform for building generalized navigation foundation models.

FM-AD-Survey

131 Stars · 9 Forks · 131 Watchers

This repository collects research papers on large foundation models for scenario generation and analysis in autonomous driving. It will be updated continuously to track the latest developments.

SegAgent

88 Stars · 2 Forks · 88 Watchers

[CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories