multimodal-llm topic

List multimodal-llm repositories

vllm-safety-benchmark

63
Stars
2
Forks

[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"

Ant-Multi-Modal-Framework

113
Stars
5
Forks

Research code from the Multimodal-Cognition Team at Ant Group

MiniGPT-5

845
Stars
52
Forks

Official implementation of the paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"

MineDreamer

68
Stars
4
Forks

This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control"

FireRedASR

1.6k
Stars
138
Forks
1.6k
Watchers

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recogn...

Wings

25
Stars
1
Forks
25
Watchers

The code repository for "Wings: Learning Multimodal LLMs without Text-only Forgetting" [NeurIPS 2024]

InfiGUI-G1

127
Stars
14
Forks
127
Watchers

[AAAI 2026 Oral] Official repository for InfiGUI-G1. We introduce Adaptive Exploration Policy Optimization (AEPO) to overcome semantic alignment bottlenecks in GUI agents through efficient, guided exp...

nanochat-VLM

24
Stars
2
Forks
24
Watchers

A minimal, hackable Vision-Language Model built on Karpathy’s nanochat — add image understanding and multimodal chat for under $200 in compute.

vllm-awq4-qwen

36
Stars
2
Forks
36
Watchers

vLLM Qwen 3.6-27B (AWQ-INT4) + DFlash speculative decoding on AMD Strix Halo (gfx1151 iGPU, 128 GB UMA, ROCm 7.13). 24.8 t/s single-stream, vision, tool calling, 256K context, OpenAI-compatible, Docke...

Q-HEART

16
Stars
0
Forks
16
Watchers

Q-HEART: ECG Question Answering via Knowledge-Informed Multimodal LLMs (ECAI 2025)