multimodel-large-language-model topic
echoOLlama
🦙 echoOLlama: A real-time voice AI platform powered by local LLMs. Features WebSocket streaming, voice interactions, and OpenAI API compatibility. Built with FastAPI, Redis, and PostgreSQL. Perfect f...
TVC
[ACL 2025] The code repository for "Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning" in PyTorch.
RoboBrain2.0
RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter. 🎉🎉🎉
UI-Venus
UI-Venus is a native UI agent designed to perform precise GUI element grounding and effective navigation using only screenshots as input.
Seg-Zero
Project page for "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
Robust-R1
🔥🔥🔥[AAAI 2026 Oral] Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
UME-Search
Toward Universal Multimodal Embedding
Basic-Visual-Language-Model
Build a simple, basic multimodal large model from scratch. 🤖