# Open-LLaVA-Video-R1
[LLaVA-Video-R1]✨First Adaptation of R1 to LLaVA-Video (2025-03-18)
The open-source code currently available for multimodal DeepSeek-R1/GRPO is predominantly based on Qwen2VL. However, in video understanding, LLaVA-Video, one of the most important baselines, still has no related open-source project (as of 2025/03/18). We therefore aim to fill this gap by releasing a codebase, Open-LLaVA-Video-R1.
## News
- [2025/03/19] We release the codebase of Open-LLaVA-Video-R1.
## What we did
To the best of our knowledge, we are the first to adapt R1/GRPO to the LLaVA-Video architecture. Specifically, we train LLaVA-Video using GRPO with accuracy and format rewards on the DVD-counting dataset. Training the 7B model on the DVD dataset takes approximately 5.5 hours on 8 x A800 (80 GB) GPUs. The training curve is as follows:
## Performance
The experimental setting is the same as the Qwen-based Video-R1, validated on the DVD-counting task. As shown in the table, an 11.5-point accuracy gain is observed after GRPO training on LLaVA-Video-Qwen.
| Dataset | LLaVA-Video-7B | LLaVA-Video-7B+GRPO |
|---|---|---|
| DVD-counting-test | 20.5 | 32.0 (11.5↑) |
## Setup
```bash
git clone https://github.com/Hui-design/Open-LLaVA-Video-R1.git
cd Open-LLaVA-Video-R1
```
Our environment is basically the same as Open-r1-video and r1-video. If you have already installed either of them, you can reuse that environment directly. Otherwise, you can try the following commands:
```bash
conda create -n LLaVA-Video-R1 python=3.10
conda activate LLaVA-Video-R1
pip3 install -e ".[dev]"
pip3 install flash_attn --no-build-isolation
```
## Dataset
We use the same task as r1-video, based on the DVD-counting dataset.
Our dataset organization is:
```
dvd_dataset
├── dvd
│   └── *.mp4
├── train_dvd.jsonl
└── test_dvd.jsonl
```
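Since the annotations are stored as JSON Lines, they can be loaded generically with the standard library; a minimal sketch (the field names inside each record depend on the dataset and are not shown here):

```python
import json

def load_jsonl(path: str) -> list[dict]:
    """Load one JSON object per line, skipping blank lines."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate trailing/blank lines
                samples.append(json.loads(line))
    return samples

# Example usage:
# samples = load_jsonl("dvd_dataset/train_dvd.jsonl")
# print(len(samples), samples[0].keys())
```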
## GRPO on LLaVA-Video
First download LLaVA-Video-Qwen, then set `model_name_or_path` in `train_llava_video.sh`:
```bash
# to run GRPO on llava_video
bash train_llava_video.sh
```
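As a rough illustration of the rule-based accuracy and format rewards used in this style of GRPO training, here is a minimal sketch; the `<think>…</think><answer>…</answer>` template and function shapes follow the common R1 recipe and are assumptions, not necessarily this repo's exact implementation:

```python
import re

def format_reward(completion: str) -> float:
    # 1.0 if the completion follows the R1-style template:
    # reasoning inside <think>...</think>, final result inside <answer>...</answer>.
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    # 1.0 if the content of <answer> matches the ground-truth count exactly.
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0
```

GRPO then normalizes these rewards within each group of sampled completions to compute advantages, so no learned reward model is needed.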
## Evaluation
Evaluation on the video counting task:

```bash
python llava_video_inference.py
```
## Citation
```bibtex
@misc{Tang2025LlavaVideoR1,
  author       = {Canhui Tang},
  title        = {Open LLaVA-Video-R1},
  howpublished = {\url{https://github.com/Hui-design/Open-LLaVA-Video-R1}},
  note         = {Accessed: 2025-03-18},
  year         = {2025}
}
```
## Acknowledgements
We sincerely appreciate the contributions of the open-source community. The related projects are as follows: