Vision-SR1

Reinforcement Learning of Vision Language Models with Self Visual Perception Reward

Results 5 Vision-SR1 issues

Thanks for your excellent work! I trained RL for 20 steps using your cold-start model, but the format reward is always 0.
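A format reward that stays at 0 often means the model's output never matches the expected layout exactly. The sketch below is one way to check this offline. It is not the repository's reward function: the `<description>`, `<think>`, and `<answer>` tags, the pattern, and both function names are assumptions made for illustration, so substitute whatever layout the training config actually enforces.

```python
import re

# Hypothetical tag layout (assumption): the real Vision-SR1 reward may expect
# different tags or a different answer wrapper.
FORMAT_PATTERN = re.compile(
    r"\s*<description>.+?</description>\s*<think>.+?</think>\s*<answer>.+?</answer>\s*",
    re.DOTALL,
)


def format_reward(response: str) -> float:
    """Return 1.0 if the whole response matches the assumed layout, else 0.0."""
    return 1.0 if FORMAT_PATTERN.fullmatch(response) else 0.0


def diagnose(response: str) -> list[str]:
    """List the assumed tags whose opening or closing form is missing."""
    missing = []
    for tag in ("description", "think", "answer"):
        if f"<{tag}>" not in response or f"</{tag}>" not in response:
            missing.append(tag)
    return missing
```

Running `diagnose` over a handful of rollouts saved during training would show whether one tag is consistently missing, which would point to a mismatch between the cold-start model's output template and the one the reward expects.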

Run the script with: bash ./validation_examples/2-seethink_format_eval.sh. After executing this shell file, output files (e.g., datasets.jsonl) are generated under validation_responses/3B-Qwen/. However, the next step requires running: python LLM_eval.py --input_dir ./Raw-Outputs/7B-Vision-SR1 (the...

Thanks for your excellent work. When evaluating the self_reward_7b model, I found that some datasets do not seem to be visible on Hugging Face, such as "zli12321/mm-vet" and "zli12321/pope". How...

Hello, thank you for sharing this great project! 🙏 I would like to reproduce and test the naive GRPO baseline (without the second-pass self-reward) under the same environment and settings...

Thanks for your awesome work. Does it support LoRA? If not, could you please provide some basic suggestions, since I don't have an 80GB GPU?