Vision-SR1
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
Thanks for your excellent work! I trained RL for 20 steps using your cold-start model, but the format reward is always 0.
Run the script with: bash ./validation_examples/2-seethink_format_eval.sh. After executing this shell file, output files (e.g., datasets.jsonl) are generated under validation_responses/3B-Qwen/. However, the next step requires running: python LLM_eval.py --input_dir ./Raw-Outputs/7B-Vision-SR1 (the...
Thanks for your excellent work. While evaluating the self_reward_7b model, I found that some datasets do not seem to be visible on Hugging Face, such as "zli12321/mm-vet" and "zli12321/pope". How...
Hello, thank you for sharing this great project! 🙏 I would like to reproduce and test the naive GRPO baseline (without the second-pass self-reward) under the same environment and settings...
Thanks for your awesome work. Does it support LoRA? If not, could you please provide some basic suggestions, since I don't have an 80GB GPU?