Vision-SR1 issues

Why is the format reward always zero?

Thanks for your excellent work! I trained RL for 20 steps using your cold-start model, but the format reward is always 0.

About test script

Run the script with: bash ./validation_examples/2-seethink_format_eval.sh. After executing this shell file, output files (e.g., datasets.jsonl) are generated under validation_responses/3B-Qwen/. However, the next step requires running: python LLM_eval.py --input_dir ./Raw-Outputs/7B-Vision-SR1 (the...

bruno686

Some test datasets are unavailable

6

Thanks for your excellent work. While evaluating the self_reward_7b model, I found that some datasets do not seem to be visible on Hugging Face, such as "zli12321/mm-vet" and "zli12321/pope". How...

kkkkkkon

Request for script to reproduce naive GRPO baseline

1

Hello, thank you for sharing this great project! 🙏 I would like to reproduce and test the naive GRPO baseline (without the second-pass self-reward) under the same environment and settings...

naajeehxe

LoRA support

7

Thanks for your awesome work. Does it support LoRA? If not, could you please provide some basic suggestions, since I don't have an 80 GB GPU?

ppalantir
