verl
Repeated output when training Qwen2.5-VL
Hi, I am using GRPO to train Qwen2.5-VL with search tools on an NPU. The model produces repeated output in the first round of rollout.
My environment:
Ascend D910B
python 3.10
transformers 4.52.4
vllm 0.9.1
vllm_ascend 0.9.1
torch-npu 2.5.1.post1
Please provide your script.