Repeated output when training Qwen2.5-VL

Open yxh-y opened this issue 4 months ago • 1 comments

Hi, I am using GRPO to train Qwen2.5-VL with search tools on an NPU. The model produces repeated output in the first round of rollout.

My environment:
Ascend D910B
python 3.10
transformers 4.52.4
vllm 0.9.1
vllm_ascend 0.9.1
torch-npu 2.5.1.post1

Oct 13 '25 02:10 yxh-y

Please provide your script.

Dec 03 '25 09:12 1k77