Congmin(Xavier) Qiu
Actually, there is a try/except that prevents a training failure: if the tool name does not exist, the reward will be 0.0. In this case, the model will be able to...
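A minimal sketch of the fallback described in that comment, not the actual verl code. The function `safe_tool_reward` and the `tools` registry are hypothetical names used only for illustration; the only behavior taken from the comment is that an unknown tool name yields a reward of 0.0 instead of an exception.

```python
from typing import Callable, Dict


def safe_tool_reward(tool_name: str, tools: Dict[str, Callable[[], float]]) -> float:
    """Return the reward from the named tool, or 0.0 if the name is unknown.

    Hypothetical helper: `tools` maps a tool name to a zero-argument
    callable that returns a reward.
    """
    try:
        return float(tools[tool_name]())
    except KeyError:
        # Unknown tool name: fall back to zero reward so training keeps running.
        return 0.0


# Hypothetical registry with a single tool.
tools = {"calculator": lambda: 1.0}

known = safe_tool_reward("calculator", tools)
unknown = safe_tool_reward("no_such_tool", tools)
```

With this shape, a rollout that calls a tool the model made up is penalized with a zero reward rather than crashing the step.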
Why would they be inconsistent? Isn't reward_extra_infos_dict later computed from the balanced batch? ``` # === Config === config = { "trainer": { "balance_batch": True, # ✅ enabled "rollout_data_dir": "./rollout" # ✅ enabled } } # === Step 0: original...
Closed my PR because SPMD will be deprecated https://github.com/volcengine/verl/pull/4106
@wuxibin89 @vermouth1992 can you help review my fix? 🙏