Feature: Dr. GRPO Support
It would be nice if ART supported the Dr. GRPO objective. Dr. GRPO removes the response-length bias present in the original GRPO loss while retaining GRPO's performance benefits. See the Dr. GRPO paper for more info.
Important Note: trl's GRPOTrainer already supports this by setting loss_type="dr_grpo" in GRPOConfig.
I believe this technique could be particularly beneficial in multi-turn RL scenarios, where growth in response length is more pronounced.
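For intuition, here is a minimal illustrative sketch (not ART's or trl's internals; the function names and numpy-based layout are my own) of the two changes Dr. GRPO makes to GRPO: dropping the per-group standard-deviation division in the advantage, and normalizing the token loss by a constant budget instead of each response's own length:

import numpy as np

def grpo_advantages(rewards):
    # Original GRPO: mean-center and divide by the group's reward std.
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-4)

def dr_grpo_advantages(rewards):
    # Dr. GRPO: mean-center only; the std division is removed.
    r = np.asarray(rewards, dtype=float)
    return r - r.mean()

def grpo_loss(per_token_losses):
    # Original GRPO: each response is averaged over its own length, so long
    # responses receive a smaller per-token weight (the length bias).
    return float(np.mean([np.mean(t) for t in per_token_losses]))

def dr_grpo_loss(per_token_losses, max_completion_length):
    # Dr. GRPO: sum token losses and divide by a constant budget, so the
    # per-token weight no longer depends on each response's length.
    total = sum(float(np.sum(t)) for t in per_token_losses)
    return total / (len(per_token_losses) * max_completion_length)

In trl, loss_type="dr_grpo" implements the constant-length loss normalization; the std division on rewards is controlled separately (via the scale_rewards option in recent versions, if I recall correctly).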
I might be wrong on this, but as you mentioned, GRPOTrainer supports Dr. GRPO, and you can run it with ART:
trainer_args=art.dev.TrainerArgs(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    loss_type="dr_grpo",
)
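For comparison, the same setting in a standalone trl run looks roughly like the sketch below (a minimal sketch assuming a recent trl version; the model id, reward function, prompts, and output directory are placeholders I made up for illustration):

from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def toy_reward(completions, **kwargs):
    # Placeholder reward: prefer shorter completions.
    return [-float(len(c)) for c in completions]

train_dataset = Dataset.from_dict(
    {"prompt": ["What is 2 + 2?", "Name a prime number."]}
)

config = GRPOConfig(
    output_dir="dr-grpo-demo",           # placeholder output directory
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_generations=2,                   # kept small for this toy example
    loss_type="dr_grpo",                 # constant-length loss normalization
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model id
    reward_funcs=toy_reward,
    args=config,
    train_dataset=train_dataset,
)
trainer.train()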