Feature: Dr. GRPO Support
It would be nice if ART supported the Dr. GRPO objective. Dr. GRPO removes the response-length bias present in the original GRPO loss while retaining GRPO's performance benefits. See the Dr. GRPO paper for more info.
Important Note: trl's GRPOTrainer already supports this by setting loss_type="dr_grpo" in GRPOConfig.
I believe this technique could be particularly beneficial in multi-turn RL scenarios, where growth in response length is more pronounced.
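For intuition, here is a minimal illustrative sketch (not ART's or trl's internals; the function names and numpy-based layout are my own) of the two changes Dr. GRPO makes to GRPO: dropping the per-group standard-deviation division in the advantage, and normalizing the token loss by a constant budget instead of each response's own length:

import numpy as np

def grpo_advantages(rewards):
    # Original GRPO: mean-center and divide by the group's reward std.
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-4)

def dr_grpo_advantages(rewards):
    # Dr. GRPO: mean-center only; the std division is removed.
    r = np.asarray(rewards, dtype=float)
    return r - r.mean()

def grpo_loss(per_token_losses):
    # Original GRPO: each response is averaged over its own length, so long
    # responses receive a smaller per-token weight (the length bias).
    return float(np.mean([np.mean(t) for t in per_token_losses]))

def dr_grpo_loss(per_token_losses, max_completion_length):
    # Dr. GRPO: sum token losses and divide by a constant budget, so the
    # per-token weight no longer depends on each response's length.
    total = sum(float(np.sum(t)) for t in per_token_losses)
    return total / (len(per_token_losses) * max_completion_length)

In trl, loss_type="dr_grpo" implements the constant-length loss normalization; the std division on rewards is controlled separately (via the scale_rewards option in recent versions, if I recall correctly).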
I might be wrong on this, but as you mentioned, GRPOTrainer supports Dr. GRPO, and you can run it with ART:
trainer_args=art.dev.TrainerArgs(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    loss_type="dr_grpo",
)
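For comparison, the same setting in a standalone trl run looks roughly like the sketch below (a minimal sketch assuming a recent trl version; the model id, reward function, prompts, and output directory are placeholders I made up for illustration):

from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def toy_reward(completions, **kwargs):
    # Placeholder reward: prefer shorter completions.
    return [-float(len(c)) for c in completions]

train_dataset = Dataset.from_dict(
    {"prompt": ["What is 2 + 2?", "Name a prime number."]}
)

config = GRPOConfig(
    output_dir="dr-grpo-demo",           # placeholder output directory
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_generations=2,                   # kept small for this toy example
    loss_type="dr_grpo",                 # constant-length loss normalization
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model id
    reward_funcs=toy_reward,
    args=config,
    train_dataset=train_dataset,
)
trainer.train()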