Async-GRPO: Decouple rollout generation and reward computation/model update for faster training
Method description
I'm not sure whether this fits better as a feature request or as a new trainer, but I think it would be great if TRL supported async-GRPO. It is not mathematically equivalent to GRPO, since rollouts may be generated with a slightly stale policy, but in practice that lag matters little. It effectively introduces pipeline parallelism into the GRPO workflow: rollout generation, reward computation, and model updates run concurrently instead of in lockstep (see the sketch below).
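To make the decoupling concrete, here is a minimal, purely illustrative Python sketch of the producer/consumer structure: one thread keeps generating rollouts with whatever policy weights it currently has, while the trainer consumes them, computes rewards, and takes optimizer steps, dropping rollouts whose policy version is too stale. All names here (`rollout_queue`, `MAX_POLICY_LAG`, the toy reward) are hypothetical stand-ins, not TRL or NeMo RL APIs.

```python
import queue
import threading
import time

MAX_POLICY_LAG = 2    # drop rollouts older than this many policy versions (assumed bound)
NUM_STEPS = 10

rollout_queue = queue.Queue(maxsize=4)  # bounded buffer between generation and training
policy_version = 0
stop = threading.Event()

def generate_rollouts():
    """Producer: keeps sampling completions with the latest weights it has seen."""
    while not stop.is_set():
        version = policy_version              # snapshot of the (possibly stale) policy
        rollout = f"completion@v{version}"    # stand-in for sampled completions
        time.sleep(0.05)                      # pretend generation takes time
        rollout_queue.put((version, rollout))

def train():
    """Consumer: scores rollouts and updates the policy, skipping stale ones."""
    global policy_version
    steps = 0
    while steps < NUM_STEPS:
        version, rollout = rollout_queue.get()
        if policy_version - version > MAX_POLICY_LAG:
            continue                          # too stale: discard instead of training on it
        reward = len(rollout)                 # stand-in for real reward computation
        policy_version += 1                   # stand-in for a GRPO optimizer step
        steps += 1
        print(f"step {steps}: trained on {rollout} "
              f"(reward={reward}, lag={policy_version - 1 - version})")
    stop.set()

producer = threading.Thread(target=generate_rollouts, daemon=True)
producer.start()
train()
```

In a real implementation the producer would be an inference engine (e.g. a vLLM server) that is periodically refreshed with updated weights, and the staleness bound would cap how far generation is allowed to run ahead of training; the NeMo RL guide linked below describes this kind of trajectory-age limit.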
I looked for related issues. https://github.com/huggingface/trl/issues/4130 is about running reward functions concurrently with respect to each other, but it does not propose decoupling rollout generation from training.
NeMo RL supports this; it is described in the documentation: https://docs.nvidia.com/nemo/rl/latest/guides/async-grpo.html
Based on a brief check, I don't think TRL supports it yet.
Cheers!
Open source status
- [x] The method implementation is available
- [ ] The model weights are available
- [x] The training datasets are available
Provide useful links for the implementation
https://docs.nvidia.com/nemo/rl/latest/guides/async-grpo.html
https://github.com/NVIDIA-NeMo/RL
Indeed, it's not available yet, and at this point I'm not sure whether we will support it. Keeping this issue open in case that changes.