If rollout.n is doubled, will the number of samples used for training also double? And does training incur higher GPU cost in the forward/backward passes because of the larger rollout.n?
Yes. Your understanding is correct.
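To make the scaling concrete, here is the arithmetic that answer implies, as a minimal sketch (plain Python with illustrative numbers, not verl API calls; I'm assuming `train_batch_size` counts prompts):

```python
# Illustrative arithmetic only, not verl internals.
train_batch_size = 1024   # prompts sampled per RL step (assumed semantics)
rollout_n = 8             # responses generated per prompt

# Every prompt yields rollout_n sequences, so the PPO update sees:
samples_per_step = train_batch_size * rollout_n   # 8192 sequences

# Doubling rollout_n doubles samples_per_step, so the total
# forward/backward work per RL step doubles too. With a fixed
# micro-batch size this shows up as more gradient-accumulation
# steps, not more memory per pass.
print(samples_per_step)                           # 8192
```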
Hi, thanks for your reply. So if I increase rollout.n by a factor of m, should I decrease ppo_mini_batch_size or ppo_micro_batch_size to 1/m of its original value?
I think ppo_mini_batch_size should instead be multiplied by n. ppo_micro_batch_size should not be changed, because it only affects gradient accumulation, and increasing ppo_micro_batch_size may cause OOM.
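If I understand the suggestion, the effect looks like this (a minimal sketch with made-up numbers; the variable names mirror the config keys, but this is plain arithmetic, not verl internals):

```python
# Why scaling ppo_mini_batch_size with n keeps the optimizer schedule
# fixed while leaving per-pass memory alone. Numbers are illustrative.
prompts_per_step = 512
rollout_n = 4
samples = prompts_per_step * rollout_n               # 2048 sequences

ppo_mini_batch_size = 256 * rollout_n                # scaled with n -> 1024
ppo_micro_batch_size = 16                            # unchanged: bounded by GPU memory

optimizer_updates = samples // ppo_mini_batch_size   # 2, same count as without rollout_n
grad_accum_steps = ppo_mini_batch_size // ppo_micro_batch_size  # 64, grows with n

print(optimizer_updates, grad_accum_steps)
```

So the extra samples from a larger rollout.n are absorbed as additional gradient-accumulation steps per mini-batch, while the number of optimizer updates per RL step and the peak memory per forward/backward stay the same.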
ppo_micro_batch_size affects gradient accumulation, so what is the role of train_batch_size?
same question
I have the same issue, and I want to describe it in more detail. Here are the settings for GRPO in the example code:
- `data.train_batch_size=1024`
- `actor_rollout_ref.actor.ppo_mini_batch_size=256`
- `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=80`
- `actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=160`
- `actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=160`
- `actor_rollout_ref.rollout.n=5`
My questions:

- Does `data.train_batch_size=1024` refer to #prompts, or to #prompts * `actor_rollout_ref.rollout.n`? In other words, is one batch `1024 * 5` samples or `1024`?
- Does `actor_rollout_ref.actor.ppo_mini_batch_size=256` refer to #prompts, or to #prompts * `actor_rollout_ref.rollout.n`?
- `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=80` does not divide `actor_rollout_ref.actor.ppo_mini_batch_size=256` evenly. Why is that? Is this a misconfiguration in the example code?
- Do the values `ppo_mini_batch_size=256` and `ppo_micro_batch_size_per_gpu=80` already take `actor_rollout_ref.rollout.n=5` into account? That is, how do I calculate my training steps: `1024 * 5 / 256` or `1024 / 256`? And how do I calculate gradient-accumulation steps: `256 / 80` or `256 * 5 / 80`? (See the sketch after this list.)
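To pin down the last question, here is the arithmetic under both readings (a sketch only; which reading verl actually uses is exactly what I am asking, and the 8-GPU figure is a guess to show one way 80 could divide evenly):

```python
# Two possible readings of the example config, side by side. Illustrative only.
train_batch_size = 1024   # data.train_batch_size
mini = 256                # actor.ppo_mini_batch_size
micro_per_gpu = 80        # actor.ppo_micro_batch_size_per_gpu
n = 5                     # rollout.n
num_gpus = 8              # hypothetical; not stated in the example

# Reading A: the batch sizes count prompts, and rollout.n is applied on top.
mini_batches_a = train_batch_size // mini            # 4 mini-batches per RL step

# Reading B: the batch sizes count generated samples (prompts * n).
mini_batches_b = train_batch_size * n // mini        # 20 mini-batches per RL step

# Under reading A, each mini-batch still expands to mini * n samples;
# sharded over num_gpus, the per-GPU size is divisible by 80:
mini_per_gpu = mini * n // num_gpus                  # 160 samples per GPU
grad_accum_steps = mini_per_gpu // micro_per_gpu     # 2 accumulation steps

print(mini_batches_a, mini_batches_b, grad_accum_steps)
```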
🤗 I would appreciate it a lot if anyone could help me clarify these questions. 🤗 🌻
same question