If rollout.n is doubled, will the samples used for training be doubled too?

Open StarDewXXX opened this issue 1 year ago • 6 comments

If rollout.n is doubled, will the samples used for training be doubled too? And does each forward/backward pass incur higher GPU costs during training because of the larger rollout.n?

StarDewXXX avatar Feb 01 '25 10:02 StarDewXXX

Yes. Your understanding is correct.

vermouth1992 avatar Feb 01 '25 15:02 vermouth1992

Hi, thanks for your reply. So if I increase rollout.n by m times, should I decrease ppo_mini_batch_size or ppo_micro_batch_size to 1/m of the original value?

StarDewXXX avatar Feb 02 '25 00:02 StarDewXXX

I think ppo_mini_batch_size should also be multiplied by n. ppo_micro_batch_size should not, because it only affects gradient accumulation, and increasing ppo_micro_batch_size may cause OOM.
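
A minimal sketch of this scaling, assuming ppo_mini_batch_size is counted in trajectories (prompts * rollout.n); the numbers below are illustrative, not verl defaults:

```python
# Illustrative arithmetic only; the names mirror verl config keys, but the
# values and the trajectory-counting assumption are made up for this sketch.
train_batch_size = 512          # prompts per training step
ppo_mini_batch_size = 128       # trajectories per optimizer update (assumed)

for rollout_n in (4, 8):        # doubling rollout.n ...
    trajectories = train_batch_size * rollout_n     # ... doubles the PPO samples
    mini_batch = ppo_mini_batch_size * rollout_n    # scale the mini-batch with n
    updates_per_step = trajectories // mini_batch   # stays constant (4)
    print(rollout_n, trajectories, mini_batch, updates_per_step)

# ppo_micro_batch_size stays fixed: it only controls how each mini-batch is
# chunked for gradient accumulation, and raising it raises peak GPU memory.
```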

vermouth1992 avatar Feb 03 '25 02:02 vermouth1992

ppo_micro_batch_size affects gradient accumulation, but then what is the role of train_batch_size?

JiaxingSong718 avatar Feb 14 '25 07:02 JiaxingSong718

same question

splitnum11 avatar Feb 14 '25 08:02 splitnum11

same question

asirgogogo avatar Feb 20 '25 11:02 asirgogogo

I have the same issue, and I want to describe it in more detail. Here are the settings for GRPO in the example code:

data.train_batch_size=1024
actor_rollout_ref.actor.ppo_mini_batch_size=256
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=80
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=160
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=160
actor_rollout_ref.rollout.n=5

  1. I’d like to know whether data.train_batch_size=1024 refers to #prompts or to #prompts * actor_rollout_ref.rollout.n. In other words, does one batch contain 1024*5 samples or 1024?

  2. Does actor_rollout_ref.actor.ppo_mini_batch_size=256 refer to #prompts or to #prompts * actor_rollout_ref.rollout.n?

  3. actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=80 does not divide actor_rollout_ref.actor.ppo_mini_batch_size=256 evenly. Why is that? Is this a misconfiguration in the example code?

  4. Do the values actor_rollout_ref.actor.ppo_mini_batch_size=256 and actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=80 already take actor_rollout_ref.rollout.n=5 into account? What I mean is: how do I calculate my training steps, 1024*5 / 256 or 1024 / 256? And how do I calculate the gradient accumulation steps, 256 / 80 or 256*5 / 80? (See the sketch below.)

🤗 I would appreciate it a lot if anyone could help me clarify these questions. 🤗 🌻
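
For reference, here is one way the arithmetic in question 4 could work out. This is a sketch under two assumptions that verl may or may not follow exactly: mini-batches count trajectories (prompts * rollout.n) after the rollout expansion, and the global mini-batch is sharded across GPUs (8 GPUs assumed here) before micro-batching:

```python
# Hypothetical worked example for the GRPO config above; not verl code.
train_batch_size = 1024                  # data.train_batch_size (prompts)
rollout_n = 5                            # actor_rollout_ref.rollout.n
ppo_mini_batch_size = 256                # global value from the example config
ppo_micro_batch_size_per_gpu = 80
num_gpus = 8                             # assumed world size

trajectories = train_batch_size * rollout_n                     # 5120 samples per step

# Under this reading, the mini-batch is first expanded by rollout.n ...
mini_batch_in_trajectories = ppo_mini_batch_size * rollout_n    # 1280
updates_per_step = trajectories // mini_batch_in_trajectories   # 4

# ... then sharded across GPUs, and each per-GPU shard is split into
# micro-batches for gradient accumulation, which is why 80 can divide evenly
# even though it does not divide 256.
mini_batch_per_gpu = mini_batch_in_trajectories // num_gpus     # 160
grad_accum_steps = mini_batch_per_gpu // ppo_micro_batch_size_per_gpu  # 2

print(updates_per_step, grad_accum_steps)  # 4 2
```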

Zoisaang avatar May 09 '25 08:05 Zoisaang

same question

bestbzw avatar Jun 24 '25 13:06 bestbzw