Which hyperparameters have the greatest impact on GPU memory usage?
Do ppo_mini_batch_size and ppo_micro_batch_size have a large impact on GPU memory usage? More generally, which parameters most affect GPU memory during PPO training? I tried adjusting max_response_length, train_batch_size, and rollout.n, but none of them seemed to noticeably reduce memory usage; in fact, the gradient computation on certain samples was still exhausting GPU memory.
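For reference, here is a minimal sketch of the knobs I have been experimenting with (the config paths follow the hydra-style layout in my copy of verl and may differ across versions; the values are illustrative, not a recommendation):

```yaml
# Illustrative values only; paths assume verl's hydra-style config layout
data:
  train_batch_size: 512        # prompts sampled per training step
  max_response_length: 1024    # cap on generated tokens per response
actor_rollout_ref:
  rollout:
    n: 4                       # responses sampled per prompt
  actor:
    ppo_mini_batch_size: 128   # samples per PPO optimizer update
    ppo_micro_batch_size: 4    # samples per forward/backward pass; smaller
                               # values trade speed for lower peak memory
                               # via gradient accumulation
```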