questions on the training configuration
Thanks for the great work!
I have some questions about the training configuration.
For the training batch size, I assume we collect `rollout_batch_size = 1024` trajectories into the replay buffer and then run training on them. During training, the batch size per GPU is `micro_train_batch_size` and the global train batch size is `train_batch_size`, so we will have some gradient accumulation steps?
In the meantime, I am wondering whether there is an option to save only the intermediate actor model, without the critic model and the optimizer state.
Many thanks in advance!
> so we will have some gradient accumulation steps

Yes.
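To make the relationship concrete, here is a small sketch of how the gradient accumulation steps follow from the batch-size settings. The variable names mirror the config options discussed above; the `world_size` and `train_batch_size` values are hypothetical examples, not defaults.

```python
# Sketch of the batch-size arithmetic (example values, not OpenRLHF defaults).
rollout_batch_size = 1024      # trajectories collected into the replay buffer
train_batch_size = 128         # global (effective) batch size per optimizer step
micro_train_batch_size = 4     # batch size per GPU per forward/backward pass
world_size = 8                 # hypothetical number of training GPUs

# Each optimizer step consumes train_batch_size samples, but one
# forward/backward pass only covers micro_train_batch_size * world_size,
# so the rest is covered by gradient accumulation:
grad_accum_steps = train_batch_size // (micro_train_batch_size * world_size)
print(grad_accum_steps)  # 4

# The replay buffer is then consumed over this many optimizer steps per epoch:
optimizer_steps = rollout_batch_size // train_batch_size
print(optimizer_steps)  # 8
```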
> intermediate actor model

You can load the checkpoint with OpenRLHF and call `strategy.save_model` to convert it to a HF model.
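A minimal sketch of that conversion might look like the following. Note that the checkpoint path, the `load_ckpt` call, and the exact argument lists are assumptions based on typical OpenRLHF training scripts, not a verified API; only `strategy.save_model` is taken from the answer above, so please check the OpenRLHF source for the real signatures.

```python
# Hypothetical sketch: export a trained actor checkpoint as a HF-format model.
# `strategy`, `actor`, and `tokenizer` are assumed to be set up the same way
# as in the training script; names and signatures below are NOT verified.
strategy.load_ckpt(actor.model, "./ckpt/actor")            # path is an example
strategy.save_model(actor.model, tokenizer, "./hf_model")  # writes HF files
```

The resulting directory can then be loaded with the usual `from_pretrained` workflow in Transformers, without the critic model or optimizer state.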