questions on the training configuration
Thanks for the great work!
I have some questions about the training configuration.
For the training batch size, I assume we collect `rollout_batch_size = 1024` trajectories into the replay buffer and then run training on them. During training, the batch size per GPU is `micro_train_batch_size` and the global train batch size is `train_batch_size`, so we will have some gradient accumulation steps?
In the meantime, I am wondering whether there is an option to save only the intermediate actor model, without the critic model and the optimizer state.
Many thanks in advance!
> so we will have some gradient accumulation steps

Yes.
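To make the relationship concrete, here is a small sketch of how the gradient accumulation steps follow from the batch-size settings. The variable names mirror the config options discussed above; the `world_size` and `train_batch_size` values are hypothetical examples, not defaults.

```python
# Sketch of the batch-size arithmetic (example values, not OpenRLHF defaults).
rollout_batch_size = 1024      # trajectories collected into the replay buffer
train_batch_size = 128         # global (effective) batch size per optimizer step
micro_train_batch_size = 4     # batch size per GPU per forward/backward pass
world_size = 8                 # hypothetical number of training GPUs

# Each optimizer step consumes train_batch_size samples, but one
# forward/backward pass only covers micro_train_batch_size * world_size,
# so the rest is covered by gradient accumulation:
grad_accum_steps = train_batch_size // (micro_train_batch_size * world_size)
print(grad_accum_steps)  # 4

# The replay buffer is then consumed over this many optimizer steps per epoch:
optimizer_steps = rollout_batch_size // train_batch_size
print(optimizer_steps)  # 8
```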
> intermediate actor model

You can load the checkpoint with OpenRLHF and call `strategy.save_model` to convert it to a HF model.
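A minimal sketch of that conversion might look like the following. Note that the checkpoint path, the `load_ckpt` call, and the exact argument lists are assumptions based on typical OpenRLHF training scripts, not a verified API; only `strategy.save_model` is taken from the answer above, so please check the OpenRLHF source for the real signatures.

```python
# Hypothetical sketch: export a trained actor checkpoint as a HF-format model.
# `strategy`, `actor`, and `tokenizer` are assumed to be set up the same way
# as in the training script; names and signatures below are NOT verified.
strategy.load_ckpt(actor.model, "./ckpt/actor")            # path is an example
strategy.save_model(actor.model, tokenizer, "./hf_model")  # writes HF files
```

The resulting directory can then be loaded with the usual `from_pretrained` workflow in Transformers, without the critic model or optimizer state.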