
Questions about the sharding of optimizer states


I noticed that ColossalAI provides several optimizers, such as 'FusedLAMB', 'FusedAdam', 'FusedSGD', 'Lamb', 'Lars', 'CPUAdam', and 'HybridAdam', which shard optimizer states based on the size of the parameters and gradients. My question is: if we are not using one of the optimizers provided by ColossalAI, do I need to rewrite the optimizer so that it shards its states? Or is that unnecessary, because as long as the parameters and gradients are already sharded, the optimizer states will be sharded automatically?
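For concreteness, a minimal sketch of the two setups being contrasted (the model is a placeholder, and the training loop is omitted; the `HybridAdam` import path reflects the versions around this issue and may differ in yours):

```python
import torch
from colossalai.nn.optimizer import HybridAdam  # one of the ColossalAI-provided optimizers

model = torch.nn.Linear(1024, 1024)  # placeholder model

# Setup A: a ColossalAI-provided optimizer.
opt_a = HybridAdam(model.parameters(), lr=1e-3)

# Setup B: a stock PyTorch optimizer not on the list above -- does this one
# need to be rewritten to shard its states, or are its states sharded
# automatically once the parameters and gradients are sharded?
opt_b = torch.optim.Adam(model.parameters(), lr=1e-3)
```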

I also saw a comment in the code: "Inner optimizer must support optimizing hybrid (CPU and CUDA) tensors, and it must set num_fp32_shards_per_param correctly." I feel that this requirement only applies if tensor_placement_policy lets tensors live on both CPU and CUDA (e.g. "auto"); if the policy is fixed to "cpu" or "cuda", we shouldn't need the optimizer to support hybrid tensors. Is that right?
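For reference, "supporting hybrid (CPU and CUDA) tensors" means the inner optimizer must be able to update, within a single step, parameters that live on different devices, since with offloading some shards sit in host memory while others stay on the GPU. A toy illustration of that requirement only (this is not ColossalAI's HybridAdam, and the num_fp32_shards_per_param bookkeeping is omitted):

```python
import torch

class NaiveHybridSGD(torch.optim.Optimizer):
    """Toy optimizer that updates each parameter on whatever device it lives on.

    Real hybrid optimizers (e.g. HybridAdam) instead dispatch to fused CPU
    kernels for offloaded shards and CUDA kernels for resident shards.
    """

    def __init__(self, params, lr=1e-3):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                # p may be a CPU tensor (offloaded shard) or a CUDA tensor
                # (resident shard); the update must work in both cases.
                p.add_(p.grad, alpha=-group["lr"])
```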

tongping avatar Nov 15 '22 16:11 tongping

  1. Yes, optimizer states will be sharded automatically.
  2. If tensor_placement_policy is "cpu" or "cuda", we don't need the optimizer to support hybrid tensors.
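To illustrate point 1: with ZeRO/Gemini-style sharding, each rank only ever builds optimizer states for the parameter shard it owns, so the full-model states never materialize on any single rank. A standalone sketch of the idea, simulating two ranks in one process with plain PyTorch (this shows the concept, not ColossalAI's actual chunk-based mechanism):

```python
import torch

def shard_params(full_params, rank, world_size):
    """Return the slice of parameters owned by `rank` (naive round-robin)."""
    return [p for i, p in enumerate(full_params) if i % world_size == rank]

# The "full model": four parameters.
full_params = [torch.nn.Parameter(torch.randn(1024)) for _ in range(4)]

world_size = 2
for rank in range(world_size):
    # Each rank holds only its shard of parameters and gradients ...
    my_params = shard_params(full_params, rank, world_size)
    # ... so any optimizer built over that shard keeps states (e.g. Adam's
    # exp_avg / exp_avg_sq) only for the parameters this rank owns.
    opt = torch.optim.Adam(my_params, lr=1e-3)
    for p in my_params:
        p.grad = torch.zeros_like(p)
    opt.step()
    print(f"rank {rank}: states for {len(opt.state)} of {len(full_params)} params")
```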

ver217 avatar Nov 16 '22 02:11 ver217

Great. Thank you very much!

tongping avatar Nov 17 '22 05:11 tongping