DeepSpeed
estimate_zero2_model_states_mem_needs: fixing memory estimation
The estimator was counting 4 bytes per model parameter and 4 bytes per gradient. Fixed it to 2 bytes each, under the assumption of FP16/BF16 training.
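To illustrate the accounting this patch changes, here is a simplified sketch of a ZeRO stage-2 per-GPU memory estimate. The function name, the Adam optimizer-state figure (16 bytes/param for FP32 master weights, momentum, and variance), and the treatment of parameters as replicated while gradients and optimizer states are partitioned are illustrative assumptions, not the actual DeepSpeed implementation:

```python
def estimate_zero2_gpu_mem_gib(total_params: int, num_gpus: int) -> float:
    """Rough per-GPU memory for ZeRO stage 2, assuming FP16/BF16 training.

    Illustrative sketch only; not DeepSpeed's actual estimator.
    - parameters: replicated on every GPU, 2 bytes each (the fix; was 4)
    - gradients: partitioned across GPUs, 2 bytes each (the fix; was 4)
    - optimizer states: assumed Adam with FP32 master copies,
      16 bytes/param, partitioned across GPUs
    """
    param_bytes = 2 * total_params
    grad_bytes = 2 * total_params / num_gpus
    optim_bytes = 16 * total_params / num_gpus
    return (param_bytes + grad_bytes + optim_bytes) / 2**30  # GiB
```

With the old 4-byte counts, a 1B-parameter model on a single GPU would have been over-estimated by roughly 4 GB for the params+grads terms alone.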
@tjruwase @stas00 - What do you say about the updated patch?
@nelyahu, LGTM. Thanks!
@tjruwase @stas00 can you please re-run validation? The failure in "cpu-torch-latest" does not seem related to this change.
@tjruwase can be merged?