Results: 2 comments by TheDecisionJTree
My DeepSpeed ZeRO-1/2/3 + offload uses more GPU memory than DDP
> @TheDecisionJTree Can you share your exact deepspeed config or the ds_pretrain ....sh script?

OK

{ "optimizer": { "type": "AdamW", "params": { "lr": 0.001, "betas": [ 0.8, 0.999 ],...