Justin Chiu

2 comments by Justin Chiu

We haven't been updating this issue, but we were still seeing the same problems even after the stage3 optimizations discussed in https://github.com/microsoft/DeepSpeed/issues/1069, and we are trying to figure out why (optimizations: https://github.com/jfc4050/DeepSpeed/tree/stage3). i...

@tjruwase Yes, the results were with NVMe offload (optimizer states and params). Here's the full config:

```
ds_config = {
    "wall_clock_breakdown": False,
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 1,
    "gradient_clipping": 1.0,
    "fp16": {"enabled": True},
    ...
```
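
For readers unfamiliar with the setup, here is a rough sketch of what a ZeRO stage 3 config with NVMe offload for both optimizer states and params can look like. This is not the config from the comment above, which is cut off. Only the first four settings are copied from the visible part. The `zero_optimization` section is an assumption based on DeepSpeed's documented `offload_optimizer` / `offload_param` keys, and `make_nvme_offload_config` and the `/local_nvme` path are hypothetical placeholders.

```python
import json


def make_nvme_offload_config(nvme_path: str, micro_batch_size: int = 8) -> dict:
    """Build an illustrative ZeRO stage 3 config that offloads to NVMe.

    Hypothetical helper for illustration only, not part of DeepSpeed.
    """
    return {
        # Values visible in the truncated config above.
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": 1,
        "gradient_clipping": 1.0,
        "fp16": {"enabled": True},
        # Assumed section: the original config is cut off before this point.
        "zero_optimization": {
            "stage": 3,
            # Offload optimizer states to NVMe.
            "offload_optimizer": {"device": "nvme", "nvme_path": nvme_path},
            # Offload model parameters to NVMe.
            "offload_param": {"device": "nvme", "nvme_path": nvme_path},
        },
    }


# "/local_nvme" is a placeholder mount point, not a path from the thread.
ds_config_sketch = make_nvme_offload_config("/local_nvme")
print(json.dumps(ds_config_sketch, indent=2))
```

A dict like this would normally be passed to DeepSpeed at initialization. Tuning knobs such as buffer sizes and async I/O settings are left out here because the thread does not show which values were used.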