Justin Chiu
We haven't been updating this issue, but we were still seeing the same problems even after the stage3 optimizations discussed in https://github.com/microsoft/DeepSpeed/issues/1069, and have been trying to figure out why (optimizations: https://github.com/jfc4050/DeepSpeed/tree/stage3) i...
@tjruwase yes, the results were with NVMe offload (optimizer states and params). Here's the full config:
```
ds_config = {
    "wall_clock_breakdown": False,
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 1,
    "gradient_clipping": 1.0,
    "fp16": {"enabled": True},...
```