Quentin Anthony
Currently, the DeepSpeed comm backend `deepspeed.comm` from https://github.com/microsoft/DeepSpeed/pull/1985 is a full wrapper around `torch.distributed` and is fully compatible with external calls to `torch.distributed`. Please open an issue if you face...
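For concreteness, here is a minimal sketch of what "full wrapper" means in practice. It assumes `deepspeed.comm` exposes the same call names as `torch.distributed` (the parity the wrapper is described as providing); treat the exact calls as illustrative rather than a definitive API reference.

```python
# Minimal sketch, assuming deepspeed.comm mirrors the torch.distributed
# call names (init_distributed, all_reduce) -- an assumption based on the
# wrapper parity described above, not a verified API listing.
import torch
import torch.distributed
import deepspeed.comm as dist

dist.init_distributed()  # wrapper-side init, analogous to init_process_group

t = torch.ones(4)
dist.all_reduce(t)  # collective routed through the deepspeed.comm wrapper

# Since the backend wraps torch.distributed rather than replacing it,
# an external call against the same process group should also work:
torch.distributed.all_reduce(t)
```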
> I think the print for `--gpu` is useful. To reduce confusion, we can add some clarification to the print message. Alternatively, as DDP is recommended to replace DataParallel,...
> It seems that there is still some issue with the CPU backend. I tried to use this branch to run the cifar example and hit the following issue:
>
> ...
> @Quentin-Anthony Can you list which DeeperSpeed features would be lost with this move?

Small stuff like logging format, some more detailed timers, and the forward hooks functionality in DeeperSpeed...
> Who would do the selling though?

Us to the DeepSpeed team. I'm saying it would be difficult to convince them that these timers are needed when they already have...
Yeah, this should be fixed by https://github.com/EleutherAI/gpt-neox/pull/835.
We can just remove the `cpu_offload` flags in the configs. DS has moved to an `offload_optimizer` dict (https://www.deepspeed.ai/docs/config-json/#optimizer-offloading) instead of a single `bool` anyway. I'll create a PR.
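As a sketch of the migration this implies: the `offload_optimizer` dict and its `device`/`pin_memory` fields come from the linked DeepSpeed docs, while the stage and values here are just illustrative.

```python
# Sketch of the config change, following the linked DeepSpeed docs.
# Old-style flag (what the PR would remove from our configs):
#   "zero_optimization": {"stage": 2, "cpu_offload": true}
#
# New-style dict; field names follow the docs, values are illustrative:
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu",     # offload optimizer state to host memory
            "pin_memory": True,  # page-locked buffers for faster transfers
        },
    },
}
```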
@hyunwoongko -- Would you like to restart this effort?