Quentin Anthony
Currently, the DeepSpeed comm backend `deepspeed.comm` from https://github.com/microsoft/DeepSpeed/pull/1985 is a full wrapper around `torch.distributed` and is fully compatible with external calls to `torch.distributed`. Please open an issue if you face...
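For concreteness, here is a minimal sketch of what "full wrapper" means in practice. It assumes `deepspeed.comm` exposes the same call names as `torch.distributed` (the parity the wrapper is described as providing); treat the exact calls as illustrative rather than a definitive API reference.

```python
# Minimal sketch, assuming deepspeed.comm mirrors the torch.distributed
# call names (init_distributed, all_reduce) -- an assumption based on the
# wrapper parity described above, not a verified API listing.
import torch
import torch.distributed
import deepspeed.comm as dist

dist.init_distributed()  # wrapper-side init, analogous to init_process_group

t = torch.ones(4)
dist.all_reduce(t)  # collective routed through the deepspeed.comm wrapper

# Since the backend wraps torch.distributed rather than replacing it,
# an external call against the same process group should also work:
torch.distributed.all_reduce(t)
```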
> I think the print for `--gpu` is useful. To reduce confusion, we can add some clarification to the print message. Alternatively, as DDP is recommended to replace DataParallel,...
> It seems that there is still some issue with the CPU backend. I tried to use this branch to run the cifar example and hit the following issue:
>
> ...
> @Quentin-Anthony Can you list which DeeperSpeed features would be lost with this move?

Small stuff like logging format, some more detailed timers, and the forward hooks functionality in DeeperSpeed...
> Who would do the selling though?

Us to the DeepSpeed team. I'm saying it would be difficult to convince them that these timers are needed when they already have...
Yeah, this should be fixed by https://github.com/EleutherAI/gpt-neox/pull/835.
We can just remove the `cpu_offload` flags in the configs. DS has moved to an `offload_optimizer` dict (https://www.deepspeed.ai/docs/config-json/#optimizer-offloading) instead of a single `bool` anyway. I'll create a PR.
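As a sketch of the migration this implies: the `offload_optimizer` dict and its `device`/`pin_memory` fields come from the linked DeepSpeed docs, while the stage and values here are just illustrative.

```python
# Sketch of the config change, following the linked DeepSpeed docs.
# Old-style flag (what the PR would remove from our configs):
#   "zero_optimization": {"stage": 2, "cpu_offload": true}
#
# New-style dict; field names follow the docs, values are illustrative:
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu",     # offload optimizer state to host memory
            "pin_memory": True,  # page-locked buffers for faster transfers
        },
    },
}
```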
@hyunwoongko -- Would you like to restart this effort?