Colin Taylor
@chongxiaoc is this resolved for you?
IMO we should call it "apply_optimizer_in_backward". Fused/non-fused is an implementation detail, and whether it's done in torch.autograd or requires comms (e.g. PT-D) can also be flexible
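For anyone skimming the naming discussion, here's a toy, framework-free sketch of what "apply optimizer in backward" means conceptually: each parameter is updated as soon as its gradient is produced, instead of buffering all gradients and running a separate optimizer step afterwards. All names here are hypothetical illustrations, not the torchrec or PyTorch API.

```python
# Toy sketch of "apply optimizer in backward" (hypothetical names, not a real API).
# During backprop, gradients arrive roughly in reverse parameter order; applying
# the update immediately lets the gradient buffer be freed right away.

class Param:
    def __init__(self, value):
        self.value = value

def sgd_update(param, grad, lr):
    # plain SGD step for one parameter
    param.value -= lr * grad

def backward_with_fused_optimizer(params, grads, lr=0.25):
    # simulate gradients arriving in reverse order and updating eagerly
    for param, grad in zip(reversed(params), reversed(grads)):
        sgd_update(param, grad, lr)

params = [Param(1.0), Param(2.0)]
backward_with_fused_optimizer(params, grads=[4.0, 2.0])
print([p.value for p in params])  # [0.0, 1.5]
```

Whether the eager update happens via autograd hooks or inside a fused kernel is exactly the implementation detail the name deliberately leaves open.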
@wangkuiyi sorry for the delay :) I think the snippet is a bit confusing, but the core API as landed is shard_embedding_modules https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/shard_embedding_modules.py#L24 This will replace (module swap) the embedding...
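To make the "module swap" behavior concrete, here's a torch-free toy of the idea: walk a model's children and replace each embedding module with a sharded wrapper in place. This is only an illustration of the concept; the class names, sharding scheme, and signature below are made up and do not match the real shard_embedding_modules implementation.

```python
# Toy illustration of a "module swap" (hypothetical, not the torchrec API):
# replace each embedding attribute on a model with a sharded wrapper.

class Embedding:
    """Stand-in for an embedding module."""
    def __init__(self, num_embeddings):
        self.num_embeddings = num_embeddings

class ShardedEmbedding:
    """Hypothetical sharded wrapper; each rank owns a slice of the rows."""
    def __init__(self, original, world_size):
        self.original = original
        self.rows_per_rank = original.num_embeddings // world_size

class Model:
    def __init__(self):
        self.embedding = Embedding(num_embeddings=8)

def shard_embedding_modules(model, world_size):
    # swap every Embedding attribute for a ShardedEmbedding, in place
    for name, child in list(vars(model).items()):
        if isinstance(child, Embedding):
            setattr(model, name, ShardedEmbedding(child, world_size))
    return model

model = shard_embedding_modules(Model(), world_size=4)
print(type(model.embedding).__name__)  # ShardedEmbedding
print(model.embedding.rows_per_rank)   # 2
```

After the swap, callers keep using `model.embedding` as before, which is the point of doing a replacement rather than asking users to rewrite their model code.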
thanks @Luo-Liang -> I think this isn't relevant anymore, sorry for missing the PR
@davidxiaozhi I'm not so familiar with horovod, but my understanding is that it does not use the pytorch distributed (https://pytorch.org/docs/stable/distributed.html) library and does the collective / p2p comms itself. torchrec is...
closing due to lack of engagement, @davidxiaozhi feel free to reopen or follow up about horovod integration if you are still interested
This has landed in master and will go out in the next stable release
@henrylhtsang yes, that is where DDP modules are set up (using actual DDP) to make these data_parallel tables call all_reduce to get the correct gradients. Why do you call this...
@pytorchmergebot -g
@pytorchbot --help