
Multiple GPUs error

Open · ArsenLuca opened this issue 6 years ago · 3 comments

When I use DataParallel for multiple GPUs, the sparsestmax function raises an error:

RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generic/THCTensorMathCompareT.cu:31

It seems that rad is only placed on gpu0. How should I reorganize the code so that rad ends up on all GPUs?
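For reference, the direction I was thinking of is registering rad as a buffer so that DataParallel replicates it to every device. A minimal sketch, assuming rad is currently a plain tensor attribute (the module below is a simplified stand-in, not the actual SSN code):

```python
import torch
import torch.nn as nn

class SparsestmaxLayer(nn.Module):  # hypothetical stand-in, not the real SSN layer
    def __init__(self):
        super().__init__()
        # Registering `rad` as a buffer (instead of a plain attribute) makes
        # nn.DataParallel replicate it onto every GPU along with the parameters.
        self.register_buffer('rad', torch.zeros(1))

    def forward(self, x):
        # Extra safety: keep the buffer on the same device as the input, so the
        # comparison never mixes tensors from different GPUs.
        rad = self.rad.to(x.device)
        return torch.clamp(x - rad, min=0.0)  # placeholder for the real sparsestmax math
```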

ArsenLuca · Aug 02 '19 07:08

The DataParallel module in PyTorch sometimes shows nondeterministic behavior and can be very slow, so we recommend using distributed training instead.
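A minimal sketch of the distributed setup we mean, one process per GPU (the launch command and the tiny model are placeholders, not part of this repo):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# One process per GPU, launched e.g. with:
#   torchrun --nproc_per_node=NUM_GPUS train.py
dist.init_process_group(backend='nccl')
local_rank = int(os.environ.get('LOCAL_RANK', 0))
torch.cuda.set_device(local_rank)

model = nn.Linear(128, 10).cuda(local_rank)  # stand-in for the real network
model = DDP(model, device_ids=[local_rank])  # gradients synced via all-reduce

out = model(torch.randn(32, 128).cuda(local_rank))
```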

MengTianjian · Aug 07 '19 06:08

We have updated the SSN code to support DataParallel; please check it out and give it a try. Note that we have not yet verified the experimental results (ImageNet, etc.) in DataParallel mode.
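With the updated code, wrapping the model should follow the standard PyTorch pattern, roughly like this (a sketch only; the `nn.Sequential` model is a placeholder for a network built from the updated SSN layers):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).cuda()
if torch.cuda.device_count() > 1:
    # DataParallel splits the input batch across GPUs and replicates the
    # module (parameters and buffers, including `rad`) onto each device.
    model = nn.DataParallel(model)

out = model(torch.randn(8, 3, 224, 224).cuda())  # batch is split across GPUs
```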

herbertLJY · Aug 12 '19 04:08

> We have updated the SSN code to support DataParallel; please check it out and give it a try. Note that we have not yet verified the experimental results (ImageNet, etc.) in DataParallel mode.

Good job. Thank you.

ArsenLuca · Aug 12 '19 05:08