table-transformer
Support Distributed Data Parallel Implementation of DETR
DETR has the option to train on multiple machines with multiple GPUs, but right now this repository's code can only train on a single GPU.
One epoch on a Tesla V100 GPU, as described in the paper, takes approximately 3 hours. I tried to parallelize the code by wrapping the model in PyTorch's DataParallel as a quick fix, but could not get it to work.
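For reference, this is roughly the kind of DataParallel quick fix I mean (a minimal sketch with a placeholder `nn.Linear` standing in for the actual DETR model, not this repo's code). DataParallel can struggle with models whose inputs or outputs are not plain tensors, which may be why the quick fix failed here:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                 # placeholder for the DETR model
if torch.cuda.device_count() > 1:
    # Replicate the module across all visible GPUs; each batch is split
    # along dim 0, run on the replicas, and the outputs are gathered back.
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(16, 10, device=next(model.parameters()).device)
out = model(x)                           # scatter/forward/gather happens here
```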
Is there a way to achieve faster training using multiple GPUs right now?
If not, are you considering implementing something like DistributedDataParallel?
https://pytorch.org/tutorials/intermediate/ddp_tutorial.html
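To make the request concrete, here is a minimal sketch of how DistributedDataParallel could be wired in, following the linked tutorial. The dataset and model are stand-ins, not the actual table-transformer/DETR code, and it assumes launching one process per GPU with `torchrun --nproc_per_node=NUM_GPUS train_ddp.py`:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in dataset and model; the real code would build the DETR
    # dataset, model and criterion here instead.
    dataset = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
    sampler = DistributedSampler(dataset)   # shards the data across processes
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    model = torch.nn.Linear(10, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)             # reshuffle differently each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()                  # gradients all-reduced across GPUs
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Something along these lines built into the training script would already be a big help, since with N GPUs the effective batch per step scales by N.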