table-transformer
Support Distributed Data Parallel Implementation of DETR
DETR has the option to train on multiple machines with multiple GPUs, but right now this repository's code can only train on a single GPU.
One epoch on a Tesla V100 GPU, as described in the paper, takes approximately 3 hours. I tried to parallelize the code by wrapping the model in PyTorch's DataParallel as a quick fix, but could not get it to work.
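For reference, this is roughly the kind of DataParallel quick fix I mean (a minimal sketch with a placeholder `nn.Linear` standing in for the actual DETR model, not this repo's code). DataParallel can struggle with models whose inputs or outputs are not plain tensors, which may be why the quick fix failed here:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                 # placeholder for the DETR model
if torch.cuda.device_count() > 1:
    # Replicate the module across all visible GPUs; each batch is split
    # along dim 0, run on the replicas, and the outputs are gathered back.
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(16, 10, device=next(model.parameters()).device)
out = model(x)                           # scatter/forward/gather happens here
```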
Is there a way to achieve faster training using multiple GPUs right now?
If not, are you considering implementing something like DistributedDataParallel?
https://pytorch.org/tutorials/intermediate/ddp_tutorial.html
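To make the request concrete, here is a minimal sketch of how DistributedDataParallel could be wired in, following the linked tutorial. The dataset and model are stand-ins, not the actual table-transformer/DETR code, and it assumes launching one process per GPU with `torchrun --nproc_per_node=NUM_GPUS train_ddp.py`:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in dataset and model; the real code would build the DETR
    # dataset, model and criterion here instead.
    dataset = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
    sampler = DistributedSampler(dataset)   # shards the data across processes
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    model = torch.nn.Linear(10, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)             # reshuffle differently each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()                  # gradients all-reduced across GPUs
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Something along these lines built into the training script would already be a big help, since with N GPUs the effective batch per step scales by N.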