AdaptSegNet multi-GPU training

Hi, training on GTA2Cityscapes currently takes 2 days for 100k epochs, which is very slow. How can I make this run on multiple GPUs?

Sep 06 '18 06:09 kshitijagrwl

If you really do mean 100k epochs, that is not slow, dude. Try nn.DataParallel(model) to run on multiple GPUs; there is a tutorial here: https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html
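For anyone landing here later, a minimal sketch of the `nn.DataParallel` wrapping (note the capitalization) might look like the following. `TinySegNet` is a hypothetical stand-in, not the AdaptSegNet model, and the input sizes are arbitrary; the wrapper is only applied when more than one GPU is visible, so the snippet also runs on a single GPU or on CPU.

```python
import torch
import torch.nn as nn


# Hypothetical stand-in for a segmentation network; NOT the AdaptSegNet model.
class TinySegNet(nn.Module):
    def __init__(self, num_classes=19):
        super().__init__()
        self.conv = nn.Conv2d(3, num_classes, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinySegNet()

# Wrap only when several GPUs are visible; DataParallel then splits each
# input batch along dim 0 and runs one replica of the model per GPU.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

# Arbitrary dummy batch: 4 RGB images of 64x128 pixels.
images = torch.randn(4, 3, 64, 128, device=device)
logits = model(images)
print(logits.shape)
```

Because the batch is split along its first dimension, the batch size needs to be at least the number of GPUs for every device to get work. Any other networks in the training script would presumably need the same wrapping, and after wrapping, the original model's attributes are reached through `model.module`.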

Sep 07 '18 07:09 wppply

Oops, I meant 100k iterations! Thanks for the link, I will try it out and update.

Sep 07 '18 12:09 kshitijagrwl

I have an image-computing PC with 16 GB of CPU memory and an 8 GB GPU. Is that enough to train the model without hitting a CUDA out-of-memory error, please?

Oct 24 '18 07:10 SiyuanWei

I forget the details, but in my experience training on the Cityscapes dataset usually requires about 11 GB of GPU memory.
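To compare a machine against that figure, a rough sketch like the one below lists the total memory of each visible GPU. The helper name `gpu_memory_report` is made up for illustration, and it reports total capacity only, not what a given training run will actually use.

```python
import torch


def gpu_memory_report():
    """Return (index, name, total memory in GiB) for each visible CUDA device."""
    if not torch.cuda.is_available():
        return []  # no CUDA device visible to PyTorch
    report = []
    for idx in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(idx)
        report.append((idx, props.name, props.total_memory / 1024**3))
    return report


for idx, name, total_gib in gpu_memory_report():
    print(f"GPU {idx}: {name}, {total_gib:.1f} GiB total")
```

If the card falls short, smaller input crops or a smaller batch size are the usual first things to try, though whether the results hold up with those changes is not something this thread confirms.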

Oct 24 '18 07:10 wppply

Thanks anyway, although that is bad news.

Oct 24 '18 10:10 SiyuanWei

@kshitijagrwl Hi, have you completed the multi-GPU version?

Dec 12 '18 02:12 ypjian

Has anyone tried a multi-GPU version? I want to train on multiple GPUs. Please share how to do it.

Mar 15 '20 11:03 lerndeep

@lerndeep I'm trying it now, running into a few bugs (fairly new to PyTorch). Will update here if/when I get it working

Mar 21 '20 20:03 lychrel

@kshitijagrwl @lychrel Have you finished the multi-GPU training? Looking forward to your reply!

Dec 30 '20 06:12 Lufei-github

@Lufei-github Tried it a couple times and couldn't avoid a memory leak that reboots my computer. I don't have this problem elsewhere, even in similar contexts (DeepLab)—but this is also a super simple training loop, so the culprit shouldn't be hard to find.

Ended up using different DA methods for the project I was working on, but I'd be curious to hear if anyone else experiences this behavior. Though I switched to a different problem, ASN gave really compelling results after letting the single-GPU jobs run.

Dec 30 '20 07:12 lychrel

@lychrel I don't really understand your answer. I don't know what ASN is. Could you explain it in a simpler way?

Dec 30 '20 07:12 Lufei-github