Does it support multi-GPU training?
@thuanaislab Awesome work! I would like to ask: does pl2map or pl2map++ support multi-GPU training? My scene is very large, so training on a single GPU takes a very long time. Thanks!
Hi @yuancaimaiyi,
Thanks for your interest in this work! It’s actually been a while since I last worked on it, so I don’t fully remember the details. But if I recall correctly, it only supports a single GPU.
I think you could try the PL2Map version; it was carefully tested.
@thuanaislab I am testing pl2map on the Cambridge dataset, but the loss is large and shows no clear downward trend. May I ask which machine you trained on, an A100? Mine is an RTX 4060 with 8 GB of VRAM.
@yuancaimaiyi Oh yeah, the outdoor loss is a bit buggy, but it still seems to work well somehow. So no worries, just keep training and run the test at the end. You might be surprised by how good the results turn out. I'll try to fix this later, or if you find the cause, a fix would be very welcome.
@thuanaislab Unfortunately, I continued training as you said, and after completing the training, I conducted a test, and the results are as follows:
@yuancaimaiyi Hmm, it seems like the reprojection loss is causing this issue (it's a bit unstable for outdoor training). Could you please turn off the reprojection loss by setting this to False?
https://github.com/ais-lab/pl2map/blob/813c6ca7d77ff0d9aa1b88e7bf273b047545ece6/cfgs/Cambridge.yaml#L40
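For reference, the change would look something like this in `cfgs/Cambridge.yaml` (a sketch only; the key names and nesting here are assumed, so please follow the linked line for the exact location of the flag):

```yaml
# cfgs/Cambridge.yaml (sketch; exact key names/nesting assumed, see linked line L40)
loss:
  reprojection:
    apply: False   # disable the unstable reprojection loss for outdoor training
```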
Also, for the pl2map experiments I used only a single 1080 Ti GPU.
I ran a quick test myself. It seems I set the wrong config for outdoor scenes; maybe the learning rate is too large.
I just pushed a more stable training config for Cambridge. Can you pull and try it? @yuancaimaiyi
@thuanaislab
Thank you very much for your suggestion. I have retrained and evaluated with your latest configuration, and the results are as follows. I think the accuracy and other metrics are still not good enough. Should I reduce the learning rate further?
@yuancaimaiyi Yeah, that's right! I reduced the training iterations to 1M for a faster test. If you want better accuracy, you should use about 1.5M or 2M iterations and reduce the learning rate a bit.
@thuanaislab Hi, I used 2,000,000 iterations and reduced the learning rate to 0.00006, but the result is still not good enough.
I think this is the config I used to obtain the best results: `base_lr: 0.0001`, `num_iters: 2500000`, and augmentation with `apply: true`, `on_rate: 1.0`, `brightness: 0.15`, `contrast: 0.1`.
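Put together as a config fragment, those values would look roughly like this (a sketch only; the top-level key names and nesting are assumptions, so match them to the actual structure of `cfgs/Cambridge.yaml`):

```yaml
# Best-result settings reported above (sketch; nesting/key names assumed)
base_lr: 0.0001
num_iters: 2500000
augmentation:
  apply: true
  on_rate: 1.0      # apply augmentation to every sample
  brightness: 0.15
  contrast: 0.1
```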