pl2map

Does it support multi-gpu training?

Open yuancaimaiyi opened this issue 7 months ago • 10 comments

@thuanaislab Awesome work! Does pl2map or pl2map++ support multi-GPU training? My scene is very large, so training on a single GPU takes a very long time. Thanks!

yuancaimaiyi avatar Jul 02 '25 04:07 yuancaimaiyi

Hi @yuancaimaiyi,

Thanks for your interest in this work! It’s actually been a while since I last worked on it, so I don’t fully remember the details. But if I recall correctly, it only supports a single GPU.

I think you could try the PL2Map version; it was carefully tested.

thuanaislab avatar Jul 02 '25 04:07 thuanaislab

@thuanaislab I am testing pl2map on the Cambridge data, but the loss stays high and shows no clear downward trend. Which machine did you train on, an A100? My machine has an RTX 4060 GPU with 8 GB of memory.

Image

yuancaimaiyi avatar Jul 05 '25 11:07 yuancaimaiyi

@yuancaimaiyi Oh yeah, the outdoor loss is a bit buggy, but it still seems to work well somehow. So no worries, just keep training and run the test at the end. You might be surprised by how good the results turn out. I think I'll fix this later, or if you find the cause, could you please help fix it?

thuanaislab avatar Jul 05 '25 20:07 thuanaislab

@thuanaislab Unfortunately, I continued training as you said, and after completing the training, I conducted a test, and the results are as follows:

Image

Test command: python3 runners/eval.py --dataset Cambridge --scene StMarysChurch -expv pl2map

Image

The results are very poor, which is strange. Could this have anything to do with the machine? All training parameters are the Cambridge defaults.

yuancaimaiyi avatar Jul 06 '25 11:07 yuancaimaiyi

@yuancaimaiyi Hmm, it seems the reprojection loss is causing this issue! (It is a bit unstable for outdoor training.) Could you please turn off the reprojection loss by setting this to False?

https://github.com/ais-lab/pl2map/blob/813c6ca7d77ff0d9aa1b88e7bf273b047545ece6/cfgs/Cambridge.yaml#L40

Also, for the pl2map experiments I only used a single 1080 Ti GPU.

thuanaislab avatar Jul 06 '25 21:07 thuanaislab

I'm running a simple test myself. It seems I set the wrong configs for outdoor scenes. Maybe the learning rate is too high.

I just pushed a more stable training config for Cambridge. Can you pull and try it? @yuancaimaiyi

thuanaislab avatar Jul 07 '25 04:07 thuanaislab

@thuanaislab Thank you very much for your suggestion. I retrained and evaluated with your latest configuration, and the results are below. The accuracy and other metrics are still not good enough. Should I reduce the learning rate further?

Image

yuancaimaiyi avatar Jul 11 '25 01:07 yuancaimaiyi

@yuancaimaiyi Yeah, that's right! I reduced the training iterations to 1M for a faster test. For better accuracy, you should use about 1.5M or 2M iterations and reduce the learning rate a bit.

thuanaislab avatar Jul 11 '25 05:07 thuanaislab

@thuanaislab Hi, I used 2000000 iterations and reduced the learning rate to 0.00006, but the result is still not good enough.

Image

What is the best result you have achieved on outdoor scenes? Or, what is the best achievable proportion of frames within 10 cm / 1 degree outdoors? All of my scenes are outdoor.

yuancaimaiyi avatar Jul 27 '25 13:07 yuancaimaiyi
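For readers unfamiliar with the "10 cm / 1 degree" figure asked about above: it is usually read as the fraction of test frames whose translation error and rotation error are both under those thresholds. The sketch below is only an illustration of that reading, not code from pl2map; the function name `pose_accuracy` and the per-frame error values are made up, and whether the comparison is strict or inclusive varies between evaluation scripts.

```python
from statistics import median


def pose_accuracy(trans_errors_m, rot_errors_deg, t_thresh_m=0.10, r_thresh_deg=1.0):
    """Fraction of frames with BOTH errors under the thresholds.

    trans_errors_m: per-frame translation errors in meters.
    rot_errors_deg: per-frame rotation errors in degrees.
    """
    if len(trans_errors_m) != len(rot_errors_deg):
        raise ValueError("error lists must have the same length")
    if not trans_errors_m:
        raise ValueError("need at least one frame")
    # A frame counts only if it passes both thresholds (strict comparison here).
    hits = sum(
        1
        for t, r in zip(trans_errors_m, rot_errors_deg)
        if t < t_thresh_m and r < r_thresh_deg
    )
    return hits / len(trans_errors_m)


# Made-up per-frame errors, purely for illustration (not pl2map results).
trans_errors_m = [0.05, 0.08, 0.30, 0.12]
rot_errors_deg = [0.4, 1.5, 0.9, 0.7]

accuracy = pose_accuracy(trans_errors_m, rot_errors_deg)
median_t = median(trans_errors_m)
median_r = median(rot_errors_deg)
print(f"within 10 cm / 1 deg: {accuracy:.0%}")
print(f"median error: {median_t:.2f} m, {median_r:.2f} deg")
```

Outdoor benchmarks are often summarized by the median translation and rotation errors instead, so the sketch prints both.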

I think this is the config I used to obtain the best results:

base_lr: 0.0001
augmentation:
  apply: true
  on_rate: 1.0
  brightness: 0.15
  contrast: 0.1
num_iters: 2500000

thuanaislab avatar Jul 30 '25 07:07 thuanaislab
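One way to try the settings quoted above without hand-editing every key is to merge them over an existing config. The following is a minimal sketch under assumptions: the `defaults` dictionary and its values are placeholders rather than the repository's actual defaults, the key nesting is guessed from the comment, and `deep_merge` is a hypothetical helper, not part of pl2map.

```python
import copy


def deep_merge(base, overrides):
    """Return a new dict where `overrides` replaces matching keys in `base`.

    Nested dicts are merged recursively; `base` is left unmodified.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder defaults (assumed values and layout, NOT taken from the repository).
defaults = {
    "base_lr": 0.0005,
    "num_iters": 1000000,
    "batch_size": 1,
    "augmentation": {"apply": False, "on_rate": 0.5, "brightness": 0.1, "contrast": 0.1},
}

# Values quoted in the comment above.
best_reported = {
    "base_lr": 0.0001,
    "num_iters": 2500000,
    "augmentation": {"apply": True, "on_rate": 1.0, "brightness": 0.15, "contrast": 0.1},
}

config = deep_merge(defaults, best_reported)
print(config)
```

Keys that the overrides do not mention (here the placeholder `batch_size`) are carried over unchanged, which keeps the override file short.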