Bad loss spike when training nerfacto with high number of iterations

Open • hoanhle opened this issue 1 year ago • 1 comment

Describe the bug

I'm training nerfacto on my dataset for 300k iterations, since I want to keep the original data resolution (10 images at 6k×4k). I know the standard is 30k iterations, but I can't understand the large loss spike that appears when training longer. Is this an implementation issue or a configuration issue? Can you give me a pointer on how to solve this?
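For scale, here is a rough count of how often each pixel gets sampled at 30k versus 300k iterations. This is a back-of-the-envelope sketch that assumes each iteration draws a fixed batch of rays, one per sampled pixel. The batch size of 4096 rays is an assumed value, not read from this run's config, and the constant and function names are hypothetical.

```python
# Back-of-the-envelope: average number of times each training pixel is drawn as a ray.
# ASSUMPTION: one ray per sampled pixel and a fixed batch per iteration;
# RAYS_PER_BATCH = 4096 is an assumed value, check your own config.
NUM_IMAGES = 10
WIDTH, HEIGHT = 6000, 4000  # "6k x 4k", taken literally
RAYS_PER_BATCH = 4096       # assumed, not read from nerfstudio


def passes_over_data(iterations: int) -> float:
    """Average number of times each pixel is sampled over `iterations` steps."""
    total_pixels = NUM_IMAGES * WIDTH * HEIGHT
    return iterations * RAYS_PER_BATCH / total_pixels


for iters in (30_000, 300_000):
    print(f"{iters:>7} iterations -> {passes_over_data(iters):.2f} passes over the pixels")
```

Under those assumptions, 30k iterations visits each pixel only about half a time on average (0.51), while 300k visits it about five times (5.12).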

[Screenshot from 2024-08-07 12-49-57]

Aug 07 '24 10:08 hoanhle

This seems like a numerical stability issue, which often happens if you train for too long. It's hard to tell without digging into the network, activations, etc., but you could try tuning the learning rate or weight decay parameters (search the output of ns-train nerfacto --help for lr and weight-decay).
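When stretching a run from 30k to 300k iterations, it can help to check how the learning-rate schedule stretches with it. The sketch below assumes a generic exponential (log-linear) decay from lr_init to lr_final over max_steps that then holds at lr_final. The function name and the 1e-2 / 1e-4 values are illustrative assumptions, not nerfacto's confirmed defaults; check ns-train nerfacto --help for the real settings.

```python
import math


def exp_decay_lr(step: int, lr_init: float, lr_final: float, max_steps: int) -> float:
    """Log-linear decay from lr_init to lr_final over max_steps, then hold at lr_final.

    Generic sketch of an exponential schedule, not nerfstudio's scheduler code.
    """
    t = min(max(step / max_steps, 0.0), 1.0)  # progress through the schedule, clipped to [0, 1]
    return math.exp(math.log(lr_init) * (1 - t) + math.log(lr_final) * t)


# Illustrative values only (assumed, not nerfacto's confirmed defaults).
LR_INIT, LR_FINAL = 1e-2, 1e-4

for max_steps in (30_000, 300_000):
    lrs = [exp_decay_lr(s, LR_INIT, LR_FINAL, max_steps) for s in (0, 30_000, 150_000, 300_000)]
    print(f"max_steps={max_steps:>7}:", [f"{lr:.1e}" for lr in lrs])
```

With max_steps = 30k the rate has already reached lr_final by step 30k and sits there for the rest of a 300k run; with max_steps = 300k it is still 1e-3 at step 150k. Lowering lr_init or lr_final is one concrete way to act on the suggestion to tune the learning rate.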

A small amount of weight decay (1e-4? 1e-3?) in particular might help with stability, but regularization might also hurt PSNR.
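To see why weight decay can both stabilize training and cost PSNR, here is a minimal sketch of L2-style weight decay under plain SGD: each step adds weight_decay * param to the gradient, so a parameter with no data gradient shrinks by a factor of (1 - lr * weight_decay) per step. Plain SGD, the helper names, and lr = 1e-2 are assumptions for illustration; adaptive optimizers such as Adam rescale the gradient, so the actual effect in nerfacto will differ.

```python
# Sketch: L2-style weight decay with plain SGD on a single scalar parameter.
# ASSUMPTION: plain SGD for clarity; adaptive optimizers (e.g. Adam) behave differently.


def sgd_step(param: float, grad: float, lr: float, weight_decay: float) -> float:
    """One SGD update where the L2 penalty 0.5 * wd * p**2 contributes wd * p to the gradient."""
    return param - lr * (grad + weight_decay * param)


def shrink_factor(steps: int, lr: float, weight_decay: float) -> float:
    """Value left of a parameter that starts at 1.0 and receives zero data gradient."""
    p = 1.0
    for _ in range(steps):
        p = sgd_step(p, 0.0, lr, weight_decay)
    return p


for steps in (30_000, 300_000):
    for wd in (1e-4, 1e-3):
        print(f"steps={steps:>7} wd={wd:.0e} -> remaining weight {shrink_factor(steps, 1e-2, wd):.3f}")
```

Because the shrinkage compounds as (1 - lr * weight_decay) ** steps, the same weight decay pulls toward zero over ten times as many updates in a 300k-iteration run as in a 30k one.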

Aug 09 '24 00:08 brentyi