About 3D reconstruction.
Hello, thank you for your excellent work!
I have a quick question: Is COLMAP-based 3D reconstruction still required during training? In particular, can your method still work on datasets that lack a covisibility structure or do not have sufficient overlapping views?
Thank you in advance!
Thank you for your question!
No, 3D reconstruction is not required for training. Our method only needs a set of reference images with known camera poses to perform reprojection supervision. That said, if a 3D reconstruction is available, it can optionally be used as explicit 3D supervision, which may improve performance.
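To make "reprojection supervision" more concrete, here is a minimal sketch of what a reprojection error measures. It is not the training loss from this repository: it assumes a plain pinhole camera, a world-to-camera pose convention (`X_cam = R @ X_world + t`), and the names `project`, `reprojection_error`, and `IDENTITY` are made up for illustration.

```python
import math

def project(point_w, R, t, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates with a pinhole camera.

    Assumed convention: R (3x3 nested lists) and t (length 3) map world
    coordinates to camera coordinates, X_cam = R @ X_world + t.
    """
    xc = [sum(R[i][j] * point_w[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    u = fx * xc[0] / xc[2] + cx
    v = fy * xc[1] / xc[2] + cy
    return (u, v)

def reprojection_error(point_w, observed_uv, R, t, fx, fy, cx, cy):
    """Pixel distance between the projected point and the observed keypoint."""
    u, v = project(point_w, R, t, fx, fy, cx, cy)
    return math.hypot(u - observed_uv[0], v - observed_uv[1])

# Hypothetical setup: a reference camera whose center sits 1 unit along +x,
# so with an identity rotation the world-to-camera translation is (-1, 0, 0).
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
err = reprojection_error(
    point_w=[0.0, 0.0, 5.0],        # candidate 3D point
    observed_uv=(223.0, 244.0),     # where the point was detected in the image
    R=IDENTITY, t=[-1.0, 0.0, 0.0],
    fx=500.0, fy=500.0, cx=320.0, cy=240.0,
)
print(err)  # 5.0 pixels: the point projects to (220, 240)
```

The only inputs are the camera pose, the intrinsics, and a 2D observation, which is why known poses are enough and a precomputed 3D model is not needed for this kind of supervision.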
When the images do not overlap enough, triangulation becomes unreliable or impossible, whether it is explicit (as in COLMAP) or implicit (as in our method), and both approaches will fail to reconstruct the scene. In such cases, an external 3D reconstruction obtained from other sensors (such as LiDAR or depth cameras) or from the predictions of large models could still be used to provide explicit 3D supervision. The model might then still generalize to query images, though its effectiveness would be limited.
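The overlap requirement can be illustrated with a simple covisibility check: a point can only be triangulated if it falls inside the image of at least two posed cameras. The sketch below is an assumption-laden toy (pinhole cameras, world-to-camera poses, hypothetical helper names `visible_in`, `count_views`, `can_triangulate`), not code from this repository, and seeing a point in two views is a necessary condition rather than a sufficient one.

```python
def visible_in(point_w, R, t, fx, fy, cx, cy, width, height):
    """True if the world point lies in front of the camera and inside the image."""
    # Assumed convention: X_cam = R @ X_world + t.
    xc = [sum(R[i][j] * point_w[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return False
    u = fx * xc[0] / xc[2] + cx
    v = fy * xc[1] / xc[2] + cy
    return 0 <= u < width and 0 <= v < height

def count_views(point_w, cameras, width, height):
    """Number of cameras that observe the point."""
    return sum(
        visible_in(point_w, width=width, height=height, **cam) for cam in cameras
    )

def can_triangulate(point_w, cameras, width, height):
    """Necessary (not sufficient) condition: at least two views see the point."""
    return count_views(point_w, cameras, width, height) >= 2

def make_camera(center_x):
    """Hypothetical camera looking along +z with its center at (center_x, 0, 0)."""
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return {"R": identity, "t": [-center_x, 0.0, 0.0],
            "fx": 500.0, "fy": 500.0, "cx": 320.0, "cy": 240.0}

point = [0.0, 0.0, 5.0]
overlapping = [make_camera(0.0), make_camera(1.0)]    # nearby views share the point
disjoint = [make_camera(0.0), make_camera(100.0)]     # second view never sees it

print(can_triangulate(point, overlapping, 640, 480))  # True
print(can_triangulate(point, disjoint, 640, 480))     # False
```

In the disjoint case no amount of reprojection supervision can recover the point's depth from the images alone, which is where externally provided 3D (for example from LiDAR) would have to fill in.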
Hope this clarifies things! Let me know if you have any further questions.