Reduce device memory usage for CAGRA's graph optimization process (MST optimization)
CAGRA can guarantee the connectivity of search graphs through a process called MST optimization. This MST optimization uses a GPU for speed, but if the device memory capacity is insufficient, a runtime error will occur.
This PR allows MST optimization to be performed on the CPU when device memory capacity is insufficient.
This PR does not simply enable the existing GPU implementation to run on CPUs. We investigated how to execute MST optimization quickly on CPUs and, as a result, changed the processing method. Previously, adding candidate edges to the MST graph and computing the connected component labels were performed as separate steps. We have now merged these two steps. While the primary goal was to accelerate the CPU implementation, the change is also effective for the GPU implementation, so we have applied it to the existing GPU implementation as well.
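To illustrate the idea of merging edge addition with connected component labeling, here is a minimal Python sketch. It is not the code in this PR: it uses a union-find (disjoint-set) structure as a stand-in technique, and the names `DisjointSet` and `add_edges_with_labeling` are hypothetical. The point is only that component labels can be kept up to date while each candidate edge is examined, so no separate labeling pass over the graph is needed afterwards.

```python
from typing import Iterable, List, Tuple


class DisjointSet:
    """Union-find over node ids 0..n-1 (illustrative, not the PR's data structure)."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.num_components = n

    def find(self, x: int) -> int:
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        """Merge the components of a and b. Returns False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Keep the smaller id as the root so labels are deterministic.
        if ra < rb:
            self.parent[rb] = ra
        else:
            self.parent[ra] = rb
        self.num_components -= 1
        return True


def add_edges_with_labeling(
    num_nodes: int, candidate_edges: Iterable[Tuple[int, int]]
) -> Tuple[List[Tuple[int, int]], List[int], int]:
    """Add candidate edges and maintain component labels in the same loop."""
    dsu = DisjointSet(num_nodes)
    accepted: List[Tuple[int, int]] = []
    for u, v in candidate_edges:
        # An edge is accepted only if it joins two different components;
        # the component bookkeeping is updated by the same call.
        if dsu.union(u, v):
            accepted.append((u, v))
    labels = [dsu.find(i) for i in range(num_nodes)]
    return accepted, labels, dsu.num_components


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
    accepted, labels, n_comp = add_edges_with_labeling(5, edges)
    print(accepted, labels, n_comp)
```

In this sketch the edge `(0, 2)` is rejected because nodes 0 and 2 are already in the same component, and the remaining component count tells the caller whether the graph is connected yet.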
/ok to test 16e285e
/ok to test 9b0b145