Reduce device memory usage for CAGRA's graph optimization process (reverse graph creation)
CAGRA improves search accuracy by merging the forward and reverse graphs, so a reverse graph is created as part of the graph optimization process. Currently, the GPU is used to build the reverse graph quickly, but for a huge dataset there may not be enough device memory to build it.
As a countermeasure for the lack of device memory, this PR adds an implementation that creates the reverse graph on the CPU.
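For readers unfamiliar with the step, here is a minimal host-side sketch of what building a reverse graph means. It is an illustration only, not the code in this PR: `FixedDegreeGraph`, `kInvalid`, and `make_reverse_graph` are hypothetical names, and the policy of capping each reverse list at the forward degree and filling it rank by rank is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical marker for an unused neighbor slot (an assumption of this sketch).
constexpr std::uint32_t kInvalid = 0xffffffffu;

// Hypothetical fixed-degree graph in host memory, stored row-major:
// row i holds the `degree` neighbors of node i, closest neighbor first.
struct FixedDegreeGraph {
  std::size_t num_nodes;
  std::size_t degree;
  std::vector<std::uint32_t> neighbors;  // size: num_nodes * degree
};

// Build the reverse graph on the CPU: for every forward edge src -> dst,
// record src in the row of dst. Each row is capped at `degree` entries.
FixedDegreeGraph make_reverse_graph(const FixedDegreeGraph& fwd) {
  FixedDegreeGraph rev{
      fwd.num_nodes, fwd.degree,
      std::vector<std::uint32_t>(fwd.num_nodes * fwd.degree, kInvalid)};
  std::vector<std::uint32_t> count(fwd.num_nodes, 0);  // filled slots per row

  // Visit edges rank by rank (k outer, src inner) so that edges to closer
  // neighbors claim the limited reverse slots first.
  for (std::size_t k = 0; k < fwd.degree; ++k) {
    for (std::size_t src = 0; src < fwd.num_nodes; ++src) {
      const std::uint32_t dst = fwd.neighbors[src * fwd.degree + k];
      if (dst == kInvalid || dst >= fwd.num_nodes) continue;  // skip bad slots
      if (count[dst] >= fwd.degree) continue;                 // row is full
      rev.neighbors[dst * fwd.degree + count[dst]] =
          static_cast<std::uint32_t>(src);
      ++count[dst];
    }
  }
  return rev;
}
```

In this sketch both graphs live entirely in host memory, so the working set is bounded by system RAM rather than device memory, at the cost of a slower single-threaded build.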
/ok to test bfc45cd
/ok to test b0f648b
Thanks for the review, Tamas. I think I've addressed everything you pointed out. Could you double-check?