[BUG] NN Descent iteration terminating early for small dataset and specific data dimension
Describe the bug
For small datasets (n_rows=100) NN Descent breaks out of the iteration loop early (after 1 iteration), and does not properly return update indices.
Specifically, this behavior happens when graph_degree=64 (but is not an issue when graph_degree=32).
After some investigation, it seems like the following if statement in update_and_sample (in raft/cpp/include/raft/neighbors/detail/nn_descent.cuh) become true immediately after 1 iteration.
if (update_counter_ < build_config_.termination_threshold * nrow_ *
build_config_.dataset_dim / counter_interval) {
update_counter_ = -1;
}
It may be worth looking into the update_graph function (in the same file) and check when whether the update_counter is being incremented properly.
This is not an issue for the current test configuration which tests for large datasets - 1000, 2000 n_rows.
Steps/Code to reproduce bug
Add the following prints to raft/cpp/test/neighbors/ann_nn_descent.cuh right before EXPECT_TRUE.
raft::print_host_vector("indices naive after build", indices_naive.data(), 5, std::cout);
raft::print_host_vector("indices nndescent after build", indices_NNDescent.data(), 5, std::cout);
EXPECT_TRUE(eval_recall(
indices_naive, indices_NNDescent, ps.n_rows, ps.graph_degree, 0.001, min_recall));
Also, change the test parameters like below by adding 100 to n_rows
const std::vector<AnnNNDescentInputs> inputs = raft::util::itertools::product<AnnNNDescentInputs>(
{100, 1000, 2000}, // n_rows
{3, 5, 7, 8, 17, 64, 128, 137, 192, 256, 512, 619, 1024}, // dim
{32, 64}, // graph_degree
{raft::distance::DistanceType::L2Expanded},
{false, true},
{0.90});
Run the test.
./cpp/build/gtests/NEIGHBORS_ANN_NN_DESCENT_TEST --gtest_filter=AnnNNDescentTest/AnnNNDescentTestI8_U32*
It can be seen that the indices are not updated properly.
indices naive=[0,21,50,60,87];
indices nndescent=[1,2,3,4,5]; // stays same
For distances (feature to be added by this PR) it also does not get updated and stays as initial FLOAT_MAX values.
dists naive after build=[0,1,13,14,27,35,35,50,56,62];
dists nndescent after build=[3.40282e+38,3.40282e+38,3.40282e+38,3.40282e+38,3.40282e+38,3.40282e+38,3.40282e+38,3.40282e+38,3.40282e+38,3.40282e+38];
indices naive after build=[0,2,49,82,61,9,67,32,13,97];
indices nndescent after build=[1,2,3,4,5,6,7,8,9,10];
Environment details (please complete the following information):
build in devcontainers cuda12.2-conda