pq_size parameter setting
Hi,
I'm using SONG to run the datasets from your paper.
Can you tell me the pq_size setting for each dataset? The README gives an example with pq_size = 100.
Can I use 100 for all of them? Does this parameter significantly affect the quality and efficiency of graph construction and querying?
I tested the nytimes dataset from https://github.com/erikbern/ann-benchmarks, and SONG takes around 10 minutes to finish graph construction. Is that expected? All the commands I used follow your README:
./build_graph.sh /path2/nytimes-256-angular_base_libsvm 290000 256 cos
Also, can you share the code you use to calculate recall? I found that your system does not read a ground-truth file, so I added this computation at the end of your main.cu file.
Is this the correct way to compute recall as in your paper?
Graph construction runs with a single thread by default. If you would like to parallelize it, feel free to call graph->add and data->add concurrently. Alternatively, you could construct the graph with HNSW or any other library and plug that graph into SONG, since SONG mainly focuses on the search part. We no longer maintain the ground-truth files. You could compute the ground truth yourself and run the queries over a range of search budgets to get the recall curve.
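As a rough illustration of computing the ground truth and recall, here is a minimal Python sketch. It is not code from SONG. It finds exact top-k neighbors by brute force under cosine similarity (to match the cos argument in the build command above) and then measures recall@k for an approximate result. The function names `brute_force_topk_cos` and `recall_at_k` and the toy data are made up for illustration; loading the dataset and parsing SONG's output are left out, and the sketch assumes no all-zero vectors.

```python
import numpy as np

def brute_force_topk_cos(base, queries, k):
    """Exact top-k neighbor ids per query under cosine similarity."""
    # Normalize rows so that a dot product equals cosine similarity.
    # Assumes no row is all zeros (that would divide by zero).
    base_n = base / np.linalg.norm(base, axis=1, keepdims=True)
    query_n = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    sims = query_n @ base_n.T  # shape: (num_queries, num_base)
    # Sort by descending similarity and keep the first k ids.
    return np.argsort(-sims, axis=1)[:, :k]

def recall_at_k(approx_ids, true_ids, k):
    """Average fraction of the true top-k found in the approximate top-k."""
    hits = 0
    for approx, true in zip(approx_ids, true_ids):
        hits += len(set(approx[:k].tolist()) & set(true[:k].tolist()))
    return hits / (len(true_ids) * k)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.standard_normal((1000, 16))    # toy base vectors
    queries = rng.standard_normal((5, 16))    # toy query vectors
    truth = brute_force_topk_cos(base, queries, k=10)
    # Stand-in for an ANN result: the exact answer itself.
    print(recall_at_k(truth, truth, k=10))
```

Calling `recall_at_k` on the results obtained at each search budget, and plotting recall against query throughput, gives the points of the curve. For a dataset the size of nytimes the similarity matrix should be computed in query batches to keep memory bounded.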
Ok.
So how do I plug a graph constructed by HNSW into SONG for later searching?
Also, is the graph data in bfsg.graph stored in CSR format?
What data layout is actually used to store this graph?
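For reference, this is the generic CSR layout the question refers to, sketched in Python for a small adjacency list. It only illustrates the general idea: the names `to_csr` and `neighbors` are made up here, and nothing in this thread confirms that bfsg.graph uses this layout.

```python
def to_csr(adjacency):
    """Flatten per-node neighbor lists into CSR-style offsets and indices."""
    offsets = [0]   # offsets[i] is where node i's neighbors start in indices
    indices = []    # all neighbor ids, concatenated node by node
    for nbrs in adjacency:
        indices.extend(nbrs)
        offsets.append(len(indices))
    return offsets, indices

def neighbors(offsets, indices, node):
    """Return the neighbor ids of one node from the CSR arrays."""
    return indices[offsets[node]:offsets[node + 1]]

if __name__ == "__main__":
    # Toy graph with 4 nodes; node 3 has no outgoing edges.
    adjacency = [[1, 2], [0], [0, 1, 3], []]
    offsets, indices = to_csr(adjacency)
    print(offsets)                          # [0, 2, 3, 6, 6]
    print(indices)                          # [1, 2, 0, 0, 1, 3]
    print(neighbors(offsets, indices, 2))   # [0, 1, 3]
```

A common alternative for fixed-degree proximity graphs is a dense array with a fixed number of neighbor slots per node, which needs no offsets array.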