Thresholding, overquery, and compression
The interaction between threshold and topk is confusing, it's almost "stop the search whichever comes first", but not quite, because we don't actually start accumulating results towards topk until we clear the threshold.
I'd like to simplify this to
- Find a local maximum in the graph using an approach similar to TwoPhaseTracker
- Pull out the top k
However, this doesn't currently work because there's a ton of noise introduced by compression. We can fix this by performing step 1 using exact similarities for the top candidate.
We know that that introduces a lot of expensive full-resolution comparisons. But we might be able to make up for that if we push overquery into the search process. (So if user asks for 100 results we pass topk=100 into the search and let it deal with the noise.)
Not sure what to do when PQ is so lossy that it doesn't ever really converge. Today you get garbage results, with my proposal you'd end up searching the entire graph instead, which is arguably worse.