tinyknn
Support multi-ivf
A classical way to make building the index faster, cheaper in memory, and potentially better (bigger, but lower quality) is to use a top-level product code. Instead of just "hashing" each point to its closest centroid, hash it to the pair of centroids whose sum is closest to the point. My image here shows how multi-indexing this way reduces the mean squared error: https://twitter.com/thomasahle/status/1583582672906952705?s=20 Some of the code for doing this is here: https://gist.github.com/thomasahle/4f16b19aa395f25e8fee882e3a82a4d9
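For illustration, here is a minimal NumPy sketch of the idea. It is not the code from the gist and not tinyknn's API. It assumes the product code splits each vector into two halves with one codebook per half, so the pair of centroids whose sum is closest to the point can be found by picking the nearest centroid in each half independently. The helper names (`nearest`, `kmeans`, `multi_assign`, `mse`) and the toy Gaussian data are hypothetical.

```python
import numpy as np

def nearest(X, C):
    # Index of the closest centroid in C for each row of X (squared Euclidean).
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, iters=20, seed=0):
    # Plain Lloyd iterations, only meant as a stand-in for a real trainer.
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        a = nearest(X, C)
        for j in range(k):
            if np.any(a == j):
                C[j] = X[a == j].mean(axis=0)
    return C

def multi_assign(X, C1, C2):
    # Assumption: C1 covers the first half of the dimensions, C2 the second.
    # Because the two codebooks have disjoint support, the best pair (i, j)
    # is the nearest centroid in each half taken separately.
    h = C1.shape[1]
    return nearest(X[:, :h], C1), nearest(X[:, h:], C2)

def mse(X, R):
    # Mean squared reconstruction error per point.
    return float(((X - R) ** 2).sum(axis=1).mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))  # toy data, assumption for the demo
k = 16
h = X.shape[1] // 2

# Baseline: k full-dimensional centroids (k * d floats).
C = kmeans(X, k)
single_err = mse(X, C[nearest(X, C)])

# Multi-index: two codebooks of k half-dimensional centroids
# (also k * d floats in total), giving k * k cells.
C1, C2 = kmeans(X[:, :h], k), kmeans(X[:, h:], k)
i, j = multi_assign(X, C1, C2)
multi_err = mse(X, np.hstack([C1[i], C2[j]]))
cell = i * k + j  # flat id of the inverted list each point lands in

print(f"single: {single_err:.3f}  multi: {multi_err:.3f}")
```

With the same number of stored floats, the two small codebooks address `k * k` cells instead of `k`, and training runs k-means on half-dimensional data. Both are consistent with the faster, cheaper build described above, though the numbers printed here only reflect this toy setup.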