tinyknn
A tiny approximate K-Nearest Neighbour library in Python based on Fast Product Quantization and IVF
For `pip install .`:

```
tinyknn/_fast_pq.cpp:17539:320: error: can’t convert a value of type ‘int’ to vector type ‘__m128i’ {aka ‘__vector(2) long long int’} which has different size
17539 | __pyx_t_6...
```
Since 2df6a428cf6bcc4e4a08f15e3f7caef9ce5f4f61 it is possible to store every datapoint in `n` lists by building with `ivf.build(n_probes=n)`. This improves the recall/QPS trade-off quite a lot, but only when going from `n=1` to...
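The multi-list idea can be sketched in plain NumPy. The helper below is purely illustrative (it is not tinyknn's internals): each point is inserted into the inverted lists of its `n` nearest centroids instead of only the single nearest one.

```python
import numpy as np

# Illustrative sketch, not tinyknn code: insert each point into the
# inverted lists of its n nearest centroids.
def build_multi_lists(data, centroids, n=2):
    # Squared euclidean distance between every point and every centroid.
    dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Indices of the n closest centroids per point.
    nearest = np.argsort(dists, axis=1)[:, :n]
    lists = [[] for _ in range(len(centroids))]
    for point_id, cluster_ids in enumerate(nearest):
        for c in cluster_ids:
            lists[c].append(point_id)
    return lists

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 8))
centroids = rng.normal(size=(10, 8))
lists = build_multi_lists(data, centroids, n=2)
# Every point now appears in exactly 2 lists, which is what trades
# index size for recall.
```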
AVX-512 has some nice features, such as support for fast float16 operations. This might allow us to do rescoring very quickly. The Quicker ADC paper also mentions some uses of...
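As a rough illustration of the rescoring idea (plain NumPy here, not AVX-512 intrinsics, and all names are made up): keeping a float16 copy of the dataset halves the memory traffic when rescoring the candidates returned by the PQ search, which is exactly the step hardware float16 support would accelerate.

```python
import numpy as np

# Illustrative sketch: rescore PQ candidates against a float16 copy
# of the data. AVX-512 FP16 would do this arithmetic natively.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 32)).astype(np.float32)
data_f16 = data.astype(np.float16)  # half the memory traffic of float32

query = rng.normal(size=32).astype(np.float32)
candidates = np.array([3, 17, 256, 999])  # pretend these came from a PQ scan

# Exact-ish squared distances in float16, only for the few candidates.
d16 = ((data_f16[candidates] - query.astype(np.float16)) ** 2).sum(axis=1)
best = candidates[np.argmin(d16)]
```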
Often we use PQ to estimate the distance from a full-precision vector to a set of compressed points. However, we can also try to compute the distance between all...
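The compressed-to-compressed case is the classic "symmetric" PQ distance. A small NumPy sketch of the idea (illustrative, not tinyknn's API): precompute a centroid-to-centroid distance table per subspace, then estimate the distance between two codes by table lookups alone, never decompressing either point.

```python
import numpy as np

# Illustrative sketch of symmetric PQ distances.
rng = np.random.default_rng(0)
n_subspaces, n_centers, sub_dim = 4, 16, 2
codebooks = rng.normal(size=(n_subspaces, n_centers, sub_dim))

# tables[m, i, j] = squared distance between centers i and j of subspace m.
diff = codebooks[:, :, None, :] - codebooks[:, None, :, :]
tables = (diff ** 2).sum(axis=-1)

def symmetric_distance(code_a, code_b):
    # code_a, code_b: one centroid index per subspace.
    return sum(tables[m, code_a[m], code_b[m]] for m in range(n_subspaces))

a = rng.integers(0, n_centers, size=n_subspaces)
b = rng.integers(0, n_centers, size=n_subspaces)
est = symmetric_distance(a, b)
```

The estimate is exact for the *reconstructed* points, so the only error comes from quantization of both sides rather than one.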
Currently `IVF.fit(...)` uses brute-force nearest neighbours to find which clusters to insert the points into. Instead we could use the same `PQ.top(...)` method that we use for queries...
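A hypothetical sketch of what that would look like (`PQ.top(...)` is tinyknn's; the helper below is illustrative): PQ-encode the cluster centers once, then assign each point by an asymmetric top-1 scan over the quantized centers instead of exact distances.

```python
import numpy as np

# Illustrative sketch: assign points to IVF clusters via asymmetric
# PQ distances to *quantized* cluster centers.
rng = np.random.default_rng(0)
n_subspaces, n_centers, sub_dim = 4, 16, 2
dim = n_subspaces * sub_dim
codebooks = rng.normal(size=(n_subspaces, n_centers, sub_dim))

# Pretend the IVF cluster centers have already been PQ-encoded:
n_clusters = 32
cluster_codes = rng.integers(0, n_centers, size=(n_clusters, n_subspaces))

def assign_cluster(point):
    chunks = point.reshape(n_subspaces, sub_dim)
    # lut[m, i] = squared distance from chunk m to center i of subspace m.
    lut = ((codebooks - chunks[:, None, :]) ** 2).sum(axis=-1)
    # Distance to each cluster = sum of lookups along its code.
    dists = lut[np.arange(n_subspaces), cluster_codes].sum(axis=1)
    return int(np.argmin(dists))

point = rng.normal(size=dim)
cluster = assign_cluster(point)
```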
A classical way to make building the index faster, cheaper memory-wise, and potentially better (bigger, but lower quality) is to use a top-level product code. Instead of just...
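A minimal sketch of the top-level product-code idea (illustrative names, not tinyknn code): cluster the two halves of the vector separately, so a point's cell is the *pair* of half-assignments. This yields k*k cells while only storing and training 2k centroids.

```python
import numpy as np

# Illustrative sketch of a two-part top-level product code.
rng = np.random.default_rng(0)
k, half = 16, 4
cents_lo = rng.normal(size=(k, half))  # centroids for the first half
cents_hi = rng.normal(size=(k, half))  # centroids for the second half

def cell_id(point):
    lo, hi = point[:half], point[half:]
    i = np.argmin(((cents_lo - lo) ** 2).sum(axis=1))
    j = np.argmin(((cents_hi - hi) ** 2).sum(axis=1))
    return int(i * k + j)  # one of k*k = 256 cells

point = rng.normal(size=2 * half)
cid = cell_id(point)
```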
Currently the same product quantizer is used for every cluster in IVF. However, the PQ doesn't use a lot of space (it's just 16 center points), so we might as...
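A toy sketch of per-cluster codebooks (illustrative only, with a deliberately simplified "training" step that just samples centers from the cluster's own points): each IVF cluster stores its own small PQ codebook, so quantization adapts to the local distribution.

```python
import numpy as np

# Illustrative sketch: one PQ codebook per IVF cluster.
rng = np.random.default_rng(0)
n_subspaces, n_centers, sub_dim = 2, 16, 2

def train_codebook(points):
    # Toy "training": sample centers from the cluster's own points.
    cb = np.empty((n_subspaces, n_centers, sub_dim))
    for m in range(n_subspaces):
        chunk = points[:, m * sub_dim:(m + 1) * sub_dim]
        cb[m] = chunk[rng.integers(0, len(chunk), size=n_centers)]
    return cb

def encode(point, cb):
    chunks = point.reshape(n_subspaces, sub_dim)
    return ((cb - chunks[:, None, :]) ** 2).sum(-1).argmin(axis=1)

# Three well-separated clusters, each with its own codebook.
clusters = [rng.normal(loc=c, size=(50, 4)) for c in (-5.0, 0.0, 5.0)]
codebooks = [train_codebook(pts) for pts in clusters]
codes = [np.array([encode(p, cb) for p in pts])
         for pts, cb in zip(clusters, codebooks)]
```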
Currently, only SSE is supported. It would be nice to also support AMD chips.
[PyNNDescent](https://github.com/lmcinnes/pynndescent) gets a lot of its speed (presumably) by using [numba](https://numba.pydata.org/). This should be relatively easy to add to fast_pq as well.
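A sketch of the kind of hot loop numba could JIT (illustrative, not fast_pq's actual kernel). The fallback keeps the example runnable without numba installed; with numba present, the identical function compiles to machine code on first call.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # plain-Python fallback with the same semantics
    def njit(f):
        return f

@njit
def pq_scan(codes, lut):
    # codes: (n_points, n_subspaces) uint8; lut: (n_subspaces, 16) float32.
    # Sum one table lookup per subspace to get each point's distance.
    n, m = codes.shape
    out = np.empty(n, dtype=np.float32)
    for i in range(n):
        acc = np.float32(0.0)
        for j in range(m):
            acc += lut[j, codes[i, j]]
        out[i] = acc
    return out

rng = np.random.default_rng(0)
codes = rng.integers(0, 16, size=(1000, 8)).astype(np.uint8)
lut = rng.random((8, 16)).astype(np.float32)
dists = pq_scan(codes, lut)
```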