tinyknn

Support estimating distance between two compressed datasets

Open thomasahle opened this issue 2 years ago • 2 comments

We usually use PQ (product quantization) to estimate the distance from a full-precision vector to a set of compressed points. However, we could also estimate the distances between all pairs of points in two compressed datasets, possibly even ones encoded with distinct FastPQ instances.
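To make the idea concrete, here is a minimal NumPy sketch of compressed-to-compressed distance estimation. It is not tinyknn's API: the function names and array layouts (`centers_*` of shape `(M, K, ds)`, `codes_*` of shape `(N, M)`) are assumptions for illustration, and it assumes both quantizers split the vector into the same `M` subspaces. For each subspace we precompute the squared distance between every centroid of the first codebook and every centroid of the second, then sum table lookups over subspaces.

```python
import numpy as np

def pairwise_codebook_table(centers_a, centers_b):
    """Squared distances between centroids of two codebooks, per subspace.

    centers_a: (M, Ka, ds), centers_b: (M, Kb, ds). Returns (M, Ka, Kb).
    Hypothetical helper, not part of tinyknn.
    """
    diff = centers_a[:, :, None, :] - centers_b[:, None, :, :]
    return np.einsum("mijd,mijd->mij", diff, diff)

def symmetric_pq_distances(codes_a, codes_b, table):
    """Estimated squared distances between all pairs of compressed points.

    codes_a: (Na, M) ints, codes_b: (Nb, M) ints. Returns (Na, Nb).
    """
    out = np.zeros((codes_a.shape[0], codes_b.shape[0]))
    for m in range(table.shape[0]):
        # Look up the centroid-to-centroid distance for every pair of codes.
        out += table[m][codes_a[:, m][:, None], codes_b[:, m][None, :]]
    return out

def decode(codes, centers):
    """Reconstruct vectors by concatenating the selected centroid per subspace."""
    return np.concatenate(
        [centers[m][codes[:, m]] for m in range(centers.shape[0])], axis=1
    )
```

The result equals the exact squared distance between the decoded (reconstructed) vectors, so it carries the quantization error of both datasets. The table costs `M * Ka * Kb` entries to build, after which each pair of points needs only `M` lookups.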

This is relevant, for example, when inserting a batch of points into the data structure and we want to quickly find the closest cluster centers for each point. Currently we compute this using full-precision distance computations.
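For reference, the full-precision baseline can be sketched as follows. This is an illustrative stand-in, not the library's actual insertion code; `nearest_centers` and its parameters are hypothetical names. It uses the expansion ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2 so the bulk of the work is a single matrix product.

```python
import numpy as np

def nearest_centers(points, centers, k=1):
    """Indices of the k closest centers for each point, nearest first.

    points: (n, d), centers: (c, d), with k <= c. Returns (n, k).
    Hypothetical helper, not part of tinyknn.
    """
    d2 = (
        (points ** 2).sum(axis=1)[:, None]
        - 2.0 * points @ centers.T
        + (centers ** 2).sum(axis=1)[None, :]
    )
    # Select the k smallest per row, then order those k by distance.
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    order = np.argsort(np.take_along_axis(d2, idx, axis=1), axis=1)
    return np.take_along_axis(idx, order, axis=1)
```

This costs on the order of `n * c * d` floating-point operations per batch, which is the work a compressed-to-compressed estimate would aim to reduce.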

Edit: Maybe #13 is more relevant for speeding up building the index. However, supporting estimating distances between compressed datasets is still interesting and worthwhile.

thomasahle avatar Apr 12 '23 20:04 thomasahle

@thomasahle Hi, I am new to open-source contribution and would like to take up this issue, if that's okay. Could you tell me more about it?

sabbirtkdr avatar Apr 13 '23 04:04 sabbirtkdr

Awesome! Do you have a rough idea about how the library works already? In particular the distance_table method in the FastPQ class?

thomasahle avatar Apr 13 '23 06:04 thomasahle