the SPTAGClient.AnnClient.Search method treats query and input vectors differently wrt normalization
Bug description
when we build an SPTAGClient.AnnClient object against an ANN server that has loaded an index with Index.DistCalcMethod=Cosine (the default), I expect that if I Search for any vector in that index (including non-unit vectors), the nearest neighbor returned by Search is that vector itself, with distance=0. instead, the returned distance is 1 - np.linalg.norm(non_unit_vector)
To Reproduce
Steps to reproduce the behavior:
- copy the Singlebox Python Wrapper example in the `GettingStart.md` file
- edit the last two lines to replace `L2` with `Cosine`
- run this file
Expected behavior
for 10-element query vectors [0, 0, ..., 0], [2, 2, ..., 2], and [4, 4, ..., 4], the cosine distance to every vector in the input index (all [n, n, ..., n] for `0 < n < 100`) should be exactly 0 (they have different magnitudes but the same direction)
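The expectation above can be checked without SPTAG at all. The following is a minimal NumPy sketch, not the library's implementation; the helper name `cosine_distance` is made up for illustration. The all-zero query is left out because its norm is 0, so cosine similarity is undefined for it in this plain formula.

```python
import numpy as np

def cosine_distance(x, y):
    """Reference cosine distance: 1 - cos(x, y), normalizing BOTH vectors.

    Hypothetical helper for illustration only; not part of SPTAG.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Index vectors as described in the report: [n, n, ..., n] for 0 < n < 100.
index_vectors = [np.full(10, n, dtype=np.float64) for n in range(1, 100)]

for value in (2, 4):
    query = np.full(10, value, dtype=np.float64)
    distances = [cosine_distance(query, x) for x in index_vectors]
    # Every index vector points in the same direction as the query,
    # so each distance is 0 up to floating-point rounding.
    print(value, np.allclose(distances, 0.0))
```

With both sides normalized, the magnitude of the query has no effect on the result, which is the behavior the report expects from Search.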
Observed Behavior
the measured distance is not 1 - CosineSim(x, y), but instead 1 - |x| * CosineSim(x, y), where x is the query vector. for the three test query vectors, this means we see
- all 0s: [0, 0, 0]
- all 2s: [-5.324554920196533, -5.324554920196533, -5.324554920196533] (note: -5.324554920196533 = 1 - np.sqrt(10 * (2 ** 2)) = 1 - np.linalg.norm(q[1]))
- all 4s: [-11.649109840393066, -11.649109840393066, -11.649109840393066] (note: -11.649109840393066 = 1 - np.sqrt(10 * (4 ** 2)) = 1 - np.linalg.norm(q[2]))
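The numbers for the all-2s and all-4s queries are consistent with the indexed vectors being normalized while the query is used as is. The sketch below models that assumption in plain NumPy; it is a guess at the cause, not a reading of the SPTAG source, and `observed_distance` is a hypothetical name.

```python
import numpy as np

def observed_distance(query, indexed):
    """Model of the reported behavior (an assumption, not SPTAG code):
    the indexed vector is normalized, the query vector is not."""
    query = np.asarray(query, dtype=np.float64)
    indexed = np.asarray(indexed, dtype=np.float64)
    indexed_unit = indexed / np.linalg.norm(indexed)
    # Equals 1 - |query| * CosineSim(query, indexed).
    return 1.0 - np.dot(query, indexed_unit)

indexed = np.full(10, 7.0)  # any [n, n, ..., n] with n > 0 gives the same result
for value in (2, 4):
    query = np.full(10, float(value))
    # The two printed numbers agree: the modeled distance and 1 - |query|.
    print(value, observed_distance(query, indexed), 1 - np.linalg.norm(query))
```

In double precision this gives roughly -5.3245553 and -11.6491106, which match the reported values to float32 precision.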
Desktop (please complete the following information):
using the current Dockerfile build
is there any update on this issue?