SPTAG

The SPTAGClient.AnnClient.Search method treats query and input vectors differently with respect to normalization

Open RZachLamberty opened this issue 6 years ago • 1 comment

Bug description: When we build an SPTAGClient.AnnClient object off of an ANN server that loaded an Index.DistCalcMethod=Cosine (default) index, I expect that if I Search for any vector in that index (including non-unit vectors), the nearest neighbor returned by Search should be that vector itself, with distance=0. The current behavior is to instead return 1 - np.linalg.norm(non_unit_vector).

To Reproduce Steps to reproduce the behavior:

  1. copy the Singlebox Python Wrapper example in the GettingStart.md file
  2. edit the last two lines to replace L2 with Cosine
  3. run this file

Expected behavior: For the 10-element query vectors [0, 0, ..., 0], [2, 2, ..., 2], and [4, 4, ..., 4], the cosine distance to every vector in the input index (all [n, n, ..., n] for `0 < n < 100`) should be exactly 0 (they have different magnitudes but the same direction).
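To make the expectation concrete, here is a minimal sketch in plain Python (no SPTAG involved) of the textbook cosine distance, 1 - CosineSim(x, y), applied to vectors shaped like the ones in the example. The function name `cosine_distance` is my own, not part of any SPTAG API. Note that cosine similarity is mathematically undefined for the all-zeros vector, so the sketch returns None in that case rather than assuming a value.

```python
import math

def cosine_distance(x, y):
    """Textbook cosine distance: 1 - cos(angle between x and y).

    Returns None when either vector is all zeros, because the
    cosine similarity is undefined in that case.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return None
    return 1.0 - dot / (norm_x * norm_y)

# Index vectors shaped like the example: [n, n, ..., n] with 10 elements.
index = [[float(n)] * 10 for n in range(1, 100)]

# The two non-zero query vectors from the report.
queries = [[2.0] * 10, [4.0] * 10]

# Largest deviation from 0 over every (query, index vector) pair.
max_distance = max(abs(cosine_distance(q, x)) for q in queries for x in index)
```

Because every vector points in the same direction, `max_distance` differs from 0 only by floating-point rounding, regardless of the magnitudes involved.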

Observed behavior: The measured distance is not 1 - CosineSim(x, y), but instead 1 - |x| * CosineSim(x, y). For the three test query vectors, this means we see

  • all 0s: [0, 0, 0]
  • all 2s: [-5.324554920196533, -5.324554920196533, -5.324554920196533] (note: -5.324554920196533 = 1 - np.sqrt(10 * (2 ** 2)) = 1 - np.linalg.norm(q[1]))
  • all 4s: [-11.649109840393066, -11.649109840393066, -11.649109840393066] (note: -11.649109840393066 = 1 - np.sqrt(10 * (4 ** 2)) = 1 - np.linalg.norm(q[2]))
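The numbers above are consistent with a model in which the indexed vectors are normalized to unit length but the query vector is not. The sketch below is only that hypothetical model in plain Python, not SPTAG's actual code path; `observed_distance` and `normalized_distance` are illustrative names of my own. It also shows that normalizing the query on the caller's side before searching would bring the distance back to 0 under this model, which may or may not be a usable workaround against the real server.

```python
import math

def norm(v):
    """Euclidean length of v."""
    return math.sqrt(sum(a * a for a in v))

def observed_distance(query, indexed):
    """Hypothetical model of the reported behavior:
    the indexed vector is normalized, the query is used as-is."""
    n = norm(indexed)
    unit_indexed = [a / n for a in indexed]
    return 1.0 - sum(q * u for q, u in zip(query, unit_indexed))

def normalized_distance(query, indexed):
    """Same computation after scaling the query to unit length."""
    nq = norm(query)
    unit_query = [a / nq for a in query]
    return observed_distance(unit_query, indexed)

indexed = [7.0] * 10  # any [n, n, ..., n] with n > 0 behaves the same

d2 = observed_distance([2.0] * 10, indexed)       # close to 1 - sqrt(40)
d4 = observed_distance([4.0] * 10, indexed)       # close to 1 - sqrt(160)
fixed = normalized_distance([4.0] * 10, indexed)  # close to 0
```

The values of `d2` and `d4` agree with the reported distances up to float32 rounding (the report's figures look like single-precision results, while this sketch uses double precision).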

Desktop (please complete the following information): using the current Dockerfile build

RZachLamberty, Jul 18 '19 18:07

Is there any update on this issue?

RZachLamberty, Jan 06 '20 23:01