
add_items() with deleted element id keeps element marked deleted

Open qwertyforce opened this issue 4 years ago • 3 comments

import hnswlib
import random

dim = 4

# Build two random 4-dimensional vectors.
objects = []
for i in range(2):
    vector = random.sample(range(1000), dim)
    objects.append(vector)
print(objects)

index = hnswlib.Index(space='l2', dim=dim)
index.init_index(max_elements=5, ef_construction=200, M=32)
index.add_items(objects, [12, 3])
print(index.get_ids_list())
print(index.get_items([3]))

# Mark label 3 as deleted; get_items should now fail for it.
index.mark_deleted(3)

try:
    print(index.get_items([3]))
except RuntimeError:
    print("el not found")

# Re-add a vector under the deleted label 3.
index.add_items(objects[1], 3)

# Expected: the element is accessible again. Actual: it is still marked deleted.
try:
    print(index.get_items([3]))
except RuntimeError:
    print("el not found")

# Label 3 is still listed even though it is marked deleted.
print(index.get_ids_list())

Output:

[[145, 640, 35, 805], [633, 104, 726, 950]]
[12, 3]
[[633.0, 104.0, 726.0, 950.0]]
el not found
el not found
[12, 3]

qwertyforce, Apr 25 '21 14:04

@qwertyforce Thanks for reporting! @apoorv-sharma it seems we missed a bug.

yurymalkov, Apr 26 '21 05:04

Thank you. It looks like two fixes need to be made (independent of each other):

  1. If an element is deleted and then added back, unmarkDeletedInternal should be called before any update operation starts; otherwise an error could be thrown. unmarkDeletedInternal is not yet exposed to users, so they cannot unmark elements themselves, which is why calling it internally on re-add is the better approach.
  2. Fix the get_ids_list method so that it excludes deleted elements in general.

@yurymalkov Let me know if this sounds good; I will fix 1) soon.
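
The idea behind fix 1) can be sketched with a small pure-Python model. This is only an illustration, not hnswlib code: `ToyIndex` and its methods are hypothetical names, and the set of deleted labels stands in for the library's internal delete marker. The point is that re-adding a label clears the deleted flag before the stored vector is updated.

```python
class ToyIndex:
    """Hypothetical stand-in for an index with soft deletes (not hnswlib)."""

    def __init__(self):
        self._vectors = {}     # label -> stored vector
        self._deleted = set()  # labels currently marked deleted

    def add_item(self, vector, label):
        # Sketch of fix 1): if the label is marked deleted, clear the
        # flag first, then proceed with the normal insert/update.
        if label in self._deleted:
            self._deleted.discard(label)
        self._vectors[label] = list(vector)

    def mark_deleted(self, label):
        if label not in self._vectors:
            raise RuntimeError("label not found")
        self._deleted.add(label)

    def get_item(self, label):
        # Deleted elements are treated as absent.
        if label not in self._vectors or label in self._deleted:
            raise RuntimeError("label not found")
        return self._vectors[label]


index = ToyIndex()
index.add_item([1.0, 2.0], 3)
index.mark_deleted(3)
index.add_item([5.0, 6.0], 3)  # re-adding revives the element
restored = index.get_item(3)
```

With this behavior, the second `get_items([3])` call in the report above would return the vector instead of raising.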

apoorv-sharma, Apr 26 '21 06:04

@apoorv-sharma 1) sounds great, thank you! For 2), I think we probably need an option to include both deleted and non-deleted elements (e.g. with a flag like return_deleted); otherwise there would probably be no way to access the ids of deleted elements from Python.
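
As a rough sketch of what such a flag could look like, assuming the index keeps the full list of ids plus a set of deleted ones: `list_ids` below is a hypothetical helper, not the actual `get_ids_list` signature, and only the flag name `return_deleted` comes from the suggestion above.

```python
def list_ids(all_ids, deleted_ids, return_deleted=False):
    """Hypothetical helper: list ids, optionally including deleted ones."""
    if return_deleted:
        # Include both live and deleted ids.
        return list(all_ids)
    # Default: hide ids that are marked deleted.
    return [i for i in all_ids if i not in deleted_ids]


all_ids = [12, 3]
deleted_ids = {3}

live_ids = list_ids(all_ids, deleted_ids)
every_id = list_ids(all_ids, deleted_ids, return_deleted=True)
```

Defaulting the flag to `False` gives the behavior from fix 2), while still leaving a way to reach the ids of deleted elements.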

yurymalkov, Apr 26 '21 20:04