hnswlib
add_items() with deleted element id keeps element marked deleted
import hnswlib
import random

dim = 4
objects = []
for i in range(2):
    vector = random.sample(range(1000), dim)
    objects.append(vector)
print(objects)

index = hnswlib.Index(space='l2', dim=dim)
index.init_index(max_elements=5, ef_construction=200, M=32)
index.add_items(objects, [12, 3])
print(index.get_ids_list())
print(index.get_items([3]))

index.mark_deleted(3)
try:
    print(index.get_items([3]))
except RuntimeError:
    print("el not found")

index.add_items(objects[1], 3)
try:
    print(index.get_items([3]))
except RuntimeError:
    print("el not found")
print(index.get_ids_list())
[[145, 640, 35, 805], [633, 104, 726, 950]]
[12, 3]
[[633.0, 104.0, 726.0, 950.0]]
el not found
el not found
[12, 3]
@qwertyforce Thanks for reporting! @apoorv-sharma seems like we missed a bug
Thank you. It looks like two independent fixes are needed:
1) If an element is deleted and then added back, unmarkDeletedInternal should be called before any update operation starts; otherwise an error could be thrown. Since unmarkDeletedInternal is not yet exposed to users, they cannot unmark the element themselves, so calling it internally when the element is re-added is the better approach.
2) Fix the get_ids_list method to exclude deleted elements in general.
@yurymalkov Let me know if this sounds good; I will fix 1) soon.
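To make the first fix concrete, here is a minimal pure-Python sketch of the intended behavior. It does not use hnswlib and is not its implementation: `ToyIndex`, its internals, and the error message are hypothetical stand-ins, and the only point is that re-adding a label clears its deleted mark before the update.

```python
class ToyIndex:
    """Hypothetical stand-in for an index with soft deletion (not hnswlib)."""

    def __init__(self):
        self._vectors = {}     # label -> stored vector
        self._deleted = set()  # labels currently marked as deleted

    def add_items(self, vectors, labels):
        for vector, label in zip(vectors, labels):
            # Sketch of the proposed fix: unmark the label before updating it,
            # so a re-added element is no longer treated as deleted.
            self._deleted.discard(label)
            self._vectors[label] = list(vector)

    def mark_deleted(self, label):
        if label not in self._vectors:
            raise RuntimeError("Label not found")
        self._deleted.add(label)

    def get_items(self, labels):
        for label in labels:
            if label not in self._vectors or label in self._deleted:
                raise RuntimeError("Label not found")
        return [self._vectors[label] for label in labels]


index = ToyIndex()
index.add_items([[1.0, 2.0], [3.0, 4.0]], [12, 3])
index.mark_deleted(3)
index.add_items([[3.0, 4.0]], [3])  # re-add the deleted label
restored = index.get_items([3])     # no RuntimeError after the re-add
print(restored)
```

Without the `discard` call, this toy model reproduces the reported symptom: the second `get_items([3])` still raises.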
@apoorv-sharma 1) Sounds great! Thank you!
For 2), I think we probably need an option to include both (e.g. a flag like return_deleted); otherwise there will probably be no way to access the ids of deleted elements from Python.
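A rough sketch of how such a flag could behave, again in plain Python rather than hnswlib itself. `ToyLabelStore` is hypothetical, and `return_deleted` is only the name floated in the comment above, not a confirmed hnswlib parameter.

```python
class ToyLabelStore:
    """Hypothetical label bookkeeping with soft deletion (not hnswlib)."""

    def __init__(self):
        self._labels = []      # labels in insertion order
        self._deleted = set()  # labels currently marked as deleted

    def add(self, label):
        if label not in self._labels:
            self._labels.append(label)

    def mark_deleted(self, label):
        if label not in self._labels:
            raise RuntimeError("Label not found")
        self._deleted.add(label)

    def get_ids_list(self, return_deleted=False):
        # Default: only live labels. With return_deleted=True, deleted
        # labels are included too, so they stay reachable by the caller.
        if return_deleted:
            return list(self._labels)
        return [label for label in self._labels if label not in self._deleted]


store = ToyLabelStore()
store.add(12)
store.add(3)
store.mark_deleted(3)
live_ids = store.get_ids_list()
all_ids = store.get_ids_list(return_deleted=True)
print(live_ids, all_ids)
```

Defaulting the flag to `False` matches the proposal to exclude deleted elements in general, while the opt-in keeps the deleted ids accessible.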