get_items return type not as expected.
import hnswlib
import numpy as np
dim = 128
num_elements = 10000
# Generating sample data
data = np.float32(np.random.random((num_elements, dim)))
data_labels = np.arange(num_elements)
# Declaring index
p = hnswlib.Index(space = 'cosine', dim = dim) # possible options are l2, cosine or ip
# Initializing the index - the maximum number of elements should be known beforehand
p.init_index(max_elements = num_elements, ef_construction = 200, M = 16)
# Element insertion (can be called several times):
p.add_items(data, data_labels)
output_items = p.get_items(data_labels)
print(type(output_items), type(output_items[0]), type(output_items[0][0]))
Output: <class 'list'> <class 'list'> <class 'float'>
The documentation says:
get_items(ids) - returns a numpy array (shape:N*dim) of vectors that have integer identifiers specified in ids numpy vector (shape:N)
@hemantpugaliya thanks for reporting! I wonder what negative effects it can have? I am not sure if it makes sense to change the documentation or to change the return type.
@yurymalkov I ran into a use case where I'm trying to return tens of millions of vectors from the index using get_items. By my calculation, I had enough memory for the NumPy matrix. However, because of the per-object pointer overhead in a list of lists, I hit an out-of-memory error. I think returning a NumPy matrix is a cleaner and more efficient design. Also, since the expected input (per the documentation) is a NumPy array of ids, it makes sense to return a NumPy matrix of vectors.
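To make the overhead concrete, here is a rough estimate that uses only the standard library. It is a sketch, not a measurement of hnswlib itself: it assumes CPython, uses array('f') as a stand-in for a packed float32 buffer, and the sample size n is arbitrary. On 64-bit CPython each float is typically a separate object of about 24 bytes plus an 8-byte pointer in the list, against 4 bytes per value in a float32 array.

```python
import sys
from array import array

dim = 128
n = 1000  # small sample; the per-vector cost scales linearly with N

# List-of-lists layout: every value is a separate Python float object.
rows = [[float(i * dim + j) for j in range(dim)] for i in range(n)]
list_bytes = sys.getsizeof(rows) + sum(
    sys.getsizeof(row) + sum(sys.getsizeof(x) for x in row) for row in rows
)

# Packed layout: 4 bytes per value, like a float32 NumPy array.
packed = array("f", (x for row in rows for x in row))
packed_bytes = packed.itemsize * len(packed)

print(f"list of lists:  {list_bytes / n:.0f} bytes per vector")
print(f"packed float32: {packed_bytes / n:.0f} bytes per vector")
print(f"ratio: {list_bytes / packed_bytes:.1f}x")
```

Multiplying the per-vector figures by tens of millions of vectors shows how a result that fits comfortably as a matrix can exhaust memory as nested lists.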
@hemantpugaliya Thanks! I am going to fix it.
Thanks!
Friendly bump. It reproduces in 0.7.0.
Thank you, @ahirner for the reminder. We will fix it.
Hi, thanks for the fix! This would be tremendously helpful for a use case I'm running into as well.
Would it be possible to cut a new release that includes the fix 🙏? Thanks in advance!