hnswlib

get_items return type not as expected.

Open: hemantpugaliya opened this issue 5 years ago • 7 comments

```python
import hnswlib
import numpy as np

dim = 128
num_elements = 10000

# Generating sample data
data = np.float32(np.random.random((num_elements, dim)))
data_labels = np.arange(num_elements)

# Declaring index
p = hnswlib.Index(space = 'cosine', dim = dim) # possible options are l2, cosine or ip

# Initializing index - the maximum number of elements should be known beforehand
p.init_index(max_elements = num_elements, ef_construction = 200, M = 16)

# Element insertion (can be called several times):
p.add_items(data, data_labels)

# Retrieve the stored vectors and inspect the container types
output_items = p.get_items(data_labels)
print(type(output_items), type(output_items[0]), type(output_items[0][0]))
```

Output: `<class 'list'> <class 'list'> <class 'float'>`, i.e. a list of lists of Python floats rather than a numpy array.

The documentation, however, says:

> `get_items(ids)` - returns a numpy array (shape: `N*dim`) of vectors that have integer identifiers specified in `ids` numpy vector (shape: `N`)

hemantpugaliya, Jun 27 '20
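Until the return type matches the documentation, a caller can coerce whatever `get_items` hands back into the documented `N*dim` array. This is a minimal sketch; `to_matrix` is a hypothetical helper, not part of hnswlib. It accepts either a list of lists (as reported above) or an ndarray and checks the shape. Note that it does not avoid the memory cost of building the list of lists in the first place.

```python
import numpy as np

def to_matrix(items, dim: int) -> np.ndarray:
    # Hypothetical helper (not part of hnswlib): coerce get_items output into
    # the documented float32 array of shape (N, dim)
    matrix = np.asarray(items, dtype=np.float32)
    if matrix.size == 0:
        # An empty result has no row structure to infer, so reshape explicitly
        return matrix.reshape(0, dim)
    if matrix.ndim != 2 or matrix.shape[1] != dim:
        raise ValueError(f"expected shape (N, {dim}), got {matrix.shape}")
    return matrix

# Stand-in for the list-of-lists output described above (2 vectors, dim 4)
list_output = [[0.0, 0.5, 1.0, 1.5], [2.0, 2.5, 3.0, 3.5]]
m = to_matrix(list_output, dim=4)
print(type(m).__name__, m.shape, m.dtype)  # ndarray (2, 4) float32
```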

@hemantpugaliya thanks for reporting! What negative effects does it have in practice? I am not sure whether it makes more sense to change the documentation or the return type.

yurymalkov, Jun 27 '20

@yurymalkov I ran into a use case where I'm trying to retrieve tens of millions of vectors from the index using `get_items`. By my calculation, my memory would have been enough for a numpy matrix. However, due to the per-object and pointer overhead of a list of lists, I hit an OutOfMemory error. I think returning a numpy matrix is a cleaner and more efficient design. Also, since the documented input is a numpy array of ids, it makes sense to return a numpy matrix of vectors.

hemantpugaliya, Jun 27 '20
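To put a rough number on the overhead described above, the sketch below compares a float32 array with a list of lists holding the same values. It is an estimate only, assuming a 64-bit CPython build: it sums `sys.getsizeof` over the outer list, each inner list, and each boxed float, and the exact sizes vary by Python version.

```python
import sys
import numpy as np

def ndarray_bytes(n: int, dim: int) -> int:
    # One contiguous float32 buffer: 4 bytes per component
    return n * dim * np.dtype(np.float32).itemsize

def list_of_lists_bytes(n: int, dim: int) -> int:
    # Materialize a list of lists via tolist() and add up the outer list,
    # every inner list (arrays of pointers), and every boxed Python float
    rows = np.random.random((n, dim)).astype(np.float32).tolist()
    total = sys.getsizeof(rows)
    for row in rows:
        total += sys.getsizeof(row)
        total += sum(sys.getsizeof(x) for x in row)
    return total

n, dim = 1_000, 128
arr_b = ndarray_bytes(n, dim)
lst_b = list_of_lists_bytes(n, dim)
print(f"ndarray: {arr_b:,} bytes, list of lists: {lst_b:,} bytes, ratio {lst_b / arr_b:.1f}x")
```

On a typical 64-bit CPython each boxed float costs 24 bytes plus an 8-byte pointer in its list, versus 4 bytes per component in the float32 array, so the gap is roughly eightfold. For scale, 10 million vectors at `dim = 128` need 10,000,000 × 128 × 4 = 5,120,000,000 bytes (about 5.12 GB) as a float32 array.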

@hemantpugaliya Thanks! I am going to fix it.

yurymalkov, Jul 12 '20

Thanks!

hemantpugaliya, Jul 14 '20

Friendly bump. It reproduces in 0.7.0.

ahirner, Jul 03 '23
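For anyone stuck on a release that still returns lists, one possible workaround is to fetch vectors in chunks and copy each chunk into a preallocated float32 array, so only one chunk's worth of Python lists exists at a time. This is a sketch under assumptions: `get_items_batched` and `ListReturningIndex` are hypothetical names, and it assumes `get_items` accepts a numpy array of ids, as in the report above. The stand-in index only mimics the list-of-lists return type so the example runs without hnswlib.

```python
import numpy as np

def get_items_batched(index, ids, dim: int, batch_size: int = 100_000) -> np.ndarray:
    # Hypothetical workaround: peak list overhead is bounded by
    # batch_size * dim boxed floats instead of len(ids) * dim
    ids = np.asarray(ids)
    out = np.empty((len(ids), dim), dtype=np.float32)
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        # np.asarray accepts both a list of lists and an ndarray
        out[start:start + len(chunk)] = np.asarray(index.get_items(chunk), dtype=np.float32)
    return out

class ListReturningIndex:
    # Hypothetical stand-in that mimics the list-of-lists return type
    def __init__(self, data: np.ndarray):
        self.data = data

    def get_items(self, ids):
        return [self.data[i].tolist() for i in ids]

data = np.float32(np.random.random((1_000, 16)))
vectors = get_items_batched(ListReturningIndex(data), np.arange(1_000), dim=16, batch_size=256)
print(vectors.shape, np.array_equal(vectors, data))
```

With `space = 'cosine'`, as in the original report, hnswlib stores normalized vectors as far as I know, so the fetched rows from a real index may not equal the raw inputs; the stand-in skips that detail.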

Thank you for the reminder, @ahirner. We will fix it.

dyashuni, Jul 15 '23

Hi, thanks for the fix! This would be tremendously helpful for a use case I'm running into as well.

Would it be possible to cut a new release that includes the fix 🙏? Thanks in advance!

fei-glean, Sep 18 '23