Distributed Key Value Tensor Store

Open functionstackx opened this issue 2 years ago • 2 comments

Is it possible to use a streaming dataset as a distributed key-value store?

I have a set of keys (strings like "xyz_123"), each of which corresponds to a NumPy array.

Ideally, I could do something like:

np_array = dataset["xyz_123"]

But I see that with MDSWriter.write the keys of the dataset are just sequential, and I can't change them.

Is there a way to have a custom key for MDSWriter?

functionstackx avatar Dec 16 '23 19:12 functionstackx

Hi @OrenLeung, what is the size of the dataset, and how many unique keys does it have?

karan6181 avatar Dec 19 '23 16:12 karan6181

@karan6181 The size is about 1 TB, with about 100k unique keys.

functionstackx avatar Jan 15 '24 22:01 functionstackx
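
One possible workaround for the sequential-index limitation raised in the question, sketched here without any confirmation from the maintainers, is to keep a separate key-to-index mapping and translate the string key to the sample's position before reading. The `KeyedView` class below is hypothetical and not part of the streaming library. It assumes only that the underlying dataset supports integer indexing and that the keys are listed in the same order the samples were written. Plain Python lists stand in for the real dataset and arrays so the sketch runs on its own.

```python
from typing import Any, Dict, Iterable, Sequence


class KeyedView:
    """Hypothetical helper: look up samples by string key on top of any
    dataset that can be indexed by integer position."""

    def __init__(self, samples: Sequence[Any], keys: Iterable[str]) -> None:
        self._samples = samples
        # Map each key to the position at which its sample was written.
        self._index: Dict[str, int] = {}
        for position, key in enumerate(keys):
            if key in self._index:
                raise ValueError(f"duplicate key: {key!r}")
            self._index[key] = position

    def __getitem__(self, key: str) -> Any:
        # Translate the string key to a sequential index, then delegate.
        return self._samples[self._index[key]]

    def __contains__(self, key: str) -> bool:
        return key in self._index

    def __len__(self) -> int:
        return len(self._index)


# Stand-in data: plain lists play the role of the written samples.
keys = ["xyz_123", "abc_456"]
samples = [[1.0, 2.0], [3.0, 4.0]]

view = KeyedView(samples, keys)
value = view["xyz_123"]
```

With roughly 100k keys, a mapping like this should fit comfortably in memory, and it could be saved next to the dataset (for example as JSON) so every reader rebuilds the same lookup.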