pandas Support

Open gheber opened this issue 3 years ago • 5 comments

import h5pyd as h5py -> Happiness
import pandahsds as pandas -> Sadness

gheber avatar Jun 30 '22 14:06 gheber

It's now pretty easy to read a NumPy array with h5pyd and convert it to a pandas DataFrame. See https://github.com/HDFGroup/hdflab_examples/blob/master/Tutorial/09-Queries.ipynb for an example.
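As a rough illustration of that read-then-convert step (a minimal sketch, not taken from the linked notebook): slicing an h5py or h5pyd dataset returns a NumPy array, and a compound dataset comes back as a structured array that pandas can wrap directly. The helper name `dataset_to_frame`, the field names, and the domain path in the comment are hypothetical; a plain NumPy array stands in for the remote dataset so the snippet runs without an HSDS server.

```python
import numpy as np
import pandas as pd


def dataset_to_frame(dset, start=None, stop=None):
    """Read a slice of a 1-D compound dataset and wrap it in a DataFrame.

    `dset` is anything that returns a NumPy structured array when sliced,
    for example an h5py or h5pyd dataset with a compound dtype.
    """
    arr = dset[start:stop]  # pulls the selected rows into memory
    return pd.DataFrame.from_records(arr)


# Stand-in for a remote dataset (hypothetical fields) so the sketch is
# runnable offline. With h5pyd it might look like this, assuming a domain
# "/home/user/data.h5" holding a compound dataset named "table":
#     import h5pyd
#     with h5pyd.File("/home/user/data.h5", "r") as f:
#         frame = dataset_to_frame(f["table"], 0, 1000)
demo = np.array(
    [(b"AAPL", 101.5), (b"MSFT", 250.0)],
    dtype=[("symbol", "S4"), ("price", "f8")],
)
frame = dataset_to_frame(demo)
print(frame)
```

Fixed-length string fields arrive as `bytes` objects, so a decode step may be needed before treating them as text.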

Using HSDS as the basis for a distributed table package would be interesting. This idea is explored a bit in: https://github.com/h5py/h5py/issues/2095.

jreadey avatar Jun 30 '22 16:06 jreadey

Right, but I want to read an HDF5 file created via DataFrame.to_hdf.

gheber avatar Jun 30 '22 17:06 gheber

Or DataFrame.to_hsds :smile:

gheber avatar Jun 30 '22 17:06 gheber

Perhaps this could be done by enabling the pandas HDF-related methods to accept an h5py.File object? Then they could also accept an h5pyd.File object.

ajelenak avatar Jun 30 '22 22:06 ajelenak

pandas is designed to work with in-memory data, which has led to several other projects that offer a pandas-like API but work with larger data sets than pandas can support. One of them, https://github.com/vaexio/vaex, already supports HDF5. Could it be extended to support h5pyd?

jreadey avatar Jul 07 '22 18:07 jreadey