enable writing to single HDF5 file

Open reuster986 opened this issue 5 years ago • 3 comments

Multiple users have expressed a moderate desire to be able to save arrays to a single HDF5 file, as opposed to one file per locale. I think this is possible but perhaps requires a special version of HDF5?

reuster986 avatar Sep 23 '20 20:09 reuster986

@reuster986 I imagine we could copy all the data to a single locale and then write out to a file. Not sure why a special version of HDF5 is required?

hokiegeek2 avatar Sep 23 '20 20:09 hokiegeek2

This is something HDF5 supports, and I don't think it requires a special version. There are Chapel variants that read/write a distributed array from/to a single file -- https://chapel-lang.org/docs/modules/packages/HDF5/IOusingMPI.html

One of those currently requires MPI support (and writes aren't scaling as seen in https://github.com/mhmerrill/arkouda/issues/632), so I don't think we want to use them as-is, but I believe it is possible to use a single file without sacrificing parallel performance.

ronawho avatar Feb 05 '21 15:02 ronawho
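
A minimal sketch of the bookkeeping a parallel single-file write would need: each locale writes its own contiguous region of one shared dataset, so it first needs its starting offset and element count. This is plain Python, not Arkouda or Chapel code. The helper name `block_slices` is hypothetical, and the even split shown here is an assumption that may not match the exact boundaries of Chapel's block distribution.

```python
def block_slices(n, num_locales):
    """Return a (start, count) pair per locale for an array of length n.

    Assumes a simple even split: the first (n % num_locales) locales
    each hold one extra element. This is an illustrative layout only.
    """
    if num_locales <= 0:
        raise ValueError("num_locales must be positive")
    base, extra = divmod(n, num_locales)
    slices = []
    start = 0
    for loc in range(num_locales):
        count = base + (1 if loc < extra else 0)
        slices.append((start, count))
        start += count
    return slices


# Each locale would write `count` elements beginning at `start`
# in the single shared dataset, with no overlap between locales.
for loc, (start, count) in enumerate(block_slices(10, 4)):
    print(f"locale {loc}: start={start}, count={count}")
```

Because the regions are disjoint and together cover the whole array, the writes do not need to be funneled through a single locale.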

Note that while HDF5 supports this, I'm not sure it's something we could do efficiently with Parquet or other potential future file formats.

ronawho avatar Jan 20 '22 15:01 ronawho