Serialisation of decompositions
Decomposing large tensors can be time-consuming, so it would be useful to have an easy-to-use interface for storing decompositions to disk. I am happy to work on this once we decide on the API.
Possible file formats
I have had good experiences with h5py, the Python binding for the HDF5 format, and can recommend it. Alternatively, we can follow xarray and use the NetCDF format. SciPy has bindings for NetCDF v1 and v2, but these are legacy formats. The current NetCDF standard is compatible with HDF5, and there are two separate Python bindings: netCDF4 and h5netcdf. The former wraps the NetCDF C library, which in turn depends on the HDF5 C library, while the latter uses h5py.
I know from experience that h5py is a very nice library to work with. It is well documented and makes it very easy to compress the data to save disk space, but I'm happy to use NetCDF too.
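To illustrate the compression point, here is a minimal sketch of writing a compressed dataset with h5py. The array, the dataset name "factor" and the temporary file path are made up for the example; `compression` and `compression_opts` are existing arguments of h5py's `create_dataset`.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical factor matrix; all zeros, so it compresses very well.
factor = np.zeros((1000, 10))

path = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(path, "w") as h5:
    # gzip ships with HDF5; compression_opts is the gzip level (0-9).
    h5.create_dataset("factor", data=factor, compression="gzip", compression_opts=4)

with h5py.File(path, "r") as h5:
    loaded = h5["factor"][...]  # Decompression is transparent on read
    compression = h5["factor"].compression
```

Reading the data back needs no extra arguments, so compression would only have to show up in the store functions of the API.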
API draft
Here is a draft for the API:
import h5py


def store_DECOMPOSITION_TYPE(decomposition, path, internal_path="/", compression_opts=None, compression_args=None):
    # Check if file exists and handle collisions
    with h5py.File(path, "a") as h5:
        # Check if internal path clashes and handle collisions
        group = h5.create_group(internal_path)
        group.attrs["decomposition_type"] = "DECOMPOSITION_TYPE"
        # Add additional attributes such as the number of modes to the attrs field
        # Store the decomposition


def load_DECOMPOSITION_TYPE(path, internal_path="/"):
    with h5py.File(path, "r") as h5:
        # Check if internal path exists
        group = h5[internal_path]
        stored_type = group.attrs["decomposition_type"]
        if stored_type != "DECOMPOSITION_TYPE":
            raise ValueError(f"The HDF5 file contains a {stored_type} decomposition, not a DECOMPOSITION_TYPE.")
        # Load the decomposition
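To make the draft more concrete, here is a rough sketch of what it could look like for a CP decomposition. Everything specific here is an assumption for illustration: the names `store_cp` and `load_cp`, the representation of the decomposition as a `(weights, factors)` pair of NumPy arrays, the attribute `n_factor_matrices`, the dataset names, and a single `compression` argument in place of the two compression arguments in the draft. The default internal path is "/cp" rather than "/", because the root group always exists and `create_group` cannot create it again.

```python
import os
import tempfile

import h5py
import numpy as np


def store_cp(decomposition, path, internal_path="/cp", compression=None):
    """Store a (weights, factors) pair in an HDF5 group (hypothetical helper)."""
    weights, factors = decomposition
    with h5py.File(path, "a") as h5:
        # Refuse to overwrite an existing group instead of guessing what to do.
        if internal_path in h5:
            raise ValueError(f"{internal_path!r} already exists in {path!r}.")
        group = h5.create_group(internal_path)
        group.attrs["decomposition_type"] = "CP"
        group.attrs["n_factor_matrices"] = len(factors)
        group.create_dataset("weights", data=np.asarray(weights), compression=compression)
        for mode, factor in enumerate(factors):
            group.create_dataset(f"factor{mode}", data=np.asarray(factor), compression=compression)


def load_cp(path, internal_path="/cp"):
    """Load a (weights, factors) pair written by store_cp (hypothetical helper)."""
    with h5py.File(path, "r") as h5:
        if internal_path not in h5:
            raise KeyError(f"{internal_path!r} not found in {path!r}.")
        group = h5[internal_path]
        stored_type = group.attrs["decomposition_type"]
        if stored_type != "CP":
            raise ValueError(f"The HDF5 file contains a {stored_type} decomposition, not a CP.")
        weights = group["weights"][...]
        n_factor_matrices = int(group.attrs["n_factor_matrices"])
        factors = [group[f"factor{mode}"][...] for mode in range(n_factor_matrices)]
    return weights, factors


# Round trip with a small random rank-3 decomposition of a 4 x 5 x 6 tensor.
rng = np.random.default_rng(0)
weights = np.ones(3)
factors = [rng.standard_normal((size, 3)) for size in (4, 5, 6)]

path = os.path.join(tempfile.mkdtemp(), "decompositions.h5")
store_cp((weights, factors), path, compression="gzip")
loaded_weights, loaded_factors = load_cp(path)
```

Because each decomposition lives in its own group, several decompositions could share one file by using different internal paths.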
Closing notes
The downside of this addition is an extra dependency. However, we can make it optional and disable serialisation when h5py (or a NetCDF binding) is not installed.
Sounds like a good idea - feels to me like a good candidate for tensorly-lab, what do you think?