ReferenceFileSystem generation for Zarr

Open benjaminleighton opened this issue 4 years ago • 3 comments

MultiZarrToZarr expects a collection of Zarr ReferenceFileSystem JSON paths. What steps are required to generate ReferenceFileSystem JSON from a Zarr directory or zip, and is this something kerchunk should support? Thanks!

benjaminleighton, Oct 27 '21 10:10

Do you mean that your original input datasets are already in zarr format? You could technically generate JSONs for each zarr input dataset and pass these to MultiZarrToZarr.

I don't think you could currently do multiple zarrs-in-zip, because ReferenceFileSystem requires a single file system to work on, and each zip counts as one.

A set of zarr datasets should really be the very simple case, since the directory structure is already of the right form, and we only ever need whole chunks. The implementation we have now is, if anything, too complicated for this case.
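To illustrate the "simple case" described above, here is a minimal sketch, not something kerchunk provides. It assumes a Zarr v2 directory store on the local file system and the version 1 reference layout (`{"version": 1, "refs": {...}}`), where a one-element list `[url]` points at a whole file. The helper names `zarr_dir_to_refs` and `write_refs` are hypothetical.

```python
import json
import os

# Zarr v2 metadata file names; these are small JSON documents, so they are
# inlined into the references as strings rather than referenced by path.
ZARR_METADATA_NAMES = {".zgroup", ".zarray", ".zattrs", ".zmetadata"}


def zarr_dir_to_refs(root):
    """Build a version 1 reference dict for a local Zarr v2 directory store.

    Hypothetical helper: every chunk file becomes a whole-file reference,
    since the directory layout already matches the Zarr key layout.
    """
    root = os.path.abspath(root)
    refs = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Zarr keys always use forward slashes, whatever the OS separator.
            key = os.path.relpath(full, root).replace(os.sep, "/")
            if name in ZARR_METADATA_NAMES:
                with open(full, "r", encoding="utf-8") as f:
                    refs[key] = f.read()
            else:
                # One-element list: reference the whole file, no offset/size.
                refs[key] = [full]
    return {"version": 1, "refs": refs}


def write_refs(root, json_path):
    """Write the references for `root` to `json_path` and return the dict."""
    refs = zarr_dir_to_refs(root)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(refs, f)
    return refs
```

One JSON file written this way per input dataset would be the kind of path collection MultiZarrToZarr expects, though whether it combines them correctly for a given set of datasets is untested here. For remote stores the local paths would need to be replaced by full URLs.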

martindurant, Oct 27 '21 13:10

Kerchunk also includes SingleHdf5ToZarr, which creates the references: https://fsspec.github.io/kerchunk/reference.html#kerchunk.hdf.SingleHdf5ToZarr

An example of this can be found in the docs here: https://fsspec.github.io/kerchunk/test_example.html#single-file-jsons

There is also a blog post here (a bit out of date now, but it should still work; fsspec-reference-maker was recently renamed to kerchunk).

lsterzinger, Oct 27 '21 13:10

My bad, I didn't read the original post closely enough 😉

lsterzinger, Oct 27 '21 13:10