ReferenceFileSystem generation for Zarr
MultiZarrToZarr expects a collection of paths to Zarr ReferenceFileSystem JSON files. What steps are required to generate ReferenceFileSystem JSON from a Zarr directory or zip store, and is this something kerchunk should support? Thanks!
Do you mean that your original input datasets are already in zarr format? You could technically generate a reference JSON for each zarr input dataset and pass those to MultiZarrToZarr.
I don't think you could currently do multiple zarrs-in-zip, because ReferenceFileSystem requires a single file system to work on, and each zip counts as one.
A set of zarr datasets should really be the very simple case, since the directory structure is already of the right form, and we only ever need whole chunks. The implementation we have now is, if anything, too complicated for this case.
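Because a Zarr (v2) directory store already has the right key layout, a reference set for it can be very simple. Below is a minimal sketch, assuming the version-1 reference format that fsspec's ReferenceFileSystem reads: metadata files are inlined as text, and every other file is treated as a chunk and referenced whole as a one-element `[url]` list, so no byte offsets are needed. The function name `zarr_dir_to_refs` and its `url_prefix` parameter are hypothetical illustrations, not kerchunk API.

```python
import os
from pathlib import Path

# Zarr v2 metadata keys; these are small JSON text files, so we inline them.
ZARR_METADATA = {".zgroup", ".zarray", ".zattrs", ".zmetadata"}


def zarr_dir_to_refs(root, url_prefix=None):
    """Hypothetical helper: build a version-1 reference dict for a Zarr v2 directory store.

    Metadata files are stored inline as strings; every other file is assumed to be
    a chunk and is referenced whole as [url] (no offset/length).
    url_prefix is where the store will be read from later (e.g. a remote bucket);
    by default it points at the local files via a file:// URL.
    """
    root = Path(root).resolve()
    prefix = (url_prefix if url_prefix is not None else root.as_uri()).rstrip("/")
    refs = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Zarr keys are "/"-separated paths relative to the store root.
            key = path.relative_to(root).as_posix()
            if name in ZARR_METADATA:
                refs[key] = path.read_text(encoding="utf-8")
            else:
                # Whole-chunk reference: the chunk file is read in its entirety.
                refs[key] = [f"{prefix}/{key}"]
    return {"version": 1, "refs": refs}
```

Writing the result with `json.dump` once per dataset (for example with `url_prefix="s3://bucket/example.zarr"`, a placeholder) would give one JSON file per input, which is the kind of input MultiZarrToZarr takes. Zip stores are not covered by this sketch.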
Kerchunk also includes SingleHdf5ToZarr, which creates the references for a single HDF5 file: https://fsspec.github.io/kerchunk/reference.html#kerchunk.hdf.SingleHdf5ToZarr
An example of this can be found in the docs here: https://fsspec.github.io/kerchunk/test_example.html#single-file-jsons
And in a blog post here (a bit out of date now, but it should still work; fsspec-reference-maker was recently renamed to kerchunk).
My bad, I didn't read the original post closely enough 😉