xarray-beam
Consider adding ZarrToChunks() and/or an open_zarr() helper function
These could facilitate directly opening data from Zarr using idiomatic patterns in Xarray-Beam (e.g., using Xarray's lazy indexing machinery instead of dask).
I'm imagining open_zarr() returning a tuple of values (transform, template, chunks), providing exactly the information needed to use the dataset in a Zarr-to-Zarr pipeline:
- transform would be the Beam PTransform that could be used in a pipeline (equivalent to the result of xbeam.ZarrToChunks()).
- template itself would be an efficient lazy xarray.Dataset consisting of a single dask chunk, e.g., equivalent to xarray.zeros_like(xarray.open_zarr(..., chunks=None).chunk()).
- chunks would be a dict of chunks on the underlying dataset.
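As a rough sketch of how the template and chunks values might be derived, here is one possible approach. The helper names infer_encoded_chunks and make_lazy_template are hypothetical and not part of xarray-beam. The in-memory dataset stands in for the result of xarray.open_zarr(..., chunks=None), and its encoding["chunks"] entry is set by hand to mimic what the Zarr backend records when reading a store.

```python
import numpy as np
import xarray


def infer_encoded_chunks(dataset):
    """Collect per-dimension chunk sizes from each variable's encoding['chunks'].

    Hypothetical helper: assumes chunk sizes are stored in the encoding,
    as xarray's Zarr backend does for variables it opens.
    """
    chunks = {}
    for name, variable in dataset.variables.items():
        encoded = variable.encoding.get("chunks")
        if encoded is None:
            continue  # no chunk information recorded for this variable
        for dim, size in zip(variable.dims, encoded):
            # Every variable sharing a dimension must agree on its chunk size.
            if chunks.setdefault(dim, size) != size:
                raise ValueError(f"inconsistent chunk sizes along {dim!r}")
    return chunks


def make_lazy_template(dataset):
    """Build an all-zeros lazy dataset with a single dask chunk per variable."""
    return xarray.zeros_like(dataset.chunk())


# Stand-in for xarray.open_zarr(..., chunks=None); requires dask for .chunk().
source = xarray.Dataset(
    {"temperature": (("time", "x"), np.arange(12.0).reshape(6, 2))},
    coords={"time": np.arange(6)},
)
source["temperature"].encoding["chunks"] = (3, 2)  # assumed on-disk chunking

template = make_lazy_template(source)
original_chunks = infer_encoded_chunks(source)
```

The template never loads or copies the source values, so it should be cheap to build and pass around even for very large datasets. How open_zarr() would actually compute these values is still an open design question.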
Usage examples:
with beam.Pipeline() as p:
    p | xbeam.ZarrToChunks(..., desired_chunks) | ...

with beam.Pipeline() as p:
    load_data, template, original_chunks = xbeam.open_zarr(...)
    p | load_data | beam.MapTuple(...) | xbeam.ChunksToZarr(..., template, original_chunks)