
`zarr.copy_all` does not copy the consolidated metadata

Open: cwognum opened this issue 1 year ago • 0 comments

Zarr version

v2.17.1

Numcodecs version

v0.12.1

Python Version

3.12.2

Operating System

Linux

Installation

Using micromamba (conda)

Description

Together with @lmtroper, I am using a custom storage backend on which many `ls` calls quickly add significant overhead. Consolidated Zarr archives significantly speed up read access in this use case.
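To illustrate why consolidation helps here, the following is a minimal sketch that models a Zarr v2 store as a plain `dict` mapping keys to bytes. This is an assumption for illustration: real stores are `MutableMapping`s backed by files or object storage. The sketch gathers every `.zgroup`, `.zarray` and `.zattrs` document into a single `.zmetadata` entry, roughly as `zarr.consolidate_metadata` does in v2. `consolidate` is a hypothetical helper, not part of zarr's API.

```python
import json
from collections.abc import MutableMapping

# Suffixes of the per-node metadata documents in the Zarr v2 format.
_META_SUFFIXES = (".zgroup", ".zarray", ".zattrs")


def consolidate(store: MutableMapping, key: str = ".zmetadata") -> dict:
    """Hypothetical helper: collect all metadata documents under one key."""
    consolidated = {
        "zarr_consolidated_format": 1,
        # Chunk keys (e.g. "A/0") are skipped; only metadata is gathered.
        "metadata": {
            k: json.loads(store[k])
            for k in store
            if k.endswith(_META_SUFFIXES)
        },
    }
    store[key] = json.dumps(consolidated).encode()
    return consolidated


store = {
    ".zgroup": b'{"zarr_format": 2}',
    "A/.zarray": b'{"zarr_format": 2, "shape": [128]}',
    "A/0": b"<chunk bytes>",
}
meta = consolidate(store)
print(sorted(meta["metadata"]))  # ['.zgroup', 'A/.zarray']
```

After consolidation, a reader can fetch the single `.zmetadata` key instead of listing the hierarchy and requesting each node's metadata separately. That is the saving that matters on a backend where `ls` is expensive.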

Consolidation is a one-time cost, but as datasets grow it can still be painfully slow to create the consolidated archive directly on the storage backend. As a next optimization, we therefore considered consolidating the archive locally and then copying it to the backend with `zarr.copy_all`. However, `zarr.copy_all` does not seem to copy the consolidated metadata file (by default named `.zmetadata`).

An easy workaround is to upload this file manually, but I expected `zarr.copy_all` to make an exact copy of the source directory.
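The manual workaround could look roughly like the sketch below. It assumes both stores behave like Zarr v2 `MutableMapping` stores, so that `src.store` and `dst.store` from the reproduction could be passed in. The helper name `copy_consolidated_metadata` is hypothetical. The sketch copies the `.zmetadata` bytes unchanged, after checking that they look like a v2 consolidated document.

```python
import json
from collections.abc import MutableMapping


def copy_consolidated_metadata(
    src: MutableMapping, dst: MutableMapping, key: str = ".zmetadata"
) -> None:
    """Hypothetical helper: copy the key that zarr.copy_all leaves behind."""
    raw = src[key]  # raises KeyError if the source was never consolidated
    doc = json.loads(raw)
    # A v2 consolidated document carries this format marker.
    if doc.get("zarr_consolidated_format") != 1:
        raise ValueError(f"{key!r} is not a consolidated metadata document")
    dst[key] = raw  # write the bytes unchanged


src_store = {
    ".zgroup": b'{"zarr_format": 2}',
    ".zmetadata": json.dumps(
        {"zarr_consolidated_format": 1, "metadata": {}}
    ).encode(),
}
dst_store = {".zgroup": b'{"zarr_format": 2}'}
copy_consolidated_metadata(src_store, dst_store)
```

Copying the file verbatim is only safe if the destination holds exactly the hierarchy that was consolidated. If other nodes exist there, `.zmetadata` goes stale. Re-running consolidation on the destination avoids that, but it brings back the slow listing this workflow tries to avoid. Copying raw keys with `zarr.copy_store` instead of `zarr.copy_all` may also carry `.zmetadata` across, since it operates on store keys rather than on groups and arrays. That has not been verified here.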

Steps to reproduce

import fsspec
import numpy as np
import zarr

# Build a small source group with three arrays
src = zarr.open("/path/to/src.zarr", "w")
src.array("A", data=np.random.random(128))
src.array("B", data=np.random.random(128))
src.array("C", data=np.random.random(128))

# Write .zmetadata; returns the group reopened in consolidated mode
src = zarr.consolidate_metadata(src.store)

dst = zarr.open("/path/to/dst.zarr", "w")
zarr.copy_all(src, dst)

fs, path = fsspec.core.url_to_fs("/path/to/dst.zarr")
fs.exists("/path/to/src.zarr/.zmetadata")  # True
fs.exists("/path/to/dst.zarr/.zmetadata")  # False

dst = zarr.open_consolidated("/path/to/dst.zarr")  # Crashes with a KeyError

Additional output

No response

cwognum, Mar 28 '24 14:03