numcodecs

add numcodecs.zarr3.to_zarr3 method

Open brokkoli71 opened this issue 9 months ago • 8 comments

  • implements to_zarr3 function in numcodecs.zarr3 for https://github.com/zarr-developers/zarr-python/issues/2964
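As a rough illustration of the kind of conversion being discussed (not the code in this PR), the sketch below maps a numcodecs-style config dict, keyed by `id`, onto the `name`/`configuration` layout used for codec metadata in Zarr v3. The helper name `config_to_zarr3_metadata` and the `"numcodecs."` name prefix are assumptions made for this example.

```python
from typing import Any

# Assumption for this sketch: v3 names for numcodecs codecs carry this prefix.
CODEC_PREFIX = "numcodecs."


def config_to_zarr3_metadata(config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: turn a v2-style codec config into v3-style metadata."""
    if "id" not in config:
        raise ValueError("codec config must contain an 'id' key")
    # Everything except the codec id becomes the v3 "configuration" mapping.
    configuration = {key: value for key, value in config.items() if key != "id"}
    return {"name": CODEC_PREFIX + config["id"], "configuration": configuration}


v2_config = {"id": "zlib", "level": 1}
v3_metadata = config_to_zarr3_metadata(v2_config)
print(v3_metadata)
```

A real `to_zarr3` would additionally have to return a codec object that zarr-python can place in its v3 codec pipeline; the dict conversion above covers only the metadata side.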

TODO:

  • [x] Unit tests and/or doctests in docstrings
  • [x] Tests pass locally
  • [x] Docstrings and API docs for any new/modified user-facing classes and functions
  • [x] Changes documented in docs/release.rst
  • [ ] Docs build locally
  • [ ] GitHub Actions CI passes
  • [ ] Test coverage to 100% (Codecov passes)

brokkoli71 avatar Apr 22 '25 12:04 brokkoli71

Codecov Report

Attention: Patch coverage is 92.59259% with 2 lines in your changes missing coverage. Please review.

Project coverage is 99.89%. Comparing base (3438e16) to head (d661eab).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| numcodecs/tests/test_zarr3.py | 94.44% | 1 Missing :warning: |
| numcodecs/zarr3.py | 88.88% | 1 Missing :warning: |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #741      +/-   ##
==========================================
- Coverage   99.96%   99.89%   -0.08%     
==========================================
  Files          63       63              
  Lines        2736     2763      +27     
==========================================
+ Hits         2735     2760      +25     
- Misses          1        3       +2     

| Files with missing lines | Coverage Δ |
|---|---|
| numcodecs/tests/test_zarr3.py | 99.12% <94.44%> (-0.88%) :arrow_down: |
| numcodecs/zarr3.py | 99.05% <88.88%> (-0.46%) :arrow_down: |

codecov[bot] avatar Apr 22 '25 12:04 codecov[bot]

Thanks for working on this! We should probably have a conversation about the strategy here. My preference would be to move away from separate zarr 2 and zarr 3 codec classes, which would look somewhat different than the effort here.

d-v-b avatar Apr 22 '25 13:04 d-v-b

> My preference would be to move away from separate zarr 2 and zarr 3 codec classes, which would look somewhat different than the effort here.

Are these mutually exclusive?

I see this PR as a shim to solve a pretty urgent problem that Zarr users are experiencing in the V3 transition.

In the future, we could refactor how codec classes work, but that's likely a much slower process.

rabernat avatar Apr 22 '25 13:04 rabernat

> > My preference would be to move away from separate zarr 2 and zarr 3 codec classes, which would look somewhat different than the effort here.
>
> Are these mutually exclusive?
>
> I see this PR as a shim to solve a pretty urgent problem that Zarr users are experiencing in the V3 transition.
>
> In the future, we could refactor how codec classes work, but that's likely a much slower process.

One way to achieve this shim without adding more problematic zarr 2 / zarr 3 logic to numcodecs would be to implement the changes in this PR in zarr-python, instead of numcodecs. Is there any reason why that would not be possible?

d-v-b avatar Apr 22 '25 13:04 d-v-b

> > > My preference would be to move away from separate zarr 2 and zarr 3 codec classes, which would look somewhat different than the effort here.
>
> > Are these mutually exclusive? I see this PR as a shim to solve a pretty urgent problem that Zarr users are experiencing in the V3 transition. In the future, we could refactor how codec classes work, but that's likely a much slower process.
>
> One way to achieve this shim without adding more problematic zarr 2 / zarr 3 logic to numcodecs would be to implement the changes in this PR in zarr-python, instead of numcodecs. Is there any reason why that would not be possible?

I would argue that adding this to zarr-python actually increases the problematic coupling, because this to_zarr3 method depends on private numcodecs interfaces. However, I think we can be pragmatic here and implement it on either side until we have resolved https://github.com/zarr-developers/numcodecs/issues/742

normanrz avatar Apr 23 '25 09:04 normanrz

> I would argue that adding this to zarr-python actually increases the problematic coupling, because this to_zarr3 method depends on private numcodecs interfaces.

As numcodecs has so far existed chiefly for zarr-python's benefit, and we control numcodecs, I would argue that effectively all numcodecs interfaces are public to zarr-python. To put it differently, "zarr-python uses numcodecs interface X" would be a valid reason for us not to change that interface, whether interface X was public or not.

This is of course problematic, and ultimately something we should fix. I think the first step would be to extract as much zarr-specific logic as possible from numcodecs, which argues for putting the code in this PR in zarr-python instead.

d-v-b avatar Apr 23 '25 10:04 d-v-b

I created the zarr-any-numcodecs package, which can wrap any existing numcodecs codec as a zarr v3 codec. It is more general (not limited to the built-in numcodecs codecs), but it cannot be as optimized, since this repo can create wrappers that benefit from implementation details, e.g. by exposing partial decoding support.
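To make the "generic wrapper" idea concrete, here is a minimal sketch under stated assumptions; it is not the API of zarr-any-numcodecs or of zarr-python. `ToyZlib` stands in for any codec that exposes `codec_id`, `encode`, `decode`, and `get_config` in the numcodecs style, and `GenericV3Wrapper`, its method names, and the `"wrapped."` name prefix are all hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Any, Protocol


class V2StyleCodec(Protocol):
    """The numcodecs-style surface the wrapper relies on."""

    codec_id: str

    def encode(self, buf: bytes) -> bytes: ...
    def decode(self, buf: bytes) -> bytes: ...
    def get_config(self) -> dict[str, Any]: ...


class ToyZlib:
    """Stand-in codec built on the stdlib, so the sketch has no dependencies."""

    codec_id = "zlib"

    def __init__(self, level: int = 1) -> None:
        self.level = level

    def encode(self, buf: bytes) -> bytes:
        return zlib.compress(buf, self.level)

    def decode(self, buf: bytes) -> bytes:
        return zlib.decompress(buf)

    def get_config(self) -> dict[str, Any]:
        return {"id": self.codec_id, "level": self.level}


@dataclass
class GenericV3Wrapper:
    """Hypothetical adapter: delegates whole-chunk work to the wrapped codec."""

    inner: V2StyleCodec

    def to_dict(self) -> dict[str, Any]:
        config = dict(self.inner.get_config())  # copy before popping "id"
        codec_id = config.pop("id")
        return {"name": f"wrapped.{codec_id}", "configuration": config}

    def encode_chunk(self, chunk: bytes) -> bytes:
        return self.inner.encode(chunk)

    def decode_chunk(self, chunk: bytes) -> bytes:
        return self.inner.decode(chunk)


wrapper = GenericV3Wrapper(ToyZlib(level=3))
payload = b"zarr" * 100
roundtrip = wrapper.decode_chunk(wrapper.encode_chunk(payload))
```

Because the adapter only sees whole-chunk `encode` and `decode`, it has no way to offer features such as partial decoding, which is the trade-off described above.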

juntyr avatar Dec 08 '25 07:12 juntyr

Very cool, @juntyr! You might be interested in https://github.com/zarr-developers/zarr-python/pull/3332, which is a PR against zarr-python that aims to do something similar.

d-v-b avatar Dec 08 '25 07:12 d-v-b