add numcodecs.zarr3.to_zarr3 method
- implements the `to_zarr3` function in `numcodecs.zarr3` for https://github.com/zarr-developers/zarr-python/issues/2964
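For readers unfamiliar with the two metadata shapes, here is a minimal sketch of the kind of mapping such a conversion involves. It is not the code in this PR: `v2_config_to_v3_metadata` is a hypothetical helper, and the `numcodecs.` name prefix is an assumption made for illustration. It only shows a v2 style config (a flat dict with an `id` key) turned into a v3 style dict with `name` and `configuration` keys.

```python
# Hypothetical sketch: not the numcodecs or zarr-python implementation.


def v2_config_to_v3_metadata(config: dict, prefix: str = "numcodecs.") -> dict:
    """Convert a v2 style codec config into a v3 style metadata dict.

    The ``prefix`` default is an assumption made for this example.
    """
    if "id" not in config:
        raise ValueError("codec config must contain an 'id' key")
    # Build a new dict so the caller's config is not mutated.
    configuration = {key: value for key, value in config.items() if key != "id"}
    return {"name": f"{prefix}{config['id']}", "configuration": configuration}


# A v2 style config, as a codec's get_config() would typically return it.
v2_config = {"id": "blosc", "cname": "zstd", "clevel": 5, "shuffle": 1}
v3_metadata = v2_config_to_v3_metadata(v2_config)
print(v3_metadata)
```

The actual function also has to pick the right wrapper class for each codec, which this sketch leaves out.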
TODO:
- [x] Unit tests and/or doctests in docstrings
- [x] Tests pass locally
- [x] Docstrings and API docs for any new/modified user-facing classes and functions
- [x] Changes documented in docs/release.rst
- [ ] Docs build locally
- [ ] GitHub Actions CI passes
- [ ] Test coverage to 100% (Codecov passes)
Codecov Report
Attention: Patch coverage is 92.59259% with 2 lines in your changes missing coverage. Please review.
Project coverage is 99.89%. Comparing base (3438e16) to head (d661eab).
| Files with missing lines | Patch % | Lines |
|---|---|---|
| numcodecs/tests/test_zarr3.py | 94.44% | 1 Missing :warning: |
| numcodecs/zarr3.py | 88.88% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #741 +/- ##
==========================================
- Coverage 99.96% 99.89% -0.08%
==========================================
Files 63 63
Lines 2736 2763 +27
==========================================
+ Hits 2735 2760 +25
- Misses 1 3 +2
| Files with missing lines | Coverage Δ | |
|---|---|---|
| numcodecs/tests/test_zarr3.py | 99.12% <94.44%> (-0.88%) | :arrow_down: |
| numcodecs/zarr3.py | 99.05% <88.88%> (-0.46%) | :arrow_down: |
Thanks for working on this! We should probably have a conversation about the strategy here. My preference would be to move away from separate zarr 2 and zarr 3 codec classes, which would look somewhat different than the effort here.
> My preference would be to move away from separate zarr 2 and zarr 3 codec classes, which would look somewhat different than the effort here.
Are these mutually exclusive?
I see this PR as a shim to solve a pretty urgent problem that Zarr users are experiencing in the V3 transition.
In the future, we could refactor how codec classes work, but that's likely a much slower process.
One way to achieve this shim without adding more problematic zarr 2 / zarr 3 logic to numcodecs would be to implement the changes in this PR in zarr-python, instead of numcodecs. Is there any reason why that would not be possible?
I would argue that adding this to zarr-python actually increases the problematic coupling, because this to_zarr3 method depends on private numcodecs interfaces. However, I think we can be pragmatic here and implement it on either side until we have resolved https://github.com/zarr-developers/numcodecs/issues/742
> I would argue that adding this to zarr-python actually increases the problematic coupling, because this `to_zarr3` method depends on private numcodecs interfaces.
As numcodecs has so far existed chiefly for zarr-python's benefit, and we control numcodecs, I would argue that effectively all numcodecs interfaces are public to zarr-python. To put it differently, "zarr-python uses numcodecs interface X" would be a valid reason for us not to change that interface, whether interface X was public or not.
This is of course problematic, and ultimately something we should fix. I think the first step would be to extract as much Zarr-specific logic from numcodecs as possible, which argues for putting the code from this PR in zarr-python.
I created the zarr-any-numcodecs package, which can wrap any existing numcodecs codec as a Zarr v3 codec. It is more general (not limited to just the builtin numcodecs codecs), but it cannot be as optimized, since this repo can create wrappers that benefit from implementation details, e.g. by exposing partial decoding support.
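To illustrate the generic wrapping idea, here is a rough sketch under stated assumptions. It does not show the zarr-any-numcodecs API: `ToyZlibCodec` and `GenericV3Wrapper` are hypothetical names, and the toy codec uses the standard library's `zlib` so the example runs without numcodecs installed. The wrapper relies only on `encode`, `decode` and `get_config`, which is why it works for any codec and also why it cannot use codec-specific shortcuts such as partial decoding.

```python
import zlib


class ToyZlibCodec:
    """Stand-in for a v2 style codec exposing encode, decode and get_config."""

    codec_id = "toy_zlib"  # hypothetical id, used only in this example

    def __init__(self, level: int = 1) -> None:
        self.level = level

    def encode(self, buf: bytes) -> bytes:
        return zlib.compress(buf, self.level)

    def decode(self, buf: bytes) -> bytes:
        return zlib.decompress(buf)

    def get_config(self) -> dict:
        return {"id": self.codec_id, "level": self.level}


class GenericV3Wrapper:
    """Hypothetical adapter giving any such codec a v3 style surface."""

    def __init__(self, codec, prefix: str = "example.") -> None:
        self._codec = codec
        self._prefix = prefix  # the prefix is an assumption for illustration

    def to_dict(self) -> dict:
        # v3 style metadata: a name plus a configuration without the v2 "id".
        config = dict(self._codec.get_config())
        codec_id = config.pop("id")
        return {"name": f"{self._prefix}{codec_id}", "configuration": config}

    def encode_chunk(self, chunk: bytes) -> bytes:
        # Always encodes the whole chunk; no codec-specific fast paths.
        return self._codec.encode(chunk)

    def decode_chunk(self, chunk: bytes) -> bytes:
        # Always decodes the whole chunk; partial decoding is not available.
        return self._codec.decode(chunk)


wrapped = GenericV3Wrapper(ToyZlibCodec(level=6))
payload = b"zarr" * 256
round_tripped = wrapped.decode_chunk(wrapped.encode_chunk(payload))
```

A codec-specific wrapper could override `decode_chunk` to read only the bytes it needs, which is the kind of optimization the generic adapter gives up.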
Very cool, @juntyr! You might be interested in https://github.com/zarr-developers/zarr-python/pull/3332, a PR against zarr-python that aims to do something similar.