JSON codec cannot handle scalar array
This could be marked as "expected behaviour", since the original JSON specification (RFC 4627) required the top-level value to be an object or an array, not a scalar. However, I think the JSON codec should be able to cope with simple strings.
Minimal, reproducible code sample, a copy-pastable example if possible
import numpy as np
import numcodecs

j = numcodecs.JSON()
x = np.array("value", dtype="O")
j.encode(x)
File ~/conda/envs/py39/lib/python3.9/site-packages/numcodecs/json.py:59, in JSON.encode(self, buf)
57 buf = np.asarray(buf)
58 items = buf.tolist()
---> 59 items.append(buf.dtype.str)
60 items.append(buf.shape)
61 return self._encoder.encode(items).encode(self._text_encoding)
AttributeError: 'str' object has no attribute 'append'
Problem description
I expect the value to round-trip successfully through encode and decode.
Version and installation information
Please provide the following:
- Value of numcodecs.__version__: 0.10.2
- Version of Python interpreter: '3.9.13 | packaged by conda-forge | (main, May 27 2022, 17:00:33) \n[Clang 13.0.1 ]'
- Operating system (Linux/Windows/Mac): macOS on M1
- How NumCodecs was installed (e.g., "using pip into virtual environment", or "using conda"): conda and source
The JSON codec also has problems with simple dictionaries, e.g., when x = {"a": 1} or x = np.array({"a": 1}, dtype=object) in the example above.
I suppose those count as scalars from this point of view: the code assumes that the outermost structure is a list. This should not be hard to fix.
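The reason the list assumption breaks is visible with plain NumPy, no numcodecs involved: for a zero-dimensional array, ndarray.tolist() returns the bare element rather than a list, so the following items.append(...) calls in JSON.encode fail. A small check, using the string and dict examples from this thread:

```python
import numpy as np

# 0-d object arrays holding a string and a dict, as in the examples above
scalar_str = np.array("value", dtype="O")
scalar_dict = np.array({"a": 1}, dtype=object)

# tolist() on a 0-d array returns the element itself, not a list
print(scalar_str.shape, type(scalar_str.tolist()))
print(scalar_dict.shape, type(scalar_dict.tolist()))

# A 1-d array gives a real list, which is what encode() relies on
print(type(np.array(["value"], dtype="O").tolist()))

# np.atleast_1d promotes a 0-d array to shape (1,), so tolist() yields a list
print(np.atleast_1d(scalar_str).tolist())
```

This is why wrapping the array with np.atleast_1d before calling tolist() is a natural candidate for a fix.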
Yeah, that seems to be a hard requirement. The internal workings indeed assume that an array is encoded. I don't know if this is by design, but it would be nice to be able to use the JSON codec with object arrays in zarr.
I think an object array does work now
In [128]: import numcodecs
In [129]: j = numcodecs.JSON()
In [130]: import numpy as np
In [131]: j.encode(np.array([{"X": 1}, {"Y": 2}], dtype="O"))
Out[131]: b'[{"X":1},{"Y":2},"|O",[2]]'
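The output above shows the layout the codec writes: the contents of tolist(), followed by buf.dtype.str and then buf.shape (the two items.append calls in the traceback). As a rough sketch, that payload can be decoded by hand with the stdlib json module, mirroring the np.empty / dec[:] steps of JSON.decode; the UTF-8 text encoding here is an assumption about the codec's default.

```python
import json
import numpy as np

# Bytes produced by JSON().encode() in the session above
encoded = b'[{"X":1},{"Y":2},"|O",[2]]'

items = json.loads(encoded.decode("utf-8"))
shape, dtype = items[-1], items[-2]  # trailer: dtype string, then shape
values = items[:-2]                  # everything before the trailer is the data

# Allocate an array of the recorded shape/dtype and fill it with the data
dec = np.empty(shape, dtype=dtype)
dec[:] = values
print(dec.shape, dec.dtype)
print(dec[0], dec[1])
```

For a 0-d array the shape trailer would be [], and there is no list of values to slice, which is where the current implementation falls over.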
Well, yes, it works for arrays with at least one dimension, but not for zero-dimensional arrays. Sorry if I was not clear in my earlier post. I agree that this is an edge case, but I think it can be supported, and covering edge cases is always worthwhile.
--- a/numcodecs/json.py
+++ b/numcodecs/json.py
@@ -54,8 +54,8 @@ class JSON(Codec):
         self._decoder = _json.JSONDecoder(**self._decoder_config)
 
     def encode(self, buf):
-        buf = np.asarray(buf)
-        items = buf.tolist()
+        buf = np.array(buf)
+        items = np.atleast_1d(buf).tolist()
         items.append(buf.dtype.str)
         items.append(buf.shape)
         return self._encoder.encode(items).encode(self._text_encoding)
@@ -63,7 +63,10 @@ class JSON(Codec):
     def decode(self, buf, out=None):
         items = self._decoder.decode(ensure_text(buf, self._text_encoding))
         dec = np.empty(items[-1], dtype=items[-2])
-        dec[:] = items[:-2]
+        if not items[-1]:
+            dec[...] = items[0]
+        else:
+            dec[:] = items[:-2]
         if out is not None:
             np.copyto(out, dec)
?
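To try the idea in the patch without installing a modified numcodecs, the same logic can be sketched as two standalone functions. The names encode_array and decode_array are hypothetical, and the sketch uses the stdlib json module with default settings and UTF-8 in place of the codec's configured encoder, decoder and text encoding.

```python
import json
import numpy as np

# Hypothetical helpers mirroring the patched JSON.encode/decode from the diff.

def encode_array(buf):
    buf = np.array(buf)
    # atleast_1d guarantees tolist() returns a list, even for 0-d input
    items = np.atleast_1d(buf).tolist()
    items.append(buf.dtype.str)
    # store the original shape (empty for 0-d), not the promoted one
    items.append(buf.shape)
    return json.dumps(items).encode("utf-8")

def decode_array(data):
    items = json.loads(data.decode("utf-8"))
    dec = np.empty(items[-1], dtype=items[-2])
    if not items[-1]:
        # empty shape list means a 0-d array: its single element is items[0]
        dec[...] = items[0]
    else:
        dec[:] = items[:-2]
    return dec

# Round-trip the 0-d string, the 0-d dict and the 1-d case from this thread
for original in (
    np.array("value", dtype="O"),
    np.array({"a": 1}, dtype=object),
    np.array([{"X": 1}, {"Y": 2}], dtype="O"),
):
    roundtripped = decode_array(encode_array(original))
    print(roundtripped.shape, roundtripped.tolist())
```

Because the shape trailer is kept unchanged, the decoder can tell a 0-d array (shape []) apart from a genuine one-element array (shape [1]), even though both carry a single data item.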
Yeah, this works for me! Thanks
OK, let's make it a PR and see if anyone has a more elegant resolution.