JSON codec cannot handle scalar array

Open martindurant opened this issue 3 years ago • 8 comments

This could be marked as "expected behaviour", since JSON documents must be objects or arrays, not scalars. However, I think the JSON codec should be able to cope with simple strings.

Minimal, reproducible code sample, a copy-pastable example if possible

import numcodecs
j = numcodecs.JSON()
import numpy as np
x = np.array("value", dtype="O")
j.encode(x)

File ~/conda/envs/py39/lib/python3.9/site-packages/numcodecs/json.py:59, in JSON.encode(self, buf)
     57 buf = np.asarray(buf)
     58 items = buf.tolist()
---> 59 items.append(buf.dtype.str)
     60 items.append(buf.shape)
     61 return self._encoder.encode(items).encode(self._text_encoding)

AttributeError: 'str' object has no attribute 'append'

Problem description

I expect the value to round-trip successfully through encode and decode.

Version and installation information

Please provide the following:

  • Value of numcodecs.__version__ 0.10.2
  • Version of Python interpreter '3.9.13 | packaged by conda-forge | (main, May 27 2022, 17:00:33) \n[Clang 13.0.1 ]'
  • Operating system (Linux/Windows/Mac) MacOS on M1
  • How NumCodecs was installed (e.g., "using pip into virtual environment", or "using conda") conda and source

martindurant, Oct 17 '22 15:10

The JSON codec also has problems with simple dictionaries, e.g., when x = {"a": 1} or x = np.array({"a": 1}, dtype=object) in the example above.

david-zwicker, Oct 23 '22 11:10

I suppose that those are scalars from this point of view. The code supposes that the outermost structure is a list. This should not be hard to fix.

martindurant, Oct 23 '22 13:10
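The failure described above can be reproduced without numcodecs. The sketch below uses only NumPy and shows that `tolist()` on a zero-dimensional object array returns the bare Python object rather than a list, which is why the subsequent `items.append(...)` call fails. The variable names are illustrative only.

```python
import numpy as np

scalar = np.array("value", dtype=object)      # zero-dimensional, shape ()
vector = np.array(["value"], dtype=object)    # one-dimensional, shape (1,)
mapping = np.array({"a": 1}, dtype=object)    # also zero-dimensional

vector_items = vector.tolist()     # a Python list, so .append() works
scalar_items = scalar.tolist()     # the bare str, which has no .append()
mapping_items = mapping.tolist()   # the bare dict, which has no .append() either

# Promoting the array to at least one dimension restores the list wrapper.
wrapped_items = np.atleast_1d(scalar).tolist()

print(type(vector_items).__name__)
print(type(scalar_items).__name__)
print(type(mapping_items).__name__)
print(type(wrapped_items).__name__)
```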

Yeah, that seems to be a hard requirement. The internal workings indeed assume that an array is encoded. I don't know if this is by design, but it would be nice to be able to use the JSON codec with object arrays in zarr.

david-zwicker, Oct 23 '22 13:10

I think an object array does work now:

In [128]: import numcodecs

In [129]: j = numcodecs.JSON()

In [130]: import numpy as np

In [131]: j.encode(np.array([{"X": 1}, {"Y": 2}], dtype="O"))
Out[131]: b'[{"X":1},{"Y":2},"|O",[2]]'

martindurant, Oct 23 '22 19:10
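Judging from the output above, the encoded payload appears to be a flat JSON list: the array elements first, then the dtype string, then the shape. A minimal sketch of pulling those parts back out with the standard-library `json` module follows (the variable names are illustrative, not part of numcodecs):

```python
import json

# Encoded bytes copied from the session above.
encoded = b'[{"X":1},{"Y":2},"|O",[2]]'

items = json.loads(encoded.decode("utf-8"))

# The last two entries carry the metadata; everything before them is data.
*values, dtype_str, shape = items

print(values)
print(dtype_str)
print(shape)
```

Because the metadata is appended to the same list as the data, the encoder needs the data to be a list in the first place, which is the assumption that a zero-dimensional array breaks.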

Well, yes, it works for arrays with at least one dimension, but not for zero-dimensional arrays. Sorry if I was not clear in my earlier post. I agree that this is an edge case, but I feel it's possible to support this. Moreover, it's always nice to have broader coverage.

david-zwicker, Oct 23 '22 20:10

--- a/numcodecs/json.py
+++ b/numcodecs/json.py
@@ -54,8 +54,8 @@ class JSON(Codec):
         self._decoder = _json.JSONDecoder(**self._decoder_config)

     def encode(self, buf):
-        buf = np.asarray(buf)
-        items = buf.tolist()
+        buf = np.array(buf)
+        items = np.atleast_1d(buf).tolist()
         items.append(buf.dtype.str)
         items.append(buf.shape)
         return self._encoder.encode(items).encode(self._text_encoding)
@@ -63,7 +63,10 @@ class JSON(Codec):
     def decode(self, buf, out=None):
         items = self._decoder.decode(ensure_text(buf, self._text_encoding))
         dec = np.empty(items[-1], dtype=items[-2])
-        dec[:] = items[:-2]
+        if not items[-1]:
+            dec[...] = items[0]
+        else:
+            dec[:] = items[:-2]
         if out is not None:
             np.copyto(out, dec)

?

martindurant, Oct 24 '22 01:10
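For readers who want to try the idea outside the library, here is a standalone sketch that mirrors the logic of the diff above using only NumPy and the standard-library `json` module. The function names `encode_json` and `decode_json` are hypothetical, and the compact separators and UTF-8 encoding are assumptions chosen to match the output shown earlier in the thread. It is not the numcodecs implementation.

```python
import json

import numpy as np


def encode_json(buf):
    """Encode an array as JSON bytes: elements, then dtype string, then shape."""
    arr = np.array(buf)
    # atleast_1d guarantees that tolist() returns a list, even for 0-d input.
    items = np.atleast_1d(arr).tolist()
    items.append(arr.dtype.str)
    items.append(arr.shape)  # the original shape, () for a 0-d array
    return json.dumps(items, separators=(",", ":")).encode("utf-8")


def decode_json(data):
    """Rebuild the array from bytes produced by encode_json."""
    items = json.loads(data.decode("utf-8"))
    shape = tuple(items[-1])
    dec = np.empty(shape, dtype=items[-2])
    if not shape:
        # Zero-dimensional: the single value sits at index 0.
        dec[...] = items[0]
    else:
        dec[:] = items[:-2]
    return dec


scalar = np.array("value", dtype=object)
restored_scalar = decode_json(encode_json(scalar))

vector = np.array([{"X": 1}, {"Y": 2}], dtype=object)
restored_vector = decode_json(encode_json(vector))

print(restored_scalar.shape, restored_scalar.item())
print(restored_vector.tolist())
```

Keeping the original shape in the payload while wrapping only the data is what lets the decoder tell a zero-dimensional array apart from a one-element array.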

Yeah, this works for me! Thanks

david-zwicker, Oct 24 '22 07:10

OK, let's make it a PR and see if anyone has a more elegant resolution.

martindurant, Oct 24 '22 15:10