behavior of `is_total_slice` for boundary chunks
The is_total_slice function is used to determine whether a selection corresponds to an entire chunk (is_total_slice -> True) or not (is_total_slice -> False).
I noticed that is_total_slice is always False for "partial chunks", i.e. chunks that are only partially filled by the array:
import zarr
from zarr.indexing import BasicIndexer
from zarr.util import is_total_slice

# create an array with 1 full chunk and 1 partial chunk
a = zarr.open('test.zarr', path='test', shape=(10,), chunks=(9,), dtype='uint8', mode='w')
for x in BasicIndexer(slice(None), a):
    print(x.chunk_selection, is_total_slice(x.chunk_selection, a._chunks))
Which prints this:
(slice(0, 9, 1),) True
(slice(0, 1, 1),) False
Although the last selection is not the size of a full chunk, it is "total" with respect to the portion of that chunk that actually lies inside the array.
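As far as I can tell, the check is purely length-based; roughly something like the following (a paraphrase for illustration, not the actual source of zarr.util.is_total_slice):

```python
# Rough paraphrase of the length-only check (illustration, not the actual
# zarr source): a slice counts as "total" for a dimension if it is slice(None)
# or if its length equals the chunk length -- where it starts is never checked.
def is_total_slice_sketch(selection, chunks):
    return all(
        isinstance(s, slice)
        and (s == slice(None)
             or (s.step in (1, None) and s.stop - s.start == chunk_len))
        for s, chunk_len in zip(selection, chunks)
    )
```

Since only the slice length is compared against the chunk length, the boundary chunk's selection (slice(0, 1, 1),) can never be reported as total against chunks of (9,).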
A direct consequence of this behavior is unnecessary chunk loading when performing array assignments -- zarr uses the result of is_total_slice to decide whether to load existing chunk data or not. Because is_total_slice is always False for partial chunks, zarr always loads boundary chunks during assignment, even when the assignment overwrites every element of the chunk that lies inside the array.
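To make the cost concrete, here is a heavily simplified sketch of the per-chunk write decision (illustrative names only, no compression or filters; not the actual zarr implementation):

```python
import numpy as np
from zarr.util import is_total_slice

# Simplified sketch of the per-chunk write decision (illustration only):
# `store` is a dict-like mapping of chunk keys to raw bytes.
def set_chunk_sketch(store, chunk_key, chunk_selection, value, chunks, dtype):
    if is_total_slice(chunk_selection, chunks):
        # fast path: the selection covers the whole chunk, so the existing
        # chunk never needs to be read back from the store
        chunk = np.zeros(chunks, dtype=dtype)
    else:
        # slow path: read and decode the existing chunk before modifying it;
        # boundary chunks always end up here, even when every element of the
        # chunk that lies inside the array is being overwritten
        chunk = np.frombuffer(store[chunk_key], dtype=dtype).reshape(chunks).copy()
    chunk[chunk_selection] = value
    store[chunk_key] = chunk.tobytes()
```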
A solution to this would be to augment the is_total_slice function to account for partial chunks. I'm not sure at the moment how to do this exactly, but it shouldn't be hard. Happy to put together a PR if people agree that this is an issue worth addressing.
Yeah that seems reasonable
I ran some tests, and I think there is currently no way to achieve the desired behavior of is_total_slice, since the modification proposed by @d-v-b relies on the position of the slice within the array, and that information is not passed to is_total_slice.
One possible solution would be to add a new (maybe optional) parameter to is_total_slice to communicate the relative position of the slice (i.e. the upper left corner).
If so, I would like to work on the problem.
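To illustrate the idea, the extra argument would carry the offset of the chunk within the array, so the effective extent of a boundary chunk can be clamped before comparing (hypothetical signature, not an existing zarr function):

```python
# Hypothetical position-aware variant (illustration only): `offset` is the
# upper-left corner of the chunk within the array, `shape` is the array shape.
def is_total_slice_with_offset(selection, chunks, offset, shape):
    for s, chunk_len, off, dim_len in zip(selection, chunks, offset, shape):
        # extent of the part of this chunk that actually lies inside the array
        effective_len = min(chunk_len, dim_len - off)
        if s == slice(None):
            continue
        if not (isinstance(s, slice)
                and s.step in (1, None)
                and (s.start or 0) == 0
                and s.stop == effective_len):
            return False
    return True

# with the example above (shape=(10,), chunks=(9,), boundary chunk at offset 9):
# is_total_slice_with_offset((slice(0, 1, 1),), (9,), (9,), (10,)) -> True
```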
@fAndreuzzi you might want to pick up from my efforts here: https://github.com/d-v-b/zarr-python/tree/boundary_chunk_optimization
The strategy I'm taking is to intercept the arguments to is_total_slice with a new function trim_chunks, which basically clamps chunk slices to the bounds of the array. This passes the core tests, but currently fails test_parallel_append with the interprocess lock. There must be a race condition, but I have no idea where it's coming from :/
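Roughly, the idea is to compute the clamped extent of each chunk and hand that to is_total_slice in place of the nominal chunk shape; an illustrative sketch (not the actual code on the branch):

```python
# Illustrative sketch of the clamping idea (not the actual code on the branch):
# derive the real extent of a chunk, clamped to the array bounds, from the
# chunk's grid coordinates, the nominal chunk shape and the array shape.
def trim_chunks_sketch(chunk_coords, chunks, shape):
    return tuple(
        min(chunk_len, dim_len - coord * chunk_len)
        for coord, chunk_len, dim_len in zip(chunk_coords, chunks, shape)
    )

# for the example above (shape=(10,), chunks=(9,)):
# trim_chunks_sketch((0,), (9,), (10,)) == (9,)  -> interior chunk, unchanged
# trim_chunks_sketch((1,), (9,), (10,)) == (1,)  -> boundary chunk, clamped
# is_total_slice((slice(0, 1, 1),), (1,)) is then True
```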
@d-v-b Thank you very much for the hints!
Just a question to check whether I understood the key idea of zarr.
If we consider the example provided by @d-v-b, I would expect the result of is_total_slice(slice(1,10,1), a._chunks) to be False, since the array is divided into chunks a[0:9] and a[9], so a[1:10] is not a chunk. However, it looks like this call returns True. Am I doing something wrong?
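If the length-only check sketched earlier is accurate, that result is expected: only the slice length is compared against the chunk length, so any length-9 slice counts as total regardless of where it starts:

```python
from zarr.util import is_total_slice

# both slices have length 9, matching the chunk length of 9,
# so alignment with the chunk grid is never taken into account
print(is_total_slice((slice(0, 9, 1),), (9,)))   # True
print(is_total_slice((slice(1, 10, 1),), (9,)))  # True
```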
This also affects reads. For example, when dask chunks exactly overlap with Zarr chunks, it should be possible to decode directly into a memory buffer.
Instead, because is_total_slice returns False for boundary chunks, all boundary chunks are needlessly copied one extra time.
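For illustration, here is a standalone sketch of the two paths using numcodecs directly (the codec, dtype, and chunk shape are arbitrary stand-ins):

```python
import numpy as np
from numcodecs import Blosc

codec = Blosc()
chunks = (9,)
cdata = codec.encode(np.arange(9, dtype="uint8"))

# fast path (interior chunks): decompress straight into the output buffer
dest = np.empty(chunks, dtype="uint8")
codec.decode(cdata, out=dest)

# path currently taken for boundary chunks: decompress into a temporary chunk,
# then copy the selected region into the output -- one extra copy per chunk
chunk = np.frombuffer(codec.decode(cdata), dtype="uint8").reshape(chunks)
out = np.empty((1,), dtype="uint8")
out[:] = chunk[0:1]
```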
It should be easy to tell _process_chunk the expected chunk shape for that particular chunk and pass that to is_total_slice instead of self._chunks (the loop that calls _process_chunk knows the chunk key).
An example profile in which 18 chunks take the fast path, and 9 boundary chunks incur the extra copy in the last line:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
2013 def _process_chunk(
2014 self,
2015 out,
2016 cdata,
2017 chunk_selection,
2018 drop_axes,
2019 out_is_ndarray,
2020 fields,
2021 out_selection,
2022 partial_read_decode=False,
2023 ):
2024 """Take binary data from storage and fill output array"""
2025 126 25000.0 198.4 0.0 if (
2026 27 58000.0 2148.1 0.0 out_is_ndarray
2027 27 6000.0 222.2 0.0 and not fields
2028 27 572000.0 21185.2 0.2 and is_contiguous_selection(out_selection)
2029 27 244000.0 9037.0 0.1 and is_total_slice(chunk_selection, self._chunks)
2030 18 8000.0 444.4 0.0 and not self._filters
2031 18 41000.0 2277.8 0.0 and self._dtype != object
2032 ):
2033 18 37000.0 2055.6 0.0 dest = out[out_selection]
2034 # Assume that array-like objects that doesn't have a
2035 # `writeable` flag is writable.
2036 18 16000.0 888.9 0.0 dest_is_writable = getattr(dest, "writeable", True)
2037 36 8000.0 222.2 0.0 write_direct = dest_is_writable and (
2038 18 24000.0 1333.3 0.0 (self._order == "C" and dest.flags.c_contiguous)
2039 or (self._order == "F" and dest.flags.f_contiguous)
2040 )
2041 18 4000.0 222.2 0.0 if write_direct:
2042 # optimization: we want the whole chunk, and the destination is
2043 # contiguous, so we can decompress directly from the chunk
2044 # into the destination array
2045 18 11000.0 611.1 0.0 if self._compressor:
2046 18 7000.0 388.9 0.0 if isinstance(cdata, PartialReadBuffer):
2047 cdata = cdata.read_full()
2048 18 105409000.0 6e+06 42.5 self._compressor.decode(cdata, dest)
2049 else:
2050 if isinstance(cdata, UncompressedPartialReadBufferV3):
2051 cdata = cdata.read_full()
2052 chunk = ensure_ndarray_like(cdata).view(self._dtype)
2053 chunk = chunk.reshape(self._chunks, order=self._order)
2054 np.copyto(dest, chunk)
2055 18 11000.0 611.1 0.0 return
2056
....
2093 9 80870000.0 9e+06 32.6 chunk = self._decode_chunk(cdata)
2094
2095 # select data from chunk
2096 9 4000.0 444.4 0.0 if fields:
2097 chunk = chunk[fields]
2098 9 27000.0 3000.0 0.0 tmp = chunk[chunk_selection]
2099 9 1000.0 111.1 0.0 if drop_axes:
2100 tmp = np.squeeze(tmp, axis=drop_axes)
2101
2102 # store selected data in output
2103 # import ipdb; ipdb.set_trace()
2104 9 60815000.0 7e+06 24.5 out[out_selection] = tmp
I filed a related issue recently as well
https://github.com/zarr-developers/zarr-python/issues/1730