behavior of `is_total_slice` for boundary chunks
The is_total_slice function is used to determine whether a selection corresponds to an entire chunk (is_total_slice -> True) or not (is_total_slice -> False).
I noticed that is_total_slice is always False for "partial chunks", i.e. chunks that are only partially filled by the array:
import zarr
from zarr.indexing import BasicIndexer
from zarr.util import is_total_slice

# create an array with 1 full chunk and 1 partial chunk
a = zarr.open('test.zarr', path='test', shape=(10,), chunks=(9,), dtype='uint8', mode='w')
for x in BasicIndexer(slice(None), a):
    print(x.chunk_selection, is_total_slice(x.chunk_selection, a._chunks))
Which prints this:
(slice(0, 9, 1),) True
(slice(0, 1, 1),) False
Although the last selection is not the size of a full chunk, it is "total" with respect to the portion of that chunk that actually lies inside the array.
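As far as I can tell, the check is purely length-based; roughly something like the following (a paraphrase for illustration, not the actual source of zarr.util.is_total_slice):

```python
# Rough paraphrase of the length-only check (illustration, not the actual
# zarr source): a slice counts as "total" for a dimension if it is slice(None)
# or if its length equals the chunk length -- where it starts is never checked.
def is_total_slice_sketch(selection, chunks):
    return all(
        isinstance(s, slice)
        and (s == slice(None)
             or (s.step in (1, None) and s.stop - s.start == chunk_len))
        for s, chunk_len in zip(selection, chunks)
    )
```

Since only the slice length is compared against the chunk length, the boundary chunk's selection (slice(0, 1, 1),) can never be reported as total against chunks of (9,).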
A direct consequence of this behavior is unnecessary chunk loading when performing array assignments -- zarr uses the result of is_total_slice to decide whether to load existing chunk data or not. Because is_total_slice is always False for partial chunks, zarr always loads boundary chunks during assignment, even when the assignment overwrites every element of the chunk that lies inside the array.
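To make the cost concrete, here is a heavily simplified sketch of the per-chunk write decision (illustrative names only, no compression or filters; not the actual zarr implementation):

```python
import numpy as np
from zarr.util import is_total_slice

# Simplified sketch of the per-chunk write decision (illustration only):
# `store` is a dict-like mapping of chunk keys to raw bytes.
def set_chunk_sketch(store, chunk_key, chunk_selection, value, chunks, dtype):
    if is_total_slice(chunk_selection, chunks):
        # fast path: the selection covers the whole chunk, so the existing
        # chunk never needs to be read back from the store
        chunk = np.zeros(chunks, dtype=dtype)
    else:
        # slow path: read and decode the existing chunk before modifying it;
        # boundary chunks always end up here, even when every element of the
        # chunk that lies inside the array is being overwritten
        chunk = np.frombuffer(store[chunk_key], dtype=dtype).reshape(chunks).copy()
    chunk[chunk_selection] = value
    store[chunk_key] = chunk.tobytes()
```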
A solution to this would be to augment the is_total_slice function to account for partial chunks. I'm not sure at the moment how to do this exactly, but it shouldn't be hard. Happy to put together a PR if people agree that this is an issue worth addressing.
Yeah that seems reasonable
I ran some tests, and I think there is currently no way to achieve the desired behavior of is_total_slice, since the modification proposed by @d-v-b relies on the position of the slice within the array, and that information is not passed to is_total_slice.
One possible solution would be to add a new (maybe optional) parameter to is_total_slice to communicate the relative position of the slice (i.e. the upper left corner).
If so, I would like to work on the problem.
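To illustrate the idea, the extra argument would carry the offset of the chunk within the array, so the effective extent of a boundary chunk can be clamped before comparing (hypothetical signature, not an existing zarr function):

```python
# Hypothetical position-aware variant (illustration only): `offset` is the
# upper-left corner of the chunk within the array, `shape` is the array shape.
def is_total_slice_with_offset(selection, chunks, offset, shape):
    for s, chunk_len, off, dim_len in zip(selection, chunks, offset, shape):
        # extent of the part of this chunk that actually lies inside the array
        effective_len = min(chunk_len, dim_len - off)
        if s == slice(None):
            continue
        if not (isinstance(s, slice)
                and s.step in (1, None)
                and (s.start or 0) == 0
                and s.stop == effective_len):
            return False
    return True

# with the example above (shape=(10,), chunks=(9,), boundary chunk at offset 9):
# is_total_slice_with_offset((slice(0, 1, 1),), (9,), (9,), (10,)) -> True
```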
@fAndreuzzi you might want to pick up from my efforts here: https://github.com/d-v-b/zarr-python/tree/boundary_chunk_optimization
The strategy I'm taking is to intercept the arguments to is_total_slice with a new function trim_chunks, which basically clamps chunk slices to the bounds of the array. This passes the core tests, but currently fails test_parallel_append with the interprocess lock. There must be a race condition, but I have no idea where it's coming from :/
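Roughly, the idea is to compute the clamped extent of each chunk and hand that to is_total_slice in place of the nominal chunk shape; an illustrative sketch (not the actual code on the branch):

```python
# Illustrative sketch of the clamping idea (not the actual code on the branch):
# derive the real extent of a chunk, clamped to the array bounds, from the
# chunk's grid coordinates, the nominal chunk shape and the array shape.
def trim_chunks_sketch(chunk_coords, chunks, shape):
    return tuple(
        min(chunk_len, dim_len - coord * chunk_len)
        for coord, chunk_len, dim_len in zip(chunk_coords, chunks, shape)
    )

# for the example above (shape=(10,), chunks=(9,)):
# trim_chunks_sketch((0,), (9,), (10,)) == (9,)  -> interior chunk, unchanged
# trim_chunks_sketch((1,), (9,), (10,)) == (1,)  -> boundary chunk, clamped
# is_total_slice((slice(0, 1, 1),), (1,)) is then True
```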
@d-v-b Thank you very much for the hints!
Just a question to check whether I understood the key idea of zarr.
If we consider the example provided by @d-v-b, I would expect the result of is_total_slice(slice(1,10,1), a._chunks) to be False, since the array is divided into chunks a[0:9] and a[9], so a[1:10] is not a chunk. However, it looks like this call returns True. Am I doing something wrong?
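If the length-only check sketched earlier is accurate, that result is expected: only the slice length is compared against the chunk length, so any length-9 slice counts as total regardless of where it starts:

```python
from zarr.util import is_total_slice

# both slices have length 9, matching the chunk length of 9,
# so alignment with the chunk grid is never taken into account
print(is_total_slice((slice(0, 9, 1),), (9,)))   # True
print(is_total_slice((slice(1, 10, 1),), (9,)))  # True
```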
This also affects reads. For example, when dask chunks exactly overlap with Zarr chunks, it should be possible to decode directly into a memory buffer.
Instead, because is_total_slice returns False for boundary chunks, all boundary chunks are needlessly copied one extra time.
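For illustration, here is a standalone sketch of the two paths using numcodecs directly (the codec, dtype, and chunk shape are arbitrary stand-ins):

```python
import numpy as np
from numcodecs import Blosc

codec = Blosc()
chunks = (9,)
cdata = codec.encode(np.arange(9, dtype="uint8"))

# fast path (interior chunks): decompress straight into the output buffer
dest = np.empty(chunks, dtype="uint8")
codec.decode(cdata, out=dest)

# path currently taken for boundary chunks: decompress into a temporary chunk,
# then copy the selected region into the output -- one extra copy per chunk
chunk = np.frombuffer(codec.decode(cdata), dtype="uint8").reshape(chunks)
out = np.empty((1,), dtype="uint8")
out[:] = chunk[0:1]
```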
It should be easy to tell _process_chunk the expected chunk shape for that particular chunk and pass that to is_total_slice instead of self._chunks (the loop that calls _process_chunk knows the chunk key).
An example profile in which 18 chunks take the fast path, and 9 boundary chunks incur the extra copy in the last line:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
2013 def _process_chunk(
2014 self,
2015 out,
2016 cdata,
2017 chunk_selection,
2018 drop_axes,
2019 out_is_ndarray,
2020 fields,
2021 out_selection,
2022 partial_read_decode=False,
2023 ):
2024 """Take binary data from storage and fill output array"""
2025 126 25000.0 198.4 0.0 if (
2026 27 58000.0 2148.1 0.0 out_is_ndarray
2027 27 6000.0 222.2 0.0 and not fields
2028 27 572000.0 21185.2 0.2 and is_contiguous_selection(out_selection)
2029 27 244000.0 9037.0 0.1 and is_total_slice(chunk_selection, self._chunks)
2030 18 8000.0 444.4 0.0 and not self._filters
2031 18 41000.0 2277.8 0.0 and self._dtype != object
2032 ):
2033 18 37000.0 2055.6 0.0 dest = out[out_selection]
2034 # Assume that array-like objects that doesn't have a
2035 # `writeable` flag is writable.
2036 18 16000.0 888.9 0.0 dest_is_writable = getattr(dest, "writeable", True)
2037 36 8000.0 222.2 0.0 write_direct = dest_is_writable and (
2038 18 24000.0 1333.3 0.0 (self._order == "C" and dest.flags.c_contiguous)
2039 or (self._order == "F" and dest.flags.f_contiguous)
2040 )
2041 18 4000.0 222.2 0.0 if write_direct:
2042 # optimization: we want the whole chunk, and the destination is
2043 # contiguous, so we can decompress directly from the chunk
2044 # into the destination array
2045 18 11000.0 611.1 0.0 if self._compressor:
2046 18 7000.0 388.9 0.0 if isinstance(cdata, PartialReadBuffer):
2047 cdata = cdata.read_full()
2048 18 105409000.0 6e+06 42.5 self._compressor.decode(cdata, dest)
2049 else:
2050 if isinstance(cdata, UncompressedPartialReadBufferV3):
2051 cdata = cdata.read_full()
2052 chunk = ensure_ndarray_like(cdata).view(self._dtype)
2053 chunk = chunk.reshape(self._chunks, order=self._order)
2054 np.copyto(dest, chunk)
2055 18 11000.0 611.1 0.0 return
2056
....
2093 9 80870000.0 9e+06 32.6 chunk = self._decode_chunk(cdata)
2094
2095 # select data from chunk
2096 9 4000.0 444.4 0.0 if fields:
2097 chunk = chunk[fields]
2098 9 27000.0 3000.0 0.0 tmp = chunk[chunk_selection]
2099 9 1000.0 111.1 0.0 if drop_axes:
2100 tmp = np.squeeze(tmp, axis=drop_axes)
2101
2102 # store selected data in output
2103 # import ipdb; ipdb.set_trace()
2104 9 60815000.0 7e+06 24.5 out[out_selection] = tmp
I filed a related issue recently as well
https://github.com/zarr-developers/zarr-python/issues/1730