filesystem_spec icon indicating copy to clipboard operation
filesystem_spec copied to clipboard

Zip FS does not implement cat() right

Open forman opened this issue 4 years ago • 7 comments

The AbstractFileSystem states for the cat() operation:

    def cat(self, path, recursive=False, on_error="raise", **kwargs):
        """Fetch (potentially multiple) paths' contents

        Parameters
        ----------
        recursive: bool
            If True, assume the path(s) are directories, and get all the contained files

       ...

But ZipFileSystem.cat() just delegates its argument zip.ZipFile.read(), although path may be a sequence (which makes perfect sense for cat = concatenate):

See https://github.com/intake/filesystem_spec/blob/master/fsspec/implementations/zip.py#L84-L85

I came across the issue using a compound URL for opening a Zarr from a Zip archive in S3:

import fsspec
import xarray as xr

store = fsspec.get_mapper('zip::s3://agriculture-vlab-data-test/S1_L2_COH_VH_31UDR.zarr.zip', s3=dict(anon=True))
ds = xr.open_zarr(store)

Which yields to

...

D:\Miniconda3\envs\xcube\lib\site-packages\zarr\core.py in _chunk_getitems(self, lchunk_coords, lchunk_selection, out, lout_selection, drop_axes, fields)
   1834         else:
   1835             partial_read_decode = False
-> 1836             cdatas = self.chunk_store.getitems(ckeys, on_error="omit")
   1837         for ckey, chunk_select, out_select in zip(ckeys, lchunk_selection, lout_selection):
   1838             if ckey in cdatas:

D:\Miniconda3\envs\xcube\lib\site-packages\fsspec\mapping.py in getitems(self, keys, on_error)
     88         oe = on_error if on_error == "raise" else "return"
     89         try:
---> 90             out = self.fs.cat(keys2, on_error=oe)
     91             if isinstance(out, bytes):
     92                 out = {keys2[0]: out}

D:\Miniconda3\envs\xcube\lib\site-packages\fsspec\implementations\zip.py in cat(self, path, callback, **kwargs)
     83 
     84     def cat(self, path, callback=None, **kwargs):
---> 85         return self.zip.read(path)
     86 
     87     def _open(

D:\Miniconda3\envs\xcube\lib\zipfile.py in read(self, name, pwd)
   1461     def read(self, name, pwd=None):
   1462         """Return file bytes for name."""
-> 1463         with self.open(name, "r", pwd) as fp:
   1464             return fp.read()
   1465 

D:\Miniconda3\envs\xcube\lib\zipfile.py in open(self, name, mode, pwd, force_zip64)
   1500         else:
   1501             # Get info object for name
-> 1502             zinfo = self.getinfo(name)
   1503 
   1504         if mode == 'w':

D:\Miniconda3\envs\xcube\lib\zipfile.py in getinfo(self, name)
   1425     def getinfo(self, name):
   1426         """Return the instance of ZipInfo given 'name'."""
-> 1427         info = self.NameToInfo.get(name)
   1428         if info is None:
   1429             raise KeyError(

TypeError: unhashable type: 'list'

forman avatar Oct 06 '21 14:10 forman

It looks like renaming cat->cat_file should solve this

martindurant avatar Oct 06 '21 14:10 martindurant

(perhaps you wouldn't mind trying that and submitting a PR!)

martindurant avatar Oct 06 '21 14:10 martindurant

Sure :)

forman avatar Oct 06 '21 14:10 forman

Ok, I just did a few tests. There are many other places where ZipFileSystem does not handle correctly the situtation where a path is a list of paths. However, I try making this work, because it is important for our project. Hopefully I have a PR by tomorrow. Thanks @martindurant

forman avatar Oct 06 '21 14:10 forman

Looking again, I wouldn't be surprised if simply removing cat would lead to the right behaviour (the superclass cat calls open). That's how it's done int he libarchive implementation.

martindurant avatar Oct 06 '21 18:10 martindurant

@martindurant you were right. That fixed it, the PR is #789.

forman avatar Oct 07 '21 09:10 forman

My original use case now works:

image

forman avatar Oct 07 '21 09:10 forman

Closing this as it was fixed by #789.

ianthomas23 avatar Sep 21 '23 12:09 ianthomas23