filesystem_spec icon indicating copy to clipboard operation
filesystem_spec copied to clipboard

Using blockcache with local breaks, because the `LocalFileOpener.closed` property doesn't have a setter

Open tonnydourado opened this issue 2 years ago • 5 comments

Trying to use the blockcache filesystem with a local filesystem as a target raises an error if you try to close a file manually. Here's a reproduction:

from tempfile import TemporaryDirectory

from fsspec import AbstractFileSystem, filesystem

local: AbstractFileSystem = filesystem("local")
f = local.open("file.txt", mode="w")
f.write("some data\n")
f.close()

with TemporaryDirectory(prefix="cache", dir=".", ignore_cleanup_errors=True) as cache_dir:
    cache: AbstractFileSystem = filesystem(
        "blockcache", fs=local, cache_storage=cache_dir
    )
    f = cache.open("file.txt")
    print(f.read())
    f.close()

The last .close() raises an error:

Traceback (most recent call last):
  File "C:\Users\Antonio.Dourado\workspace\project\fsspec-bug.py", line 16, in <module>
    f.close()
  File "C:\Users\Antonio.Dourado\workspace\project\.venv\Lib\site-packages\fsspec\implementations\cached.py", line 365, in <lambda>
    f.close = lambda: self.close_and_update(f, close)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Antonio.Dourado\workspace\project\.venv\Lib\site-packages\fsspec\implementations\cached.py", line 436, in <lambda>
    return lambda *args, **kw: getattr(type(self), item).__get__(self)(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Antonio.Dourado\workspace\project\.venv\Lib\site-packages\fsspec\implementations\cached.py", line 392, in close_and_update
    f.closed = True
    ^^^^^^^^
AttributeError: property 'closed' of 'LocalFileOpener' object has no setter

AbstractBufferedFile does define closed as a writable property, so, maybe LocalFileOpener should too?

In case anyone wonders, or if this looks dumber than I think, I'm using blockcache + local for testing. My plan is to use blockcache in my production code, taking an arbitrary filesystem as a parameter. Then in production I use abfs, and for testing, local.

Also, not sure if this can be in here, or maybe should be a different issue, but throwing dirfs in the mix breaks things even earlier. It looks like the reason is because DirFileSystem just uses AbstractBufferedFile directly, but AbstractBufferedFile doesn't implement _fetch_range, and blockcache will call this for reading. Here's a(nother) reproduction:

from tempfile import TemporaryDirectory

from fsspec import AbstractFileSystem, filesystem

with TemporaryDirectory(
    prefix="content-", dir=".", ignore_cleanup_errors=True
) as tmp_dir:
    local: AbstractFileSystem = filesystem("local")
    tmp = filesystem("dir", path=tmp_dir, fs=local)
    f = tmp.open("file.txt", mode="w")
    f.write("some data\n")
    f.close()

    with TemporaryDirectory(
        prefix="cache-", dir=".", ignore_cleanup_errors=True
    ) as cache_dir:
        cache: AbstractFileSystem = filesystem(
            "blockcache", fs=tmp, cache_storage=cache_dir
        )
        f = cache.open("file.txt")
        print(f.read())
        f.close()

And the output:

Traceback (most recent call last):
  File "C:\Users\Antonio.Dourado\workspace\project\fsspec-bug.py", line 21, in <module>
    print(f.read())
          ^^^^^^^^
  File "C:\Users\Antonio.Dourado\workspace\project\.venv\Lib\site-packages\fsspec\spec.py", line 1844, in read
    out = self.cache._fetch(self.loc, self.loc + length)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Antonio.Dourado\workspace\project\.venv\Lib\site-packages\fsspec\caching.py", line 136, in _fetch
    self.cache[sstart:send] = self.fetcher(sstart, send)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Antonio.Dourado\workspace\project\.venv\Lib\site-packages\fsspec\spec.py", line 1822, in _fetch_range
    raise NotImplementedError
NotImplementedError

Environment info:

  • Python: 3.11.6
  • fsspec: 2023.12.1
  • OS: Windows 10 Enteprise, x64

tonnydourado avatar Dec 09 '23 21:12 tonnydourado

Since BLockCache hooks into files at a low level during .read(), it only works with classes deriving from AbstractBufferedFile - and this does not include local files. So .read() might work, but you're not actually using the cache at that point.

martindurant avatar Dec 11 '23 16:12 martindurant

@martindurant hmm, I only tested the actual caching with abfs.

I tried local + blockcache first with polars + pyarrow, but pyarrow ignores the exception and just prints an error message, which lead me down the rabbit hole.

So, if that's not exactly something to fix, maybe the error could be improved? Would be nicer to get an error right out of the bat when trying to wrap an unsupported filesystem with blockcache.

tonnydourado avatar Dec 11 '23 18:12 tonnydourado

I agree that blockcache given a non-compatible filesystem target should error, but I'm not immediately sure how to implement this. There is even the possibility (I think HTTP only?) that the file returned by open() depends on the arguments and maybe the server in question.

martindurant avatar Dec 11 '23 20:12 martindurant

Next best thing would be docs, then? Or a warning for known incompatible filesystems?

tonnydourado avatar Dec 11 '23 21:12 tonnydourado

Into the docstring of BlockCache? That would be fine.

martindurant avatar Dec 11 '23 21:12 martindurant