Using blockcache with local breaks, because the `LocalFileOpener.closed` property doesn't have a setter
Trying to use the blockcache filesystem with a local filesystem as a target raises an error if you try to close a file manually. Here's a reproduction:
from tempfile import TemporaryDirectory
from fsspec import AbstractFileSystem, filesystem
local: AbstractFileSystem = filesystem("local")
f = local.open("file.txt", mode="w")
f.write("some data\n")
f.close()
with TemporaryDirectory(prefix="cache", dir=".", ignore_cleanup_errors=True) as cache_dir:
cache: AbstractFileSystem = filesystem(
"blockcache", fs=local, cache_storage=cache_dir
)
f = cache.open("file.txt")
print(f.read())
f.close()
The last .close() raises an error:
Traceback (most recent call last):
File "C:\Users\Antonio.Dourado\workspace\project\fsspec-bug.py", line 16, in <module>
f.close()
File "C:\Users\Antonio.Dourado\workspace\project\.venv\Lib\site-packages\fsspec\implementations\cached.py", line 365, in <lambda>
f.close = lambda: self.close_and_update(f, close)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Antonio.Dourado\workspace\project\.venv\Lib\site-packages\fsspec\implementations\cached.py", line 436, in <lambda>
return lambda *args, **kw: getattr(type(self), item).__get__(self)(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Antonio.Dourado\workspace\project\.venv\Lib\site-packages\fsspec\implementations\cached.py", line 392, in close_and_update
f.closed = True
^^^^^^^^
AttributeError: property 'closed' of 'LocalFileOpener' object has no setter
AbstractBufferedFile does define closed as a writable property, so, maybe LocalFileOpener should too?
In case anyone wonders, or if this looks dumber than I think, I'm using blockcache + local for testing. My plan is to use blockcache in my production code, taking an arbitrary filesystem as a parameter. Then in production I use abfs, and for testing, local.
Also, not sure if this can be in here, or maybe should be a different issue, but throwing dirfs in the mix breaks things even earlier. It looks like the reason is because DirFileSystem just uses AbstractBufferedFile directly, but AbstractBufferedFile doesn't implement _fetch_range, and blockcache will call this for reading. Here's a(nother) reproduction:
from tempfile import TemporaryDirectory
from fsspec import AbstractFileSystem, filesystem
with TemporaryDirectory(
prefix="content-", dir=".", ignore_cleanup_errors=True
) as tmp_dir:
local: AbstractFileSystem = filesystem("local")
tmp = filesystem("dir", path=tmp_dir, fs=local)
f = tmp.open("file.txt", mode="w")
f.write("some data\n")
f.close()
with TemporaryDirectory(
prefix="cache-", dir=".", ignore_cleanup_errors=True
) as cache_dir:
cache: AbstractFileSystem = filesystem(
"blockcache", fs=tmp, cache_storage=cache_dir
)
f = cache.open("file.txt")
print(f.read())
f.close()
And the output:
Traceback (most recent call last):
File "C:\Users\Antonio.Dourado\workspace\project\fsspec-bug.py", line 21, in <module>
print(f.read())
^^^^^^^^
File "C:\Users\Antonio.Dourado\workspace\project\.venv\Lib\site-packages\fsspec\spec.py", line 1844, in read
out = self.cache._fetch(self.loc, self.loc + length)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Antonio.Dourado\workspace\project\.venv\Lib\site-packages\fsspec\caching.py", line 136, in _fetch
self.cache[sstart:send] = self.fetcher(sstart, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Antonio.Dourado\workspace\project\.venv\Lib\site-packages\fsspec\spec.py", line 1822, in _fetch_range
raise NotImplementedError
NotImplementedError
Environment info:
- Python: 3.11.6
- fsspec: 2023.12.1
- OS: Windows 10 Enteprise, x64
Since BLockCache hooks into files at a low level during .read(), it only works with classes deriving from AbstractBufferedFile - and this does not include local files. So .read() might work, but you're not actually using the cache at that point.
@martindurant hmm, I only tested the actual caching with abfs.
I tried local + blockcache first with polars + pyarrow, but pyarrow ignores the exception and just prints an error message, which lead me down the rabbit hole.
So, if that's not exactly something to fix, maybe the error could be improved? Would be nicer to get an error right out of the bat when trying to wrap an unsupported filesystem with blockcache.
I agree that blockcache given a non-compatible filesystem target should error, but I'm not immediately sure how to implement this. There is even the possibility (I think HTTP only?) that the file returned by open() depends on the arguments and maybe the server in question.
Next best thing would be docs, then? Or a warning for known incompatible filesystems?
Into the docstring of BlockCache? That would be fine.