filesystem_spec

Publicly expose/document API for retrieving cache file details

Open • jwodder opened this issue 4 years ago • 1 comment

In an application that uses CachingFileSystem, we are interested in implementing partial cache cleanup based on cached file size & age. While fsspec already has the necessary pieces to accomplish this, such code would depend on currently-undocumented implementation details regarding the structure of cache metadata, which would not be a wise thing to do. We thus request the addition of some public method to CachingFileSystem for listing cached paths and the files that cache them.
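To make the request concrete, here is a minimal sketch of the size-and-age selection such a cleanup might perform. It deliberately uses only the standard library and looks only at files in a directory; the function name `select_for_eviction` and its parameters are hypothetical and not part of fsspec. It cannot tell which remote path a file caches, nor tell cache blobs apart from any metadata file kept in the same directory, which is exactly the information a public API would need to supply.

```python
import os
import time


def select_for_eviction(cache_dir, max_age_seconds, max_total_bytes, now=None):
    """Return paths in cache_dir that a size/age policy would evict.

    Hypothetical helper: files older than max_age_seconds are always
    selected; of the remaining files, the newest are kept until the
    max_total_bytes budget is used up, and the rest are selected.
    Nothing is deleted here.
    """
    now = time.time() if now is None else now

    entries = []
    with os.scandir(cache_dir) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                st = entry.stat(follow_symlinks=False)
                entries.append((st.st_mtime, st.st_size, entry.path))

    # Newest first, so the size budget is spent on the freshest files.
    entries.sort(reverse=True)

    kept_bytes = 0
    evict = []
    for mtime, size, path in entries:
        too_old = (now - mtime) > max_age_seconds
        if too_old or kept_bytes + size > max_total_bytes:
            evict.append(path)
        else:
            kept_bytes += size
    return evict
```

Deleting the selected files directly would leave the cache metadata out of step with what is on disk, so a sketch like this is only safe once the cache itself exposes a supported way to list and remove entries.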

jwodder, Dec 14 '21 21:12

I support this, but I think we can also do a better job of cleaning up the expectations of the caching framework in general. For example, we can split apart:

  • how do we turn target path names into cache path names (currently we have just the one option, to hash paths or keep the basename)
  • how parts of a file are stored (this is the one piece we currently implement as separate classes)
  • how any necessary metadata is stored (single JSON file? sidecar files? same place as cache or elsewhere?)
  • how consistency and liveness of cached data is determined (we have a couple of options)
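As an illustration of the first point only, the two naming options mentioned (hash the path, or keep the basename) could be expressed as interchangeable functions. This is a sketch under assumptions: the function names are hypothetical, and the choice of SHA-256 is made for the example rather than taken from this thread.

```python
import hashlib
import posixpath


def hashed_name(target_path: str) -> str:
    """Map a target path to a fixed-length name by hashing the whole path."""
    return hashlib.sha256(target_path.encode()).hexdigest()


def basename_name(target_path: str) -> str:
    """Map a target path to its final component, keeping it human-readable."""
    return posixpath.basename(target_path)
```

The trade-off is visible even at this size: hashing gives distinct names for distinct paths but hides what each file is, while keeping the basename is readable but maps `a/data.csv` and `b/data.csv` to the same cache name.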

I am hoping that, at the very minimum, we can, for example, enable caching to backends other than the local filesystem, such as a "local" S3 bucket in the same data centre as the process. But then we cannot assume that we have direct or up-to-date access to the cache metadata and we have to face the other problems.

martindurant, Dec 14 '21 21:12