Publicly expose/document API for retrieving cache file details
In an application that uses CachingFileSystem, we would like to implement partial cache cleanup based on cached file size and age. fsspec already has the pieces needed to do this, but our code would have to depend on currently undocumented implementation details of the cache metadata structure, which would be fragile. We therefore request a public method on CachingFileSystem for listing cached paths and the files that cache them.
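To make the use case concrete, here is a minimal sketch of the cleanup policy we have in mind. It does not use any fsspec API: `CacheEntry` and `select_for_eviction` are hypothetical names, and the fields on `CacheEntry` are only our guess at what a public listing method might return.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CacheEntry:
    """Hypothetical record for one cached file (not an fsspec class)."""
    remote_path: str   # path on the target filesystem
    cache_path: str    # file in cache storage that holds the data
    size: int          # size of the cached file in bytes
    cached_at: float   # POSIX timestamp of when the entry was written


def select_for_eviction(
    entries: Iterable[CacheEntry],
    now: float,
    max_age: float,
    max_total_size: int,
) -> List[CacheEntry]:
    """Pick entries to delete: everything older than max_age, then the
    oldest remaining entries until the rest fits in max_total_size."""
    entries = list(entries)
    expired = [e for e in entries if now - e.cached_at > max_age]
    # Remaining entries, oldest first
    fresh = sorted(
        (e for e in entries if now - e.cached_at <= max_age),
        key=lambda e: e.cached_at,
    )
    total = sum(e.size for e in fresh)
    evict = list(expired)
    for e in fresh:
        if total <= max_total_size:
            break
        evict.append(e)
        total -= e.size
    return evict
```

With a public listing method, the application would only need to build these records and delete whatever the policy returns, without reading the metadata files directly.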
I support this, but I think we can also do a better job of cleaning up the expectations of the caching framework in general. For example, we can split apart:
- how target path names are turned into cache path names (currently there is a single choice: hash the path or keep the basename)
- how parts of a file are stored (this is the one piece we currently implement as separate classes)
- how any necessary metadata is stored (single JSON file? sidecar files? same place as cache or elsewhere?)
- how consistency and liveness of cached data are determined (we have a couple of options)
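As a rough sketch of what that split could look like, each concern might become a small pluggable interface. Everything below is hypothetical and for discussion only: `CacheMapper`, `MetadataStore`, `FreshnessCheck`, and the helper functions are not existing fsspec names. The use of SHA-256 for the hashing mapper is an assumption about one reasonable choice, and part storage is left out for brevity.

```python
import hashlib
import os
from typing import Dict, Protocol


class CacheMapper(Protocol):
    """Turns a target path into a cache file name."""
    def __call__(self, path: str) -> str: ...


class MetadataStore(Protocol):
    """Loads and saves cache metadata, wherever it lives."""
    def load(self) -> Dict[str, dict]: ...
    def save(self, entries: Dict[str, dict]) -> None: ...


class FreshnessCheck(Protocol):
    """Decides whether a cached entry can still be used."""
    def __call__(self, cached_at: float, now: float) -> bool: ...


def hash_mapper(path: str) -> str:
    # Unique per full path, at the cost of unreadable file names
    return hashlib.sha256(path.encode()).hexdigest()


def basename_mapper(path: str) -> str:
    # Readable, but two paths with the same basename collide
    return os.path.basename(path)


def expiry_check(expiry_seconds: float) -> FreshnessCheck:
    # Time-based liveness: an entry is fresh until it is too old
    def check(cached_at: float, now: float) -> bool:
        return now - cached_at <= expiry_seconds
    return check
```

A caching filesystem could then be assembled from one implementation of each interface, and a metadata store backed by something other than a local JSON file would slot in without touching the rest.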
At the very least, I hope we can enable caching to backends other than the local filesystem, such as a "local" S3 bucket in the same data centre as the process. But then we cannot assume that we have direct or up-to-date access to the cache metadata, and we have to face the other problems.