datoteka icon indicating copy to clipboard operation
datoteka copied to clipboard

Support java.nio.file.FileSystem abstraction

Open RickMoynihan opened this issue 3 years ago • 3 comments

Hi,

Firstly thanks for datoteka I've been using it across several projects for several years.

The main limitation I have encountered in using it however is that it only supports creation of paths on the default filesystem, and doesn't support constructing paths in other java FileSystems; in particular zip files.

In a few projects I've had the need to operate on paths and perform IO reading and writing files in and out of zip files; and consequently I had to work around some of these limitations and develop some custom code for interop with java's ZipFileSystem. Java ships with a ZipFileSystemProvider, which can be used for this purpose.

I was thinking it would be nice to contribute and integrate some of this code into datoteka or as a companion library. Is it something you'd be interested in?

I'm not entirely sure what would be required to maintain backwards compatibility with datoteka yet. However I believe the main change is simply that I added a new path constructor which takes a FileSystem as the first argument (or if no filesystem is provided here it assumes the default filesystem), and it then returns paths in that filesystem. I believe these paths can then work with most of the datoteka functions as they stand; though some may want to also take an optional filesystem argument.

If you're interested what do you think is the best way to proceed? I was thinking I might first extract the code into a separate library, and share it here for discussion on how best to integrate, before working up a PR to attempt it?

There is some additional code I wrote to also handle reading zip files and managing their resources from multiple threads; which is necessary because the underlying java code doesn't do this very well. I don't know whether that belongs here or not yet; but it is very useful if you need to for example serve files out of zip files over http and may have multiple requests accessing the same file at the same time.

Anyway let me know if this is something worth trying to contribute?

Many thanks.

RickMoynihan avatar Feb 14 '22 10:02 RickMoynihan

Hi,

Firstly thanks for datoteka I've been using it across several projects for several years.

\o/ thanks, and glad to know it.

The main limitation I have encountered in using it however is that it only supports creation of paths on the default filesystem, and doesn't support constructing paths in other java FileSystems; in particular zip files.

In a few projects I've had the need to operate on paths and perform IO reading and writing files in and out of zip files; and consequently I had to work around some of these limitations and develop some custom code for interop with java's ZipFileSystem. Java ships with a ZipFileSystemProvider, which can be used for this purpose.

I was thinking it would be nice to contribute and integrate some of this code into datoteka or as a companion library. Is it something you'd be interested in?

I'm not entirely sure what would be required to maintain backwards compatibility with datoteka yet. However I believe the main change is simply that I added a new path constructor which takes a FileSystem as the first argument (or if no filesystem is provided here it assumes the default filesystem), and it then returns paths in that filesystem. I believe these paths can then work with most of the datoteka functions as they stand; though some may want to also take an optional filesystem argument.

If you're interested what do you think is the best way to proceed? I was thinking I might first extract the code into a separate library, and share it here for discussion on how best to integrate, before working up a PR to attempt it?

I think it is worth to implement. From my perspective, it depends on how you feel comfortable. For me is ok, with a "draft" PR with quick and dirty changes to show the approximate changes, when we can discuss the particularities. But it also ok for separate repo, or whatever is ok for you and make you spend less time. As I can deduce reading this issue, the changes are mainly have the ability to pass the filesystem in some functions. That looks reasonable and probably will maintain the backward compatibility without any problem. We can look over the code case by case :D

There is some additional code I wrote to also handle reading zip files and managing their resources from multiple threads; which is necessary because the underlying java code doesn't do this very well. I don't know whether that belongs here or not yet; but it is very useful if you need to for example serve files out of zip files over http and may have multiple requests accessing the same file at the same time.

About this, I'm not clearly sure. It may fit in datoteka and may not. In any case it will probably live in a separated namespace than core. Show me examples with that i can reason about and I give you my opinion. Honestly, I hadn't thought about this until now.

Anyway let me know if this is something worth trying to contribute?

Many thanks.

niwinz avatar Feb 14 '22 10:02 niwinz

Thanks for the quick response!

As I can deduce reading this issue, the changes are mainly have the ability to pass the filesystem in some functions. That looks reasonable and probably will maintain the backward compatibility without any problem. We can look over the code case by case :D

For now I've just extracted the relevant code into a gist here:

https://gist.github.com/RickMoynihan/af3049c7214077ab8eb1a352e45946e0

It was largely just written to get something working; rather than with integration with datoteka in mind. So it currently defines its own paths function which takes a filesystem as its first argument and returns paths within that filesystem. There are also some functions open-zip-fs, create-zip-fs and make-zip-file which can be used to either create/open a zip file system accordingly, or to write files into a zip according to a desired layout.

The main difference (providing the filesystem for context) is here:

https://gist.github.com/RickMoynihan/af3049c7214077ab8eb1a352e45946e0#file-zip_file-clj-L41-L74

I'll have a look at what needs done to extend datoteka without breaking compatibility.

RickMoynihan avatar Feb 14 '22 13:02 RickMoynihan

After reading the gist file, I can confirm that there are many things that can be ported to datoteka with small changes.

Aso, I think the subsystem with locking and reference counting should live outside of datoteka. Looks nice but I think is not "general purpose". And the zip-* related functions for creating the filesystem can be more generic, because on the end they are calling newFileSistem that is generic and not related to zip.

I will need more in deepth looking on it but this is a good start point. If it is ok for you, I will prepare a PR myself taking the gist as base and we review on it.

niwinz avatar Feb 15 '22 10:02 niwinz