scijava-common
SciJava Location API Improvements
During a conversation with @ctrueden, we created the following plan for further improving the scijava-location API:
Abstraction structure:
- Location: Contains the metadata: where a file is located, what credentials are needed to access it, etc.
- Session: Provides access to data sources on a host.
- DataHandle: Provides access to the bytes in a data source; used, e.g., by SCIFIO formats to read images.
SessionService
- Can create a Session for a remote; takes a Location as input
  - "remote" = the host/non-path part of a location URI
- Caches Sessions for each remote
- Whenever a Session is handed out, increment a usage ref
- Whenever a DataHandle is closed, it decrements its session's usage ref
  - If applicable; some DataHandles don't need Sessions
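The reference-counting scheme above could be sketched as follows. This is a minimal, self-contained illustration, not SciJava API: `Session`, `RefCountingSessionCache`, `fetchOrCreate`, `release`, and `usageCount` are hypothetical names, a plain `java.net.URI` stands in for a Location, and the "remote" is assumed to be the URI's scheme plus authority (user info, host, port).

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for the proposed Session plugin type. */
interface Session extends AutoCloseable {
    @Override
    void close(); // narrowed: no checked exception
}

/** Sketch of a session cache that shares one Session per remote and counts its users. */
class RefCountingSessionCache {

    /** A cached session plus the number of handles currently using it. */
    private static final class Entry {
        final Session session;
        int refs;

        Entry(Session session) {
            this.session = session;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final Function<URI, Session> factory;

    RefCountingSessionCache(Function<URI, Session> factory) {
        this.factory = factory;
    }

    /** The "remote": scheme plus authority (user info, host, port), i.e. everything but the path. */
    static String remoteOf(URI location) {
        String authority = location.getRawAuthority();
        return location.getScheme() + "://" + (authority == null ? "" : authority);
    }

    /** Returns the shared session for the location's remote, creating it on first use, and counts one more user. */
    synchronized Session fetchOrCreate(URI location) {
        Entry entry = entries.computeIfAbsent(remoteOf(location), remote -> new Entry(factory.apply(location)));
        entry.refs++;
        return entry.session;
    }

    /** Called when a DataHandle using this remote's session is closed. */
    synchronized void release(URI location) {
        Entry entry = entries.get(remoteOf(location));
        if (entry == null || entry.refs == 0) {
            throw new IllegalStateException("No session in use for " + remoteOf(location));
        }
        entry.refs--;
        // The session stays open at zero refs; closing it is left to an eviction policy.
    }

    synchronized int usageCount(URI location) {
        Entry entry = entries.get(remoteOf(location));
        return entry == null ? 0 : entry.refs;
    }
}
```

Note that `release` deliberately leaves a session open when its count drops to zero; deciding when to actually close it is the eviction question raised in the protocol-specific notes.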
Protocols
- List of protocols we want:
  - [ ] SSH/SCP
  - [ ] (S)FTP
  - [ ] HTTP/HTTPS:
    - with resume, to avoid excessive rereading
    - support PUT for uploads?
  - [ ] OMERO
  - [ ] HDFS
  - [ ] Cloud block storage:
    - [ ] Amazon S3
    - [ ] OpenStack Swift
    - [ ] Azure Blob Storage
    - [ ] Google Cloud Storage
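For the HTTP/HTTPS resume item, one possible building block is an HTTP `Range` request, which the JDK's own `java.net.http.HttpClient` (Java 11+) can send as an ordinary header. The sketch below assumes the server supports range requests and answers with `206 Partial Content`; `HttpResume`, `rangeRequest`, and `openAt` are hypothetical names, and whether this suffices or an external library is still preferable remains open.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch: resume reading an HTTP(S) resource at a byte offset via a Range request. */
class HttpResume {

    /** Builds a GET request asking for the bytes from {@code offset} to the end of the resource. */
    static HttpRequest rangeRequest(URI uri, long offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be non-negative: " + offset);
        }
        return HttpRequest.newBuilder(uri)
                .header("Range", "bytes=" + offset + "-")
                .GET()
                .build();
    }

    /**
     * Opens a stream that starts at {@code offset}. Only a 206 (Partial Content) response means
     * the server honored the range; anything else (e.g. 200 with the full body) is rejected here.
     */
    static InputStream openAt(HttpClient client, URI uri, long offset)
            throws IOException, InterruptedException {
        HttpResponse<InputStream> response =
                client.send(rangeRequest(uri, offset), HttpResponse.BodyHandlers.ofInputStream());
        if (response.statusCode() != 206) {
            response.body().close();
            throw new IOException("Range request not honored, HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

A handle built on this would track how many bytes it has consumed and, after a dropped connection, call `openAt` with that count instead of rereading from the start.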
Service/plugin architecture
- Session will need to be a new Plugin type.
- SessionService will be a HandlerService.
  - Model it after DataHandleService: it is very similar, but with additional API, e.g., `fetchOrCreate` or some such.
- Update the DataHandleService to have a new method:
  - `create(L, Session<L>)`, where `L extends Location`: this method allows reusing a specific session.
    - This session should not be closed when the handle is closed.
  - The existing DataHandleService method `create(Location)` will simply ask the SessionService to `fetchOrCreate` a session for that Location.
- Naively, it might seem like we need a "Remote" interface or some such, but I actually think we won't need it. I think each Session will be able to extract the information it needs from its associated type of Location, and that will be good enough, and simpler.
- Finish the StreamHandle interface(?) for DataHandles that are built on InputStreams/OutputStreams.
  - Some handle types, such as URLHandle, would probably(?) benefit from extending StreamHandle.
  - But if none of the protocols would actually benefit from extending StreamHandle, we could decide not to do it.
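The ownership rule for the two `create` overloads could look roughly like this. It is a hypothetical sketch, not the real DataHandleService: a `java.net.URI` replaces the generic `L extends Location`, and `Sessions`, `Handle`, and `HandleFactorySketch` are illustrative names. A handle only gives its session back to the service on close if the service supplied it; a caller-supplied session is left untouched.

```java
import java.net.URI;

/** Sketch of the two proposed create(...) overloads and who is responsible for the session. */
class HandleFactorySketch {

    /** Hypothetical session; only closing matters here. */
    interface Session extends AutoCloseable {
        @Override
        void close();
    }

    /** Hypothetical stand-in for the proposed SessionService. */
    interface Sessions {
        Session fetchOrCreate(URI location);

        void release(URI location);
    }

    /** Hypothetical DataHandle that remembers whether it must give its session back. */
    static final class Handle implements AutoCloseable {
        final URI location;
        final Session session;
        private final Sessions service; // null when the caller supplied the session
        private boolean closed;

        Handle(URI location, Session session, Sessions service) {
            this.location = location;
            this.session = session;
            this.service = service;
        }

        @Override
        public void close() {
            if (closed) return; // closing twice must not decrement the usage count twice
            closed = true;
            if (service != null) service.release(location);
            // A caller-supplied session is neither closed nor released: the caller owns it.
        }
    }

    private final Sessions sessions;

    HandleFactorySketch(Sessions sessions) {
        this.sessions = sessions;
    }

    /** create(Location): ask the session service to fetch or create a session for this location. */
    Handle create(URI location) {
        return new Handle(location, sessions.fetchOrCreate(location), sessions);
    }

    /** create(L, Session<L>): reuse a specific session without taking ownership of it. */
    Handle create(URI location, Session session) {
        return new Handle(location, session, null);
    }
}
```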
Protocol-specific notes
- Java's native HTTP/HTTPS support seems to lack support for resuming downloads; it looks like we need an external library for that.
- We need to make sure that we only cache sessions that support concurrent access, so the interface needs a `public boolean isConcurrent()` method.
- Sessions might only allow limited concurrent access; if we encounter this, we will need a `public int concurrentAccessLimit()` method.
- When do we close automatically created sessions?
  - Need a cache eviction strategy like LRU
  - Don't immediately close a Session when its reference count reaches 0
  - Maybe use a connection pool with a modifiable limit?
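One way to combine the LRU idea with reference counts is to evict only idle sessions, in least-recently-used order, once more than a fixed number of them are open. This sketch assumes a hypothetical `Session` with the proposed `isConcurrent()` method, uses `LinkedHashMap` in access order for the LRU bookkeeping, and hands out non-concurrent sessions without caching them; `IdleSessionLru`, `acquire`, `release`, and `maxIdle` are illustrative names.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.function.Supplier;

/** Sketch: share concurrent sessions per remote and close only idle ones, least recently used first. */
class IdleSessionLru {

    /** Hypothetical session exposing the proposed concurrency query. */
    interface Session extends AutoCloseable {
        boolean isConcurrent();

        @Override
        void close();
    }

    private static final class Entry {
        final Session session;
        int refs;

        Entry(Session session) {
            this.session = session;
        }
    }

    private final int maxIdle;
    // accessOrder = true: iteration goes from least to most recently accessed entry.
    private final LinkedHashMap<String, Entry> cache = new LinkedHashMap<>(16, 0.75f, true);

    IdleSessionLru(int maxIdle) {
        this.maxIdle = maxIdle;
    }

    synchronized Session acquire(String remote, Supplier<Session> factory) {
        Entry entry = cache.get(remote); // get() also marks the entry as most recently used
        if (entry == null) {
            Session session = factory.get();
            if (!session.isConcurrent()) {
                return session; // not shareable: handed out uncached, caller must close it
            }
            entry = new Entry(session);
            cache.put(remote, entry);
        }
        entry.refs++;
        return entry.session;
    }

    synchronized void release(String remote) {
        Entry entry = cache.get(remote);
        if (entry != null && entry.refs > 0) {
            entry.refs--;
        }
        evictIdle(); // a session at zero refs is not closed right away, only if over the limit
    }

    /** Closes least recently used idle sessions until at most maxIdle idle ones remain. */
    private void evictIdle() {
        int idle = 0;
        for (Entry entry : cache.values()) {
            if (entry.refs == 0) idle++;
        }
        Iterator<Entry> it = cache.values().iterator();
        while (idle > maxIdle && it.hasNext()) {
            Entry entry = it.next();
            if (entry.refs == 0) {
                it.remove();
                entry.session.close();
                idle--;
            }
        }
    }

    synchronized boolean isCached(String remote) {
        return cache.containsKey(remote); // containsKey does not change the access order
    }
}
```

A per-remote limit based on `concurrentAccessLimit()` (for example, a `java.util.concurrent.Semaphore` per cached session) could be layered on top of this, but is not shown.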
Implementation-specific notes
- Calling `dispose()` on the SessionService will close all connections.
- Currently working on this on the
What is the current state of this? Can it be closed, since #259 is merged?
I think this issue is largely, but not completely, tackled. In particular, we still lack this SessionService. Leaving open for now.