scijava-common

SciJava Location API Improvements

Status: Open. gab1one opened this issue 9 years ago • 2 comments

During a conversation with @ctrueden we created the following plan for further improvements of the scijava-location API:

Abstraction structure:

  1. Location: Contains the metadata describing where a file is located, what credentials are needed to access it, etc.
  2. Session: Provides access to data sources on a host.
  3. DataHandle: Provides access to the bytes in a data source; used, e.g., by SCIFIO formats to read images.
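To make the three layers concrete, here is a minimal Java sketch. The interfaces are hypothetical, deliberately tiny stand-ins: the project's actual Location and DataHandle types are much richer, and Session does not exist yet. InMemorySession and SimpleLocation are invented toys; the "host" is just a map from paths to byte arrays. The point is the division of labor: the Location only describes, the Session knows how to reach the host, and the DataHandle serves bytes.

```java
import java.io.Closeable;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, minimal versions of the three layers (NOT the scijava-common API).

/** Layer 1: pure metadata saying where the data lives. */
interface Location {
    URI getURI();
}

/** Layer 2: access to the data sources on one host. */
interface Session extends Closeable {
    DataHandle open(Location location) throws IOException;
}

/** Layer 3: byte-level access to a single data source. */
interface DataHandle extends Closeable {
    /** Reads up to buffer.length bytes; returns the count, or -1 at end of data. */
    int read(byte[] buffer) throws IOException;
}

/** Toy Location that just wraps a URI. */
final class SimpleLocation implements Location {
    private final URI uri;
    SimpleLocation(URI uri) { this.uri = uri; }
    @Override public URI getURI() { return uri; }
}

/** Toy Session whose "host" is an in-memory map from path to file contents. */
final class InMemorySession implements Session {
    private final Map<String, byte[]> files = new HashMap<>();
    private boolean closed;

    void put(String path, byte[] data) { files.put(path, data); }

    @Override
    public DataHandle open(Location location) throws IOException {
        if (closed) throw new IOException("session is closed");
        // The Session, not the Location, knows how to reach the bytes.
        byte[] data = files.get(location.getURI().getPath());
        if (data == null) throw new IOException("no such file: " + location.getURI());
        return new DataHandle() {
            private int pos; // current read position

            @Override public int read(byte[] buffer) {
                if (pos >= data.length) return -1;
                int n = Math.min(buffer.length, data.length - pos);
                System.arraycopy(data, pos, buffer, 0, n);
                pos += n;
                return n;
            }

            @Override public void close() { /* nothing to release for in-memory data */ }
        };
    }

    @Override public void close() { closed = true; }
}
```

A format reader would only ever see the DataHandle; swapping InMemorySession for an SFTP or S3 session would not change that code.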

SessionService

  • can create a Session for a remote, taking a Location as input
    • "remote" = the host (non-path) part of a location URI
  • caches Sessions for each remote
  • whenever a Session is handed out, increment its usage reference count
  • whenever a DataHandle is closed, it decrements its session's usage reference count
    • if applicable -- some DataHandles don't need Sessions
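A minimal sketch of the reference-counting cache described above. It assumes a hypothetical factory function that opens a Session for a given remote, and a Session interface reduced to remote() and close(). Following the definition above, the remote is taken to be the URI's scheme plus authority (user info, host, port). The method names fetchOrCreate, release, usageCount and dispose are illustrative; dispose() mirrors the implementation note that disposing the service closes all connections.

```java
import java.io.Closeable;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical minimal session: something closeable bound to one remote. */
interface Session extends Closeable {
    String remote();
}

/** Sketch of a SessionService: one cached Session per remote, plus a usage count. */
final class SessionService {
    private static final class Entry {
        final Session session;
        int refs; // number of handles currently using this session
        Entry(Session session) { this.session = session; }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Function<String, Session> factory; // hypothetical: opens a session for a remote

    SessionService(Function<String, Session> factory) { this.factory = factory; }

    /** The non-path part of the URI: scheme plus authority (user info, host, port). */
    static String remoteOf(URI uri) {
        return uri.getScheme() + "://" + uri.getRawAuthority();
    }

    /** Returns the cached session for this URI's remote (creating it if needed) and counts the use. */
    synchronized Session fetchOrCreate(URI uri) {
        Entry e = cache.computeIfAbsent(remoteOf(uri), r -> new Entry(factory.apply(r)));
        e.refs++;
        return e.session;
    }

    /** Called when a DataHandle that used this session is closed. */
    synchronized void release(Session session) {
        Entry e = cache.get(session.remote());
        if (e != null && e.session == session && e.refs > 0) e.refs--;
        // Deliberately does not close at 0 refs; eviction is a separate policy.
    }

    synchronized int usageCount(URI uri) {
        Entry e = cache.get(remoteOf(uri));
        return e == null ? 0 : e.refs;
    }

    /** Closes every cached session; here the first IOException propagates and leaves the rest open. */
    synchronized void dispose() throws IOException {
        for (Entry e : cache.values()) e.session.close();
        cache.clear();
    }
}
```

Because the user info is part of the authority, two users on the same host get separate sessions, which fits the idea that credentials belong to the Location.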

Protocols

  • List of protocols we want:
    • [ ] SSH (SCP)
    • [ ] (S)FTP
    • [ ] HTTP/HTTPS:
      • with resume, to avoid excessive rereading
      • support PUT for uploads?
    • [ ] OMERO
    • [ ] HDFS
    • [ ] Cloud object storage:
      • [ ] Amazon S3
      • [ ] OpenStack Swift
      • [ ] Azure Blob Storage
      • [ ] Google Cloud Storage
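With only the JDK, "resume" for HTTP/HTTPS comes down to sending a standard Range request header yourself and checking for a 206 Partial Content reply (HTTP range requests, RFC 9110). A minimal sketch, assuming the server supports byte ranges; HttpResume and its methods are hypothetical names, and retries and ETag/If-Range validation are omitted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/** Sketch: resuming an HTTP(S) read at a byte offset via a Range request. */
final class HttpResume {
    /** Value for the Range request header: everything from offset to the end. */
    static String rangeHeader(long offset) {
        if (offset < 0) throw new IllegalArgumentException("negative offset");
        return "bytes=" + offset + "-";
    }

    /** First byte position of a Content-Range value such as "bytes 100-199/1000". */
    static long contentRangeStart(String contentRange) {
        if (contentRange == null || !contentRange.startsWith("bytes ")) {
            throw new IllegalArgumentException("unexpected Content-Range: " + contentRange);
        }
        String spec = contentRange.substring("bytes ".length());
        int dash = spec.indexOf('-');
        // "bytes */1000" (unsatisfied range) has no dash and is rejected here.
        if (dash <= 0) throw new IllegalArgumentException("no usable range: " + contentRange);
        return Long.parseLong(spec.substring(0, dash).trim());
    }

    /** Opens a stream positioned at offset; fails if the server ignored the Range header. */
    static InputStream openAt(URL url, long offset) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Range", rangeHeader(offset));
        int status = conn.getResponseCode();
        if (status == HttpURLConnection.HTTP_PARTIAL
                && contentRangeStart(conn.getHeaderField("Content-Range")) == offset) {
            return conn.getInputStream();
        }
        conn.disconnect();
        throw new IOException("server did not honor Range (HTTP " + status + ")");
    }
}
```

A resuming handle would remember how many bytes it has already consumed and call openAt with that count after a dropped connection, instead of rereading from the start.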

Service/plugin architecture

  • Session will need to be a new Plugin type
  • SessionService will be a HandlerService
    • Model it after DataHandleService; it is very similar, but with additional API, e.g. fetchOrCreate or some such
  • Update the DataHandleService to have a new method:
    • create(L location, Session<L> session), where L extends Location
    • This method allows reusing a specific session.
    • Closing the handle should not close this session.
  • The existing DataHandleService method create(Location) will simply ask the SessionService to fetchOrCreate a session for that Location.
  • Naively, it might seem like we need a "Remote" interface or some such, but I don't think we will. Each Session should be able to extract the information it needs from its associated type of Location, which is good enough and simpler.
  • Finish the StreamHandle interface(?) for DataHandles that are built on InputStreams/OutputStreams.
    • Some handle types, like URLHandle, would probably(?) benefit from extending StreamHandle
    • But if none of the protocols would actually benefit from extending StreamHandle, we could decide not to do it.
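A sketch of how the two create overloads could differ in who owns the session. All types here are hypothetical simplifications (the real DataHandleService and DataHandle signatures differ). A handle obtained through create(Location) hands its session back to the SessionService when closed; a handle built on a caller-supplied Session leaves that session alone.

```java
import java.io.Closeable;

// Hypothetical minimal types; not the real scijava-common signatures.
interface Location {
    String remote(); // hypothetical: the non-path part of the location URI
}

interface Session<L extends Location> extends Closeable {
    // protocol-specific operations omitted
}

interface SessionService {
    <L extends Location> Session<L> fetchOrCreate(L location);
    void release(Session<?> session);
}

/** A handle that remembers whether it must give its session back on close. */
final class DataHandle<L extends Location> implements Closeable {
    private final L location;
    private final Session<L> session;
    private final SessionService owner; // null: the caller owns the session
    private boolean closed;

    DataHandle(L location, Session<L> session, SessionService owner) {
        this.location = location;
        this.session = session;
        this.owner = owner;
    }

    L get() { return location; }

    Session<L> session() { return session; }

    @Override
    public void close() {
        if (closed) return; // closing twice must not decrement twice
        closed = true;
        if (owner != null) owner.release(session);
    }
}

final class DataHandleService {
    private final SessionService sessions;

    DataHandleService(SessionService sessions) { this.sessions = sessions; }

    /** Existing entry point: borrow a shared session from the SessionService. */
    <L extends Location> DataHandle<L> create(L location) {
        return new DataHandle<>(location, sessions.fetchOrCreate(location), sessions);
    }

    /** Proposed overload: reuse a caller-supplied session, never released by the handle. */
    <L extends Location> DataHandle<L> create(L location, Session<L> session) {
        return new DataHandle<>(location, session, null);
    }
}
```

A null owner is only one way to encode "borrowed"; a boolean flag or a separate wrapper handle would work just as well.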

Protocol specific notes

  • Java's native HTTP/HTTPS support seems to lack support for resuming transfers; it looks like we need an external library for that.
  • We need to make sure that we only cache sessions that support concurrent access. -> Need a public boolean isConcurrent() method in the interface.
  • Sessions might only allow limited concurrent access; if we encounter this, we will need a public int concurrentAccessLimit() method.
  • When do we close automatically created sessions?
    • Need a cache eviction strategy like LRU
    • Don't immediately close Session when reference count reaches 0
    • Maybe use a connection pool with a modifiable limit?
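One possible shape for the eviction policy: sessions whose usage count reaches 0 are parked in an LRU cache instead of being closed, sessions that report isConcurrent() == false are never cached, and the idle limit can be changed at runtime. This is a sketch under those assumptions; IdleSessionCache, park, take and setMaxIdle are hypothetical names, and concurrentAccessLimit() is left out for brevity. It relies on java.util.LinkedHashMap's access-order mode, in which iteration starts at the least recently accessed entry.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical session interface carrying the proposed capability check. */
interface Session extends Closeable {
    boolean isConcurrent();
}

/** Keeps idle sessions (usage count 0) open until they fall off an LRU cache. */
final class IdleSessionCache {
    // accessOrder = true: iteration order runs from least to most recently accessed.
    private final LinkedHashMap<String, Session> idle = new LinkedHashMap<>(16, 0.75f, true);
    private int maxIdle;

    IdleSessionCache(int maxIdle) { this.maxIdle = maxIdle; }

    /** Called when a session's usage count reaches 0. */
    synchronized void park(String remote, Session session) throws IOException {
        if (!session.isConcurrent()) { // never share sessions that cannot be used concurrently
            session.close();
            return;
        }
        Session previous = idle.put(remote, session);
        if (previous != null && previous != session) previous.close();
        evictOverflow();
    }

    /** Reuses an idle session for this remote, or returns null if none is cached. */
    synchronized Session take(String remote) {
        return idle.remove(remote);
    }

    /** The "modifiable limit": shrinking it closes the least recently used idle sessions. */
    synchronized void setMaxIdle(int maxIdle) throws IOException {
        this.maxIdle = maxIdle;
        evictOverflow();
    }

    synchronized int size() { return idle.size(); }

    private void evictOverflow() throws IOException {
        Iterator<Map.Entry<String, Session>> it = idle.entrySet().iterator();
        while (idle.size() > maxIdle && it.hasNext()) {
            Session eldest = it.next().getValue(); // least recently used first
            it.remove();
            eldest.close();
        }
    }
}
```

A SessionService would call take(remote) before opening a new session, and park(...) when a session's usage count drops to 0.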

Implementation specific notes

  • Calling dispose() on the SessionService will close all connections.
  • Currently working on this in the more-handles-gabriel branch

gab1one, Dec 15 '16

What is the current state of this? Can it be closed now that #259 is merged?

imagejan, Oct 03 '17

I think this issue is largely, but not completely, tackled. In particular, we still lack the SessionService. Leaving open for now.

ctrueden avatar Oct 04 '17 16:10 ctrueden