Kitodo.Presentation dynamic identifier resolving

Open claussni opened this issue 7 years ago • 6 comments

Problem

Kitodo.Presentation can only resolve two types of identifiers to obtain METS/MODS documents: internal TYPO3 record PIDs and URLs. URLs have to be properly encoded. This is not obvious to everybody, because browsers silently correct URL encoding to produce valid URLs without bothering users. However, the given URL must still be a valid URL after TYPO3 has decoded the query parameters. In most cases this essentially requires double URL encoding, which sometimes fails in the context of RealURL and naive URL handling.

Example

  • Identifier urn:foo:bar-4711 URL-encoded: urn%3Afoo%3Abar-4711
  • Location of the resource: http://service.com/files/urn%3Afoo%3Abar-4711
  • ID parameter: http%3A%2F%2Fservice.com%2Ffiles%2Furn%253Afoo%253Abar-4711
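The example above can be reproduced with a few lines of Python. This is only an illustration of the double encoding using the standard library's `urllib.parse`; it is not code from Kitodo.Presentation, and the variable names are made up for the sketch.

```python
from urllib.parse import quote, unquote

identifier = "urn:foo:bar-4711"

# First encoding: the identifier becomes a path segment of the resource URL.
encoded_identifier = quote(identifier, safe="")
location = "http://service.com/files/" + encoded_identifier

# Second encoding: the whole location is passed as a query parameter value,
# so the "%" signs from the first pass are themselves encoded as "%25".
id_parameter = quote(location, safe="")

print(encoded_identifier)
print(location)
print(id_parameter)

# Decoding the parameter once (as the receiving side does) must yield
# the still-encoded location, not the raw identifier.
assert unquote(id_parameter) == location
```

If the parameter is decoded twice somewhere along the way, the path segment turns back into `urn:foo:bar-4711` and the location no longer matches the resource URL.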

Proposed Solution

Kitodo.Presentation should support pluggable identifier resolving. There might be a chain of configured resolvers for a variety of identifiers, for example to resolve internal PIDs, URLs, URNs, DOIs or custom URIs for local systems. One could also think of resolvers that allow dynamic rewriting of URLs using pattern matching. The configuration would include a priority for every registered resolver. Resolvers could be installed using TYPO3 extensions.
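To make the idea more concrete, here is a rough sketch of what such a chain could look like. It is written in Python for brevity, not in PHP, and everything in it is an assumption for illustration: `ResolverChain`, `resolve_url`, `resolve_urn`, `make_rewriter`, the priorities and the host name `mets-store` do not exist in Kitodo.Presentation.

```python
import re
from typing import Callable, List, Optional, Tuple
from urllib.parse import quote

# A resolver takes an identifier and returns a location URL,
# or None if it does not feel responsible for that identifier.
Resolver = Callable[[str], Optional[str]]


def resolve_url(identifier: str) -> Optional[str]:
    """Pass through identifiers that already are URLs."""
    return identifier if re.match(r"^(https?|file)://", identifier) else None


def resolve_urn(identifier: str) -> Optional[str]:
    """Map a URN to a location on a (hypothetical) file service."""
    if identifier.startswith("urn:"):
        return "http://service.com/files/" + quote(identifier, safe="")
    return None


def make_rewriter(pattern: str, replacement: str) -> Resolver:
    """Build a resolver that rewrites matching URLs via a regex."""
    regex = re.compile(pattern)

    def rewrite(identifier: str) -> Optional[str]:
        if regex.search(identifier):
            return regex.sub(replacement, identifier)
        return None

    return rewrite


class ResolverChain:
    def __init__(self) -> None:
        self._resolvers: List[Tuple[int, Resolver]] = []

    def register(self, resolver: Resolver, priority: int) -> None:
        # Higher priority resolvers are asked first.
        self._resolvers.append((priority, resolver))
        self._resolvers.sort(key=lambda entry: entry[0], reverse=True)

    def resolve(self, identifier: str) -> Optional[str]:
        # The first resolver that returns a location wins.
        for _, resolver in self._resolvers:
            location = resolver(identifier)
            if location is not None:
                return location
        return None


chain = ResolverChain()
chain.register(resolve_url, priority=10)
chain.register(resolve_urn, priority=20)
# Hypothetical rewrite of a public host to a host reachable from the runtime.
chain.register(
    make_rewriter(r"^http://service\.com/", "http://mets-store:8080/"),
    priority=30,
)

print(chain.resolve("urn:foo:bar-4711"))
print(chain.resolve("http://service.com/files/doc.xml"))
print(chain.resolve("4711"))
```

In this sketch the caller passes the plain URN and the resolver takes care of encoding it into the location, so the caller only has to encode the parameter once. An identifier that no resolver accepts simply yields `None`; a real implementation would also need a resolver for internal PIDs.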

claussni avatar Jun 21 '18 14:06 claussni

I don't understand why having different resolvers would solve the "problem" of having to properly encode URL parameters. Even if you use DOIs, URNs or other kinds of identifiers you would still have to encode the tx_dlf[id] URL parameter properly.

Anyway, I can get behind the idea of supporting pluggable identifier resolving. We already have this for metadata and fulltext formats and could quite easily adapt it for identifier resolving.

sebastian-meyer avatar Jul 26 '18 14:07 sebastian-meyer

You are right. At first glance this issue is something of a mixed bag: problem no. 1 is double encoding and problem no. 2 is identifier resolving. I think what I meant was that a URI resolver could have a feature that makes double encoding unnecessary.

claussni avatar Jul 26 '18 15:07 claussni

Another problem with URLs as identifiers popped up: the given URL has to be resolvable by the TYPO3 system running the extension. With Docker containers using port mapping, or with other complex networking setups, this is not always possible.

claussni avatar Apr 26 '19 06:04 claussni

I think there is a misconception about the usage of URLs and identifiers in Kitodo.Presentation. For every document, Kitodo has a location field, which holds a URI for the physical location of the METS file, and a record identifier field, which holds a unique identifier for the document itself. Only the first one has to be resolvable (for obvious reasons), while the latter doesn't have to be resolvable (in fact, it could be any number or string). You can address a document both ways: by providing the (properly encoded) location URI or by providing the record identifier. However, the latter requires the document to be indexed first in order to make the record identifier known to Kitodo. (This is not the case for Qucosa documents, since they are not indexed, but only addressed by their location.)

sebastian-meyer avatar Apr 26 '19 07:04 sebastian-meyer

So there is a location URL parameter? The problem is that this URL needs to be resolvable from within the TYPO3 runtime. This is not necessarily the case in Docker environments.

claussni avatar Apr 26 '19 12:04 claussni

location isn't a separate URL parameter. The parameter tx_dlf[id] can be set either to the location URL or the document's record identifier.
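As a small illustration of the two ways to fill the parameter, the following Python snippet builds both query strings with the standard library. The location is the one from the example in the issue description; using the URN as the record identifier is just an assumption for the sketch, since the record identifier can be any string.

```python
from urllib.parse import parse_qs, urlencode

# Location of the METS file (already contains an encoded path segment).
location = "http://service.com/files/urn%3Afoo%3Abar-4711"
# Assumed record identifier; only usable once the document is indexed.
record_id = "urn:foo:bar-4711"

# urlencode() encodes both the parameter name and its value.
by_location = urlencode({"tx_dlf[id]": location})
by_record_id = urlencode({"tx_dlf[id]": record_id})

print(by_location)
print(by_record_id)

# Decoding the query string once gives back exactly what was put in.
assert parse_qs(by_location)["tx_dlf[id]"] == [location]
assert parse_qs(by_record_id)["tx_dlf[id]"] == [record_id]
```

Either way the value has to be encoded once as a query parameter; the location variant only looks double encoded because the location itself already contains percent-encoded characters.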

Kitodo.Presentation needs to access the METS file in order to process the information it needs to properly present the document. METS files are addressed with a URI that has to be resolvable (but can be a local file://localhost/path URI). Thus running Kitodo.Presentation within a Docker environment requires either 'mounting' the METS files into the container or making the http URIs resolvable from within the container.

sebastian-meyer avatar Apr 26 '19 12:04 sebastian-meyer