kitodo-presentation

Make the use of XPath 2.0 expressions & functions possible

Open · michaelkubina opened this issue 3 years ago · 2 comments

Description

Kitodo.Presentation should allow the use of XPath 2.0 expressions and functions.

Problem

In its current state, Kitodo.Presentation is limited to XPath 1.0 expressions and functions, because the underlying SimpleXMLElement and DOMXPath objects rely on the libxml2 library (https://gitlab.gnome.org/GNOME/libxml2/-/wikis/home), which does not and will not support XPath 2.0 (https://mail.gnome.org/archives/xml/2007-February/msg00077.html). This limits how metadata can be extracted and handled, or requires helper functions for certain cases, which would be even more limiting.

Proposed Solution

The author of libxml2 pointed to the "EXSLT extensions which are supported by libxslt and xsltproc".

However, their support and development status seems unclear, as the submission and download pages return 404 errors.

I also came across a PHP library for XPath 2.0 expressions that seems to be little known. It appears to be very extensive, is tested against the conformance suite, and can be integrated via Composer: https://github.com/bseddon/XPath20

Another (and in my opinion the most favorable) possibility would be to use an XSLT processor, which would allow for full XPath 2.0(+) support. However, I can't anticipate how well it would perform. Saxon-C with its PHP API would be capable of this task but has certain limitations: the processor only produces English output, even for functions that can specify output in other locales (e.g. format-date()).

Mockups and Examples

Target Version

It would already help tremendously in DLF4; otherwise, in the next or one of the next major releases (e.g. with the help of the development fund).

Additional Context

For our upcoming newspaper portal, we would like to output the ISO 8601-encoded "dateIssued" in more natural language.

format-date(
./mods:originInfo[@eventType='publication']/mods:dateIssued[@encoding="iso8601"], 
"[FNn], den [D]. [MNn] [Y]", "de", (), ())

would output

"Samstag, den 26. Oktober 1844"

This would also make it possible to create facets for the day of the week or the month, if the metadata is configured with a suitable expression for this purpose (derived from the example above).
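To illustrate what the requested expression computes (and what a helper function would otherwise have to replicate outside XPath), here is a minimal Python sketch. It is not part of Kitodo.Presentation: the function names format_date_de and facet_values are hypothetical, the German weekday and month names are hard-coded instead of coming from a locale, and it assumes dateIssued holds a full YYYY-MM-DD value (partial ISO 8601 dates such as "1844-10" would raise a ValueError).

```python
from collections import Counter
from datetime import date

# Hypothetical helper data: German names are hard-coded so the result does not
# depend on which system locales are installed. date.weekday() returns 0 for Monday.
WEEKDAYS_DE = ["Montag", "Dienstag", "Mittwoch", "Donnerstag",
               "Freitag", "Samstag", "Sonntag"]
MONTHS_DE = ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
             "August", "September", "Oktober", "November", "Dezember"]


def format_date_de(iso_value: str) -> str:
    """Approximate format-date($d, "[FNn], den [D]. [MNn] [Y]", "de", (), ())."""
    d = date.fromisoformat(iso_value.strip())  # assumes a full YYYY-MM-DD value
    return f"{WEEKDAYS_DE[d.weekday()]}, den {d.day}. {MONTHS_DE[d.month - 1]} {d.year}"


def facet_values(iso_values):
    """Count weekday and month facet values for a list of dateIssued strings."""
    weekdays = Counter()
    months = Counter()
    for value in iso_values:
        d = date.fromisoformat(value.strip())
        weekdays[WEEKDAYS_DE[d.weekday()]] += 1
        months[MONTHS_DE[d.month - 1]] += 1
    return weekdays, months


if __name__ == "__main__":
    print(format_date_de("1844-10-26"))  # Samstag, den 26. Oktober 1844
    wd, mo = facet_values(["1844-10-26", "1844-10-27", "1844-11-02"])
    print(dict(wd), dict(mo))
```

With XPath 2.0(+) support, the same result would come directly from the configured metadata expression instead of a separate helper like this one.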

michaelkubina, Jun 27 '22 14:06

While being able to use XPath 2.0 expressions and functions would be great, for your particular example I'd recommend using TypoScript. That way, you could even check the website language chosen by the client and format the dates accordingly.

sebastian-meyer, Oct 25 '22 12:10

Thank you for pointing me to it! I agree that TypoScript would work in this case, if it were for display purposes only. For facets, I could also work around the limitation by transforming the METS with Saxon and adding more fields, for example to get the day of the week. But as I said, this would be a workaround and would bloat the METS.

There are also other use cases, such as sideloading metadata from related documents, or using better functions like string-join() instead of concat() when handling multivalued fields. Overall, XPath 1.0 is a limitation.
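The difference matters for multivalued fields: in XPath 1.0, converting a node-set to a string (as string() and the arguments of concat() do) uses only the first node in document order, whereas XPath 2.0's string-join() joins every item with a separator. The following Python sketch emulates both behaviours with the standard library's ElementTree; the MODS record and the helper names first_value and joined_values are hypothetical and only serve to show the semantics.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical MODS record with a multivalued field (several subject topics).
RECORD = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:subject><mods:topic>Politik</mods:topic></mods:subject>
  <mods:subject><mods:topic>Wirtschaft</mods:topic></mods:subject>
  <mods:subject><mods:topic>Kultur</mods:topic></mods:subject>
</mods:mods>"""


def first_value(root, path):
    """Mimic XPath 1.0 string(path): only the first matching node counts."""
    node = root.find(path, MODS_NS)
    return "" if node is None else "".join(node.itertext())


def joined_values(root, path, separator="; "):
    """Mimic XPath 2.0 string-join(path, separator): every matching node counts."""
    return separator.join("".join(n.itertext()) for n in root.findall(path, MODS_NS))


if __name__ == "__main__":
    root = ET.fromstring(RECORD)
    print(first_value(root, "mods:subject/mods:topic"))    # Politik
    print(joined_values(root, "mods:subject/mods:topic"))  # Politik; Wirtschaft; Kultur
```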

Do you think this is something that is a) possible to change at all and b) if so, feasible and reasonable to do with the development funds?

PS: I have tested it with TypoScript, and while it worked within the metadata plugin, it failed in the listview plugin because the value is not processed there. From what I can see, those values are passed directly from Solr to the view.

michaelkubina, Nov 14 '22 14:11