
Fix slub/dfgviewer#147 in ALTO parser

Open sebastian-meyer opened this issue 6 years ago • 2 comments

We have fixed https://github.com/slub/dfg-viewer/issues/147 in a rather quick-and-dirty way. A better solution would be to fix the issue directly in the ALTO parser of Kitodo.Presentation.

sebastian-meyer avatar Jan 13 '20 15:01 sebastian-meyer

Your fix now also makes it possible to render text that contains (HTML-encoded) newlines but no SP elements (or not even multiple distinct TextLine elements). See here for an example. (This ALTO was produced by the page-to-alto converter with --alto-version 2.0 --dummy-textline --dummy-word in effect.)

It would be great if that workaround continued to work in the future, because full texts without true/correct text line and word segmentation are a valid use case.

But it also shows that it is important for readability that at least some newlines get rendered. In my example, the newlines are already included in the string, but Presentation should also insert them between successive TextLines.
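To illustrate the behaviour asked for here, a minimal sketch in Python (not the actual Kitodo.Presentation parser, which is not shown in this thread): it keeps newlines that are already part of a String's CONTENT and also inserts one between successive TextLines. The function names `alto_to_text` and `local_name`, the sample document, and the reading of "HTML-encoded newlines" as character references such as `&#10;` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical ALTO 2.0 fragment: one regular line with an SP element,
# and one "dummy" line whose single String already contains a newline.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Hello"/><SP/><String CONTENT="world"/>
    </TextLine>
    <TextLine>
      <String CONTENT="second&#10;line"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so any ALTO version matches."""
    return tag.rsplit("}", 1)[-1]


def alto_to_text(alto_xml):
    """Return the plain text of an ALTO document, one TextLine per line."""
    root = ET.fromstring(alto_xml)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        parts = []
        prev_was_string = False
        for child in elem:
            name = local_name(child.tag)
            if name == "String":
                # Assumption: two adjacent Strings without an SP between
                # them are still separate words, so add a space.
                if prev_was_string:
                    parts.append(" ")
                # Newlines inside CONTENT are kept as they are.
                parts.append(child.get("CONTENT", ""))
                prev_was_string = True
            elif name == "SP":
                parts.append(" ")
                prev_was_string = False
        lines.append("".join(parts))
    # Insert a newline between successive TextLines.
    return "\n".join(lines)


if __name__ == "__main__":
    print(alto_to_text(SAMPLE))
```

The sketch ignores HYP elements and any other hyphenation handling; a real parser would have to decide how to join hyphenated words across lines.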

bertsky avatar Jun 24 '21 13:06 bertsky

BTW, the ALTO download then removes the HTML-encoded newline characters – too bad!

bertsky avatar Feb 17 '23 16:02 bertsky