Mike Gerber

47 issues by Mike Gerber

page__text.xsl does not honor the reading order in the PAGE-XML (`pc:ReadingOrder`), which produces completely wrong results. For [this page](https://qurator-data.de/examples/actevedef_718448162.first-page.zip), I get this text (shortened): ~~~ % docker run --rm -it...

bug
enhancement
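
To illustrate what honoring `pc:ReadingOrder` means, here is a minimal Python sketch. It is not the stylesheet's logic and not a proposed fix. The function name `text_in_reading_order` is hypothetical, and the sketch assumes a namespaced PAGE-XML document, a flat ordered group of `RegionRefIndexed` entries, and region-level `TextEquiv/Unicode` text. Nested groups and unordered groups are not handled.

```python
import xml.etree.ElementTree as ET
from typing import List


def text_in_reading_order(page_xml: str) -> List[str]:
    """Return region-level text, ordered by ReadingOrder where it is given."""
    root = ET.fromstring(page_xml)
    # Assumption: the root element is namespaced, as in PAGE-XML files.
    # Taking the URI from the root tag avoids hard-coding one schema version.
    ns_uri = root.tag[1:root.tag.index("}")]
    ns = {"pc": ns_uri}

    # Map region IDs to elements; document order is the fallback order.
    regions = {r.get("id"): r for r in root.iterfind(".//pc:TextRegion", ns)}

    reading_order = root.find(".//pc:ReadingOrder", ns)
    if reading_order is None:
        ordered_ids = list(regions)
    else:
        refs = list(reading_order.iter("{%s}RegionRefIndexed" % ns_uri))
        # Sort by the numeric "index" attribute, not by document position.
        refs.sort(key=lambda ref: int(ref.get("index")))
        ordered_ids = [ref.get("regionRef") for ref in refs]

    texts = []
    for region_id in ordered_ids:
        region = regions.get(region_id)
        if region is None:
            continue  # the reference points at a non-text or missing region
        # Only the region's own TextEquiv is read, not its lines or words.
        unicode_el = region.find("pc:TextEquiv/pc:Unicode", ns)
        if unicode_el is not None and unicode_el.text:
            texts.append(unicode_el.text)
    return texts
```

With a ReadingOrder present, regions it does not reference are skipped in this sketch; a real tool would need to decide whether to append them.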

While the files in the top directory seem to come from the sources in the [langdata](https://github.com/tesseract-ocr/langdata) repository, the source for some of the files in `scripts/` is unclear: * `scripts/Fraktur.traineddata`...

With this document ([PPN894261851.zip](https://github.com/qurator-spk/eynollah/files/8068764/PPN894261851.zip)) we experienced an OOM error. Further investigation revealed this memory usage (measured using procpath): ![eynollah vs Buchrücken drawio](https://user-images.githubusercontent.com/34309482/154056914-04038201-9560-4e9d-8489-adc8418f0432.png) The culprit seems to be this "page" from...

bug

Example: https://circleci.com/api/v1.1/project/github/OCR-D/ocrd_calamari/177/output/106/0?file=true&allocation-id=62165a9241d4334ebb050ee2-0-build%2F1CB8E496 Excerpt: ~~~ ocrd resmgr download ocrd-calamari-recognize qurator-gt4histocr-1.0 16:06:28.067 INFO ocrd.cli.resmgr - Downloading resource {'url': 'https://qurator-data.de/calamari-models/GT4HistOCR/2019-12-11T11_10+0100/model.tar.xz', 'type': 'tarball', 'name': 'qurator-gt4histocr-1.0', 'description': 'Calamari model trained with GT4HistOCR', 'size': 90275264, 'path_in_archive':...

Currently, `ocrd workspace find --download` is the way to download the files of a workspace. I propose aliasing this to an arguably more user-friendly `ocrd workspace download` command.

enhancement

Another idea that came up in https://github.com/OCR-D/ocrd_olena/issues/60: I routinely run validation after running each processor to catch problems early. If there was a standard option `--validate` in core (supplemented by...

enhancement

In #68 @bertsky : > But the real problem is that TF2 dependencies are lurking everywhere, so we will very soon have the unacceptable state that no catch-all venv (satisfying...

enhancement

ocrd-segment-repair has the optional operations "plausibilize" and "sanitize" – I have no idea what exactly these do :) I would prefer something like this: * shrink-regions-to-hull-of-lines * whatever-plausibilize-does There seems...

Apparently there is still a namespace problem when installing, e.g., eynollah and sbb_binarization together, and I think it should be resolved by not using "qurator" as part of the namespace...

enhancement

https://github.com/qurator-spk/eynollah/commit/b049e475e34b49536527f872abcf765f4010a6d7 fixed the model download, but the fix is not in the release. With 0.3.0 you can't download the model using `ocrd resmgr download ocrd-eynollah-segment default`: ``` 15:44:30.208 INFO ocrd.resource_manager.download - Copying...

ocr-d