Kay-Michael Würzner
... based on the test data. Maybe even with the new GitHub Actions.

I am running the following workflow on https://digital.slub-dresden.de/werkansicht/dlf/87237/1/ (with https://digital.slub-dresden.de/data/kitodo/adrefudio_20253082Z_1907/adrefudio_20253082Z_1907_mets.xml):

1. Cropping (`ocrd-anybaseocr-crop`)
2. Binarization (`ocrd-anybaseocr-binarize`)
3. Segmentation (`ocrd-anybaseocr-block-segmentation`)

For most pages, the block segmentation finds only a few and very...
The [Wapiti](https://wapiti.limsi.fr/) CRF toolkit has a neat feature called *N-best Viterbi output* which returns the *n*-best label sequences for an input sequence. Is there a similar functionality in `crfsuite`? Thanks...
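To make the request concrete, here is a minimal sketch of what *n*-best Viterbi decoding computes for a linear-chain model. It is plain Python and does not use the API of Wapiti or `crfsuite`; the function name `n_best_viterbi` and the score dictionaries are assumptions made up for illustration. Scores are treated as additive (log-domain) values.

```python
import heapq
from typing import Dict, List, Tuple

Path = Tuple[str, ...]
Hypothesis = Tuple[float, Path]


def n_best_viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    n: int,
) -> List[Hypothesis]:
    """Return the n highest-scoring label sequences, best first.

    emissions[t][y] is the (hypothetical) score of label y at position t;
    transitions[(a, b)] is the score of moving from label a to label b.
    Missing transitions are assumed to score 0.0.
    """
    labels = list(emissions[0].keys())
    # For every label, keep up to n partial hypotheses ending in that label.
    beams: Dict[str, List[Hypothesis]] = {
        y: [(emissions[0][y], (y,))] for y in labels
    }
    for emit in emissions[1:]:
        new_beams: Dict[str, List[Hypothesis]] = {}
        for y in labels:
            # Extend every surviving hypothesis by label y.
            candidates = [
                (score + transitions.get((path[-1], y), 0.0) + emit[y], path + (y,))
                for hyps in beams.values()
                for score, path in hyps
            ]
            # Keeping the n best per end label is enough to stay exact.
            new_beams[y] = heapq.nlargest(n, candidates, key=lambda c: c[0])
        beams = new_beams
    final = [h for hyps in beams.values() for h in hyps]
    return heapq.nlargest(n, final, key=lambda c: c[0])
```

With `n=1` this reduces to ordinary Viterbi decoding; larger `n` only widens the per-label list of partial hypotheses.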
gender information from German Wiktionary. Not very smart but I do not know any Haskell. For my purposes, it works and may serve as a starting point for fixing https://github.com/LuminosoInsight/wikiparsec/issues/4
Each article title for nouns has information on the gender of the corresponding noun. It would be very helpful to have them extracted as well.
Many thanks for your wonderful tool! It would be a great addition to have the hyphenation patterns and the IPA representation in the set of extracted information.
Many thanks for your great efforts! I'd like to train a Tesseract model from your data via https://github.com/tesseract-ocr/tesstrain and contribute it to https://github.com/tesseract-ocr/tessdata_contrib. However, I am not sure whether this...
Currently, `dinglehopper` extracts text from PAGE XML files on the region level (https://github.com/qurator-spk/dinglehopper/blob/master/qurator/dinglehopper/ocr_files.py#L50). It would be wonderful if you could add a level-of-operation parameter to allow for extraction from line...
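As a rough illustration of the requested behaviour, the sketch below reads text from a PAGE XML string at either the region or the line level. It is not `dinglehopper`'s implementation; the function name `extract_text` and its `level` parameter are hypothetical, and it assumes the text sits in a direct `TextEquiv/Unicode` child of each `TextRegion` or `TextLine`.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical mapping from a level name to the PAGE element that carries it.
LEVEL_TAGS = {"region": "TextRegion", "line": "TextLine"}


def extract_text(page_xml: str, level: str = "region") -> List[str]:
    """Collect the Unicode text of all elements at the given level."""
    root = ET.fromstring(page_xml)
    # Reuse whatever namespace the root element declares (PAGE versions differ).
    ns = root.tag[1:root.tag.index("}")] if root.tag.startswith("{") else ""

    def q(name: str) -> str:
        return f"{{{ns}}}{name}" if ns else name

    texts = []
    for elem in root.iter(q(LEVEL_TAGS[level])):
        # Look only at the element's own TextEquiv, not at nested lines or words.
        unicode_el = elem.find(f"{q('TextEquiv')}/{q('Unicode')}")
        if unicode_el is not None and unicode_el.text:
            texts.append(unicode_el.text)
    return texts
```

Calling `extract_text(xml, level="line")` would then return one string per `TextLine`, while the default returns one string per `TextRegion`.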