MedKhem issues

Results: 12 issues of MedKhem

Sub/superscript are displayed as plain text characters in the TEI output

First thought: identify pieces of text as sub/superscript based on position, font, etc.

enhancement
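A minimal sketch of the position-and-font heuristic mentioned in this issue, not the project's implementation. The `Token` fields, the `classify_script` function, and both thresholds are hypothetical; the sketch assumes that the y coordinate grows upwards and that the baseline and font size of the surrounding line are already known.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical layout token; field names are assumptions for illustration."""
    text: str
    baseline_y: float  # baseline position, assumed to grow upwards on the page
    font_size: float


def classify_script(token, line_baseline_y, line_font_size,
                    size_ratio=0.8, shift_ratio=0.2):
    """Label a token as 'superscript', 'subscript' or 'normal'.

    A token counts as sub/superscript only if it is both noticeably smaller
    than the line's font and vertically shifted away from the line's baseline.
    Both thresholds are illustrative guesses, not tuned values.
    """
    is_smaller = token.font_size <= size_ratio * line_font_size
    shift = token.baseline_y - line_baseline_y
    threshold = shift_ratio * line_font_size
    if is_smaller and shift >= threshold:
        return "superscript"
    if is_smaller and shift <= -threshold:
        return "subscript"
    return "normal"
```

Requiring both cues keeps small-caps text (smaller but not shifted) and slightly misaligned regular text (shifted but not smaller) out of the sub/superscript classes.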

Well-formedness error in the final TEI output

An internal validation scheme should probably be added

invalid
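One way such an internal check could look, as a sketch only: parse the generated TEI with Python's standard `xml.etree.ElementTree` and report the position of the first well-formedness error. The function name `check_well_formed` is hypothetical, and this covers well-formedness only, not validation against a TEI schema.

```python
import xml.etree.ElementTree as ET


def check_well_formed(tei_xml):
    """Return (True, None) if the string parses as XML, else (False, message).

    This only detects well-formedness problems (unclosed tags, bad nesting,
    undeclared namespace prefixes); it does not validate against a schema.
    """
    try:
        ET.fromstring(tei_xml)
    except ET.ParseError as err:
        line, column = err.position  # location reported by the parser
        return False, f"line {line}, column {column}: {err}"
    return True, None
```

Running this on every generated document before it is returned would turn a silent malformed output into an explicit, locatable error.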

Enable the extraction of Lemmas and POS from full parsing

When the morphological information is processed, extracting the list of lemmas and POS tags should be possible
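A possible shape for this extraction, sketched under assumptions: the parsed output is taken to be TEI in the standard TEI namespace, with each lemma in `<form type="lemma"><orth>` and the part of speech in `<gramGrp><pos>`. That layout, the sample document, and the function name `extract_lemmas_and_pos` are illustrative and may not match the tool's actual output.

```python
import xml.etree.ElementTree as ET

TEI_URI = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI_URI}

# Hypothetical sample; the element layout is an assumption, not the tool's output.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<entry><form type="lemma"><orth>chat</orth></form><gramGrp><pos>n.</pos></gramGrp></entry>
<entry><form type="lemma"><orth>manger</orth></form></entry>
</body></text></TEI>"""


def _text_or_none(element):
    """Return the stripped text of an element, or None if absent or empty."""
    if element is None or not element.text:
        return None
    return element.text.strip() or None


def extract_lemmas_and_pos(tei_xml):
    """Return one (lemma, pos) pair per <entry>; missing values become None."""
    root = ET.fromstring(tei_xml)
    pairs = []
    for entry in root.iter(f"{{{TEI_URI}}}entry"):
        orth = entry.find("tei:form[@type='lemma']/tei:orth", TEI_NS)
        pos = entry.find(".//tei:gramGrp/tei:pos", TEI_NS)
        pairs.append((_text_or_none(orth), _text_or_none(pos)))
    return pairs
```

Calling `extract_lemmas_and_pos(SAMPLE)` yields one pair per entry, with `None` standing in for an entry that carries no part of speech.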

Enable the selection of models within the full dictionary parsing level

The existing "parse full dictionary" service doesn't allow the user to get the parsing results of specific models such as form or sense

enhancement

Enable the encoding of morpho and semantic encyclopaedic content

Extend the TEI model for lexical entries

More labels could be used to encode a lexical entry other than: \, \, \, \ and \.

Generation of fresh training data: line breaks are omitted during extraction for some PDF files

Enable 2 training data generation modes per model, with the necessary files for annotation

For each model, 2 commands should be available: one for raw text creation (to be annotated from scratch) and one with pre-annotated text (which is going to be refined in...

enhancement

Check the rendering of the final TEI output

After adding and testing new models, their output should follow the same logic as previous models (e.g. the case where an entry is split across 2 pages)

enhancement

Etymology extension

Implement components for parsing and segmenting etymological information in the etymology section detected in a lexical entry

enhancement