pirolen
After converting a document from docx to FoLiA using Piereling (@proycon: I did not find a command line option for such a conversion), the FoLiA document contains (hidden/small) space characters,...
I got errors on two files upon submitting correction annotations, and those files will no longer open: nginx reports a gateway timeout. I am attaching the docserver logs here...
Hi, I would be happy to contribute data and insights that would help develop a tokenizer for Medieval/Premodern Slavic. Currently I am using tokconfig-rus on this data, and there'd be...
I wonder if this is the right way to load the confusables file:
```
m = build_variant_model(alphabet_file, weightsconfig=ws1)
m.read_confusablelist(confusables_file)
```
It would be brilliant to have an example about how...
Hi, I wonder if there is a way to have analiticcl generate variants that involve whitespace, i.e. to suggest the split form in case of run-on errors. Suppose that 'holygrail'...
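To make the run-on case concrete, here is a minimal sketch in plain Python of what "suggesting the split form" could mean. It does not use analiticcl at all; the function name `split_runon` and the tiny lexicon are hypothetical and only illustrate the idea of proposing a split when both halves are known words.

```python
def split_runon(word, lexicon):
    """Return every two-way split of `word` whose halves are both in `lexicon`.

    Hypothetical helper for illustration; not part of analiticcl.
    """
    candidates = []
    # Try every internal split position, left to right.
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in lexicon and right in lexicon:
            candidates.append(f"{left} {right}")
    return candidates


# Toy lexicon (assumption): in practice this would be the same word list
# that is given to the variant model.
lexicon = {"holy", "grail"}
print(split_runon("holygrail", lexicon))
```

A real solution would also need to rank such splits against ordinary single-word variants, which this sketch does not attempt.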
Hyphens are a source of some more problems in certain types of documents: e.g. tokens at the end of a paragraph that end with a hyphen are not valid tokens,...
I wonder if there is a straightforward way to take the sentences and their tokens from the tokenizer and build a new FoLiA doc from them. It is not clear to...
This may sound silly, but would it be possible to create a method that would allow retrieving sentences from the tokenizer without whitespace between punctuation marks (i.e. untokenized)? E.g. maybe providing...
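As an illustration of the requested behaviour, the following is a minimal sketch in plain Python. It assumes each token comes with a flag saying that no whitespace followed it in the original input; the `(text, nospace)` pair representation and the `detokenize` name are assumptions made for this example, not the tokenizer's actual API.

```python
def detokenize(tokens):
    """Rebuild the original sentence from (text, nospace) pairs.

    nospace=True means the token was NOT followed by whitespace in the input.
    Hypothetical helper for illustration only.
    """
    parts = []
    for text, nospace in tokens:
        parts.append(text)
        if not nospace:
            parts.append(" ")
    # Drop the trailing space left by the last token, if any.
    return "".join(parts).rstrip()


# Hand-written example input (assumption), standing in for tokenizer output.
tokens = [("Hello", True), (",", False), ("world", True), ("!", False)]
print(detokenize(tokens))  # Hello, world!
```

If the tokenizer already exposes such a per-token flag, a loop like this over one sentence's tokens would return the untokenized sentence text.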
What is the best way to supply a list of known abbreviations to python-ucto and ucto in LaMachine?