pirolen

Results 9 issues of pirolen

After converting a document from docx to FoLiA using Piereling (@proycon: I did not find a command line option for such a conversion), the FoLiA document contains (hidden/small) space characters,...

I got errors on two files when submitting correction annotations, and those files no longer open: nginx reports a gateway timeout. I am attaching the docserver logs here...

bug

Hi, I would be happy to contribute data and insights that would help develop a tokenizer for Medieval/Premodern Slavic. Currently I am using tokconfig-rus on this data, and there'd be...

I wonder if this is the right way to load the confusables file:

```
m = build_variant_model(alphabet_file, weightsconfig=ws1)
m.read_confusablelist(confusables_file)
```

It would be brilliant to have an example about how...

question

Hi, I wonder if there is a way to have analiticcl generate variants that involve whitespace, i.e. in case of run-on errors, suggesting the split form. Suppose that 'holygrail'...

question
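To make the run-on case concrete, here is a minimal sketch in plain Python of what "suggesting the split form" could mean. It does not use analiticcl and says nothing about how analiticcl handles this; the function name `split_runon`, the `max_parts` parameter and the tiny lexicon are all hypothetical and only serve as illustration.

```python
def split_runon(word, lexicon, max_parts=3):
    """Return every way to split `word` into 2..max_parts lexicon entries."""
    results = []

    def extend(rest, parts):
        # Nothing left to consume: keep the split if it has at least two parts.
        if not rest:
            if len(parts) >= 2:
                results.append(" ".join(parts))
            return
        # Stop if we already used the maximum number of parts.
        if len(parts) == max_parts:
            return
        # Try every prefix of the remainder that is a known word.
        for end in range(1, len(rest) + 1):
            head = rest[:end]
            if head in lexicon:
                extend(rest[end:], parts + [head])

    extend(word, [])
    return results


# Hypothetical lexicon, for illustration only.
lexicon = {"holy", "grail"}
print(split_runon("holygrail", lexicon))
```

A real variant model would additionally have to allow for spelling errors inside each part; this sketch only covers exact matches against the lexicon.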

Hyphens are the source of some further problems in certain types of documents: e.g. tokens at the end of a paragraph that end with a hyphen are not valid tokens,...

question

I wonder if there is a straightforward way to take the sentences and their tokens from the tokenizer and use them to build a new FoLiA doc. It is not clear to...

question

This may sound silly, but would it be possible to create a method that would allow retrieving sentences from the tokenizer without whitespace between words and punctuation marks (i.e. untokenized)? E.g. maybe providing...

enhancement
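As an illustration of the requested behaviour, the following is a rough sketch in plain Python that joins a list of token strings back into text without a space before closing punctuation or after opening brackets. It is not part of ucto or python-ucto; the function name `detokenize` and the two punctuation sets are assumptions made for this example, and a real tokenizer may record the original spacing more reliably than these rules can guess it.

```python
# Hypothetical punctuation sets, chosen for illustration only.
NO_SPACE_BEFORE = set(".,;:!?)]}")
NO_SPACE_AFTER = set("([{")


def detokenize(tokens):
    """Join token strings, omitting spaces around punctuation where appropriate."""
    out = []
    for tok in tokens:
        # Insert a space unless this token attaches to the previous one.
        if out and tok not in NO_SPACE_BEFORE and out[-1] not in NO_SPACE_AFTER:
            out.append(" ")
        out.append(tok)
    return "".join(out)


print(detokenize(["Hello", ",", "world", "(", "again", ")", "!"]))
```

Rule-based joining like this is language dependent and cannot handle ambiguous quotation marks.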

What is the best way to supply a list of known abbreviations to python-ucto and ucto in LaMachine?

question