python-ucto issues

Accessing hyphenated tokens at the end of a paragraph

2

Hyphens are a source of further problems in certain types of documents: e.g. tokens at the end of a paragraph that end with a hyphen are not valid tokens,...

pirolen

question

Adding the tokenizer contents to a FoLiA doc

7

I wonder if there is a straightforward way to take the sentences and their tokens from the tokenizer and use them to build a new FoLiA doc. It is not clear to...

pirolen

question

Expose --textclass equivalent in Python API

Currently not accessible from Python.

proycon

enhancement

Question: possible to retrieve untokenized sentences?

1

This may sound silly, but would it be possible to add a method for retrieving sentences from the tokenizer without whitespace before punctuation marks (i.e. untokenized)? E.g. maybe providing...

pirolen

enhancement
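
To illustrate what "untokenized sentences" could mean here, the following is a minimal sketch that does not call python-ucto at all. It assumes each token is available as a plain tuple of text, a "no space follows" flag, and an end-of-sentence flag; that tuple layout and the function name `untokenized_sentences` are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical stand-in for tokenizer output:
# (token text, no space follows this token, token ends a sentence)
Token = Tuple[str, bool, bool]


def untokenized_sentences(tokens: Iterable[Token]) -> List[str]:
    """Rebuild sentences with their original spacing from flagged tokens."""
    sentences: List[str] = []
    parts: List[str] = []
    for text, nospace, end_of_sentence in tokens:
        parts.append(text)
        if end_of_sentence:
            # Close the current sentence; no trailing space is kept.
            sentences.append("".join(parts))
            parts = []
        elif not nospace:
            # The original text had whitespace after this token.
            parts.append(" ")
    if parts:
        # Leftover tokens that never reached an end-of-sentence marker.
        sentences.append("".join(parts).rstrip())
    return sentences


tokens = [
    ("Hello", True, False),
    (",", False, False),
    ("world", True, False),
    ("!", False, True),
    ("Bye", True, False),
    (".", False, True),
]
result = untokenized_sentences(tokens)
print(result)
```

With real tokenizer output, the same loop would apply once the equivalent flags are read from each token object; how those flags are exposed is not shown here.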

Question: Abbreviations list

29

What is the best way to supply a list of known abbreviations to python-ucto and ucto in LaMachine?

pirolen

question
