Ko van der Sloot

42 issues by Ko van der Sloot

In the code (cooccur.c and glove.c) there are some built-in tests for repeated entries in the vocabulary. Overly long entries are also detected. This, however, leads to very strange and questionable...

Some options, like '-f', 'textclass' and '-x id', have been obsolete for many years. I would like to remove them in the next release, making them produce a fatal error with information....

This might be handy to accommodate repetitive patterns, like ``['`’‘´]``.

enhancement

Given the attached file issue77.xml.txt, ucto creates invalid FoLiA: UIT.xml.text. The command was: ``ucto --passthru issue77.xml UIT.xml``

```
>foliavalidator UIT.xml
VALIDATION ERROR on full parse by library (stage 2/3),...
```

bug

It 'might' be a good idea to add a rule for Roman numerals. A rule along these lines: ``ROMAN-NUMBER=(^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$)`` If we add this, do we also want lower-cased variants? We...

enhancement
question
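The proposed expression can be tried quickly outside ucto with a small Python check. This is only a sketch. It uses Python's `re` module rather than the regex engine ucto itself uses, and `is_roman` and `ROMAN_NONEMPTY` are hypothetical names introduced here for illustration. The check shows that every group in the pattern is optional, so as written it also matches the empty string. A `(?=.)` lookahead is one way to rule that out. It also shows that a case-insensitive variant would accept ordinary lowercase words such as "mix".

```python
import re

# Pattern as proposed in the issue (without the ROMAN-NUMBER= rule name).
ROMAN = r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"

# Every group can match zero characters, so ROMAN also accepts "".
# Hypothetical variant: the lookahead (?=.) demands at least one character.
ROMAN_NONEMPTY = r"^(?=.)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"


def is_roman(token: str, ignore_case: bool = False) -> bool:
    """Hypothetical helper: does the whole token look like a Roman numeral?"""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(ROMAN_NONEMPTY, token, flags) is not None


if __name__ == "__main__":
    # Columns: token, case-sensitive result, case-insensitive result.
    for tok in ["XIV", "MCMXCIV", "IIII", "", "I", "xiv", "mix"]:
        print(repr(tok), is_roman(tok), is_roman(tok, ignore_case=True))
```

A lower-cased variant would therefore need extra context to avoid false positives. The uppercase rule has a similar problem with the single letter "I", which it accepts and which is a common English pronoun.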

At the moment we have a lot of tests for Dutch and some small tests for English, French, German and Spanish. We lack tests for Portuguese, Swedish, Frysk, Italian, Russian,...

At the moment the include mechanism is a bit messy. It is context-dependent and relies on a lot of implicit knowledge. It would be convenient to be able to include...

enhancement

Recently, an --add-tokens option was introduced in ucto to add extra 'TOKENS' to the configuration. We might consider extending this so that a user could add extra, non-default rules/items to the...

enhancement

The --detectlanguages option is confusing. On plain text it means: detect the language, tokenize according to that language, and assign it to the FoLiA output. On FoLiA input it means:...

enhancement

Is this FoLiA valid? Both folialint and foliavalidator reject it (on different grounds):

```xml
Dit is test. zin 2. Dit is test. zin 3. Dit is test. Dit is test....
```

bug
question