cistern

Open-source tools for morphological tagging, segmentation and stemming.

Results 8 cistern issues

Make it clear that weird stuff happens if you have '.' or '-' in wordforms.

The Marmot documentation mentions feature-templates for training:

```
Comma separated list, activates individual templates. Default value: "form,rare,affix,context,sig,bigrams"
```

Is there any documentation for the meaning of these templates, what are...

```
$ java -Xmx5G -cp cistern/marmot/marmot-2019-02-21.jar marmot.morph.cmd.Trainer -very-verbose true -conllu-format true -train-file form-index=1,tag-index=2,morph-index=5,fi_tdt-ud-train.conllu -tag-morph false -model-file fi.marmot
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
	at marmot.util.LineIterator.next(LineIterator.java:77)
	at marmot.morph.io.SentenceReader$1.next(SentenceReader.java:56)
	at marmot.morph.io.SentenceReader$1.next(SentenceReader.java:32)
	at...
```

Hi, I'm using Marmot to train a morphological tagging model on the UD Hebrew treebank (in UTF-8 Hebrew, not transliterated). Training and tagging seem to work fine, but in the...

This patch allows for using the command line option `--conllu-format "true"` to output in CoNLL-U format[1], preserving comments, segments and empty nodes. It also allows for training data to be...
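For readers unfamiliar with the line kinds mentioned here, the following is a minimal Python sketch of how they can be told apart by the ID conventions of the CoNLL-U format. It is not Marmot's code: the function name `classify_conllu_line` is hypothetical, and treating range-ID lines as what the patch calls "segments" is an assumption.

```python
def classify_conllu_line(line: str) -> str:
    """Classify one line of a CoNLL-U file by its first column.

    Hypothetical helper for illustration only; not part of Marmot.
    """
    stripped = line.rstrip("\n")
    if not stripped.strip():
        return "blank"       # blank line: sentence separator
    if stripped.startswith("#"):
        return "comment"     # sentence-level comment or metadata
    token_id = stripped.split("\t", 1)[0]
    if "-" in token_id:
        return "multiword"   # range ID such as 1-2 (assumed to be a "segment")
    if "." in token_id:
        return "empty_node"  # decimal ID such as 8.1
    return "word"            # ordinary integer ID
```

A tool that wants to preserve these lines on output could buffer everything that is not classified as `word` and re-emit it in its original position.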

It would be really cool if Marmot had support for CoNLL-U input so that UD treebanks could be used directly as training data. Some things that it would need to...

Chipmunk currently dies when reading empty lines in input. This is of course caused by problems in other parts of the pipeline (I am currently experimenting with it as part of...
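Until the reader tolerates empty lines, one possible workaround is to filter them out before the input reaches Chipmunk. The sketch below assumes a plain-text, line-oriented input, which the issue does not confirm. `drop_blank_lines` is a hypothetical helper, not part of cistern.

```python
from typing import Iterable, Iterator


def drop_blank_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that contain non-whitespace characters.

    Hypothetical pre-processing step; assumes line-oriented text input.
    """
    for line in lines:
        if line.strip():
            yield line
```

Applied to an open file object, for example `drop_blank_lines(open(path, encoding="utf-8"))`, this yields the non-empty lines unchanged, ready to be written to a cleaned copy of the input.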