Christian Wartena

18 comments by Christian Wartena

That is strange. The word is not in the training data (I guess), but that shouldn't make it remove the last letter. I will have a look and try to fix it...

_Buche_ is a problem, because it could indeed be the dative of _Buch_. _Zoom_ is an interesting one. The word _Zoom_ is not in the training data, but _Zoo_ is....

In the latest version the analysis of 'Zoom' is still wrong, but at least it gets the correct lemma. The problem with 'kannst' is solved. 'Buche' is still a problem....

Thanks for testing. I will think about a solution, but this doesn't seem to be easy. There are no rules in the program; everything is learned from the training...

Oh, that is an interesting one! Actually, I have no idea how to treat unknown loanwords, or how to recognize them in the first place. However, the algorithm should be...

Thanks! Always good to have some cases to work on ;-)

Thanks! This last one could be solved by annotating adjective-noun compounds appropriately in the training data.

Sorry, I replied by mail instead of using GitHub. Indeed, HanTa is trained mainly on the Tiger corpus and thus uses the Tiger annotation scheme: https://www.ims.uni-stuttgart.de/documents/ressourcen/korpora/tiger-corpus/annotation/tiger_scheme-morph.pdf (esp. pp. 26/27). A...

In the latest version I have added two methods:

- list_postags()
- list_mtags()

The first one gives a list of all POS tags, the second a list of all tags used...

Yes, I know. This is not easy to solve without writing 100 heuristic rules, since there are many different ways capitalization can be used. If you know...