Add language feature to Div type
Some time ago I had a discussion with @reckart about a project I am working on, in which we need paragraph- or even sentence-level language annotations in our documents. We concluded that it might be a good idea to add language as a feature to the Div type (de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Div), which is the super-type of the aforementioned types.
Are there any other things that need to be done with respect to this requirement?
- [ ] add feature
- [ ] add feature in UML diagrams in type system documentation
As far as I remember, Div cannot store values such as a language. MetaDataStringField might be more suitable, as it allows you to store arbitrary key-value pairs, e.g. lang:en.
@carschno well, the idea is that we add the capability of storing language information to the div.
I consider the MetaDataStringField to be used for metadata affecting the whole document, not only sections of it.
One question is whether components should be aware of divs with a language, and how they should handle them. E.g. should a POS tagger use an English model on text in an "en" div and a German model on text in a "de" div? Or should we introduce some CAS multiplier that splits up a document into multiple CASes - one per div-with-language - and then run separate POS taggers...?
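
To make the second option a bit more concrete, here is a rough sketch of the grouping step such a splitter would need. It is plain Java with no UIMA dependency: `LangDiv` is a hypothetical stand-in for a Div that carries the proposed language feature, `splitByLanguage` is an invented helper name, and falling back to the document language for divs without a language is my own assumption, not something we have agreed on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DivLanguageRouting {

    /** Hypothetical stand-in for a Div annotation with a language feature (not the real type). */
    static final class LangDiv {
        final int begin;
        final int end;
        final String language; // may be null if the div carries no language

        LangDiv(int begin, int end, String language) {
            this.begin = begin;
            this.end = end;
            this.language = language;
        }
    }

    /**
     * Groups the text covered by each div under its language, in document order.
     * Divs without a language fall back to the document language (an assumption).
     */
    static Map<String, List<String>> splitByLanguage(
            String text, List<LangDiv> divs, String documentLanguage) {
        Map<String, List<String>> byLanguage = new LinkedHashMap<>();
        for (LangDiv div : divs) {
            String lang = div.language != null ? div.language : documentLanguage;
            byLanguage.computeIfAbsent(lang, k -> new ArrayList<>())
                      .add(text.substring(div.begin, div.end));
        }
        return byLanguage;
    }

    public static void main(String[] args) {
        String text = "This is English. Das ist Deutsch.";
        List<LangDiv> divs = Arrays.asList(
                new LangDiv(0, 16, "en"),
                new LangDiv(17, 33, "de"));
        // Each entry could then be handed to a tagger configured for that language.
        System.out.println(splitByLanguage(text, divs, "en"));
    }
}
```

In a real pipeline each group would presumably become its own CAS, and the resulting annotations would have to be mapped back to the original offsets, which this sketch does not attempt.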