Add support for DKPro type system and annotations

Open jcklie opened this issue 7 years ago • 14 comments

It would be nice to be able to initialize a CAS with a certain type system, e.g.:

from somewhere import DKProCoreTypeSystem
from cassis import Cas

cas = Cas(DKProCoreTypeSystem())

To verify that the DKPro type system is supported by cassis, the type system XML and an example CAS should be added to the tests.

jcklie avatar Oct 09 '18 17:10 jcklie

Here is a single type system file for a start which includes most DKPro Core types (not all, but all important ones - i.e. those from the API modules that are part of DKPro Core 1.10.0):

aggregated-ts.xml.zip

reckart avatar Oct 09 '18 23:10 reckart

Right now, it fails because uima.tcas.DocumentAnnotation is missing. Is it also a predefined type, like the other built-in UIMA types?
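
To make this kind of failure concrete, here is a minimal sketch (not cassis's actual loader) that scans a UIMA type system descriptor for supertypes that are neither declared in the file nor part of an assumed set of built-in names. The `BUILTIN_TYPES` set, the helper name `find_undeclared_supertypes`, and the `example.*` type names are illustrative assumptions; the real list of UIMA built-ins is longer.

```python
import xml.etree.ElementTree as ET

# Namespace used by UIMA type system descriptor files.
NS = {"u": "http://uima.apache.org/resourceSpecifier"}

# Assumption: a deliberately small subset of UIMA's built-in types.
BUILTIN_TYPES = {"uima.cas.TOP", "uima.cas.AnnotationBase", "uima.tcas.Annotation"}


def find_undeclared_supertypes(xml_text, builtin=BUILTIN_TYPES):
    """Return sorted supertype names that are neither declared nor built in."""
    root = ET.fromstring(xml_text)
    declared = set()
    referenced = set()
    for desc in root.findall("u:types/u:typeDescription", NS):
        name = desc.findtext("u:name", default="", namespaces=NS).strip()
        parent = desc.findtext("u:supertypeName", default="", namespaces=NS).strip()
        if name:
            declared.add(name)
        if parent:
            referenced.add(parent)
    return sorted(referenced - declared - set(builtin))


# Hypothetical descriptor: one type extends a supertype nobody has declared.
EXAMPLE = """<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <types>
    <typeDescription>
      <name>example.DocumentMetaData</name>
      <supertypeName>uima.tcas.DocumentAnnotation</supertypeName>
    </typeDescription>
    <typeDescription>
      <name>example.Token</name>
      <supertypeName>uima.tcas.Annotation</supertypeName>
    </typeDescription>
  </types>
</typeSystemDescription>
"""

print(find_undeclared_supertypes(EXAMPLE))
```

With the assumed built-in set, the sketch reports uima.tcas.DocumentAnnotation as unresolved. Adding that name to `builtin` (or declaring it in the file) makes the list empty.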

jcklie avatar Oct 10 '18 07:10 jcklie

@Rentier I have tried extracting most of the "internal" type system of UIMA. Note that this is only an indicative type system XML file to give you an easier overview of how UIMA initializes the type system of an empty CAS. The file itself cannot actually be loaded with Java UIMA; UIMA would complain during loading.

uima-internal-ts.xml.zip

reckart avatar Oct 10 '18 08:10 reckart

Note that the TS does not include any built-in types which do not have a parent type. That's specifically the uima.cas.TOP type.

reckart avatar Oct 10 '18 09:10 reckart

I found that when I serialize a UIMA CAS, it always creates a uima.tcas.DocumentAnnotation in both the CAS and the type system. Therefore, I do not add it to the predefined types.

jcklie avatar Oct 15 '18 15:10 jcklie

@Rentier what about CASes natively created in Cassis and then serialized?

reckart avatar Oct 15 '18 16:10 reckart

I do not serialize predefined types and features.
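
One way to picture this is the following minimal sketch. It assumes a type system is represented as a plain mapping from type name to supertype name, and it writes out only the user-defined entries. The `PREDEFINED` set, the dict representation, and the function name `serialize_user_types` are hypothetical; this is not cassis's implementation.

```python
import xml.etree.ElementTree as ET

UIMA_NS = "http://uima.apache.org/resourceSpecifier"

# Assumption: illustrative subset of the types a CAS defines implicitly.
# uima.tcas.DocumentAnnotation is left out on purpose, so it would be written.
PREDEFINED = {"uima.cas.TOP", "uima.cas.AnnotationBase", "uima.tcas.Annotation"}


def serialize_user_types(types, predefined=PREDEFINED):
    """Serialize {type name: supertype name}, skipping predefined types."""
    root = ET.Element("typeSystemDescription", {"xmlns": UIMA_NS})
    types_el = ET.SubElement(root, "types")
    for name in sorted(types):
        if name in predefined:
            continue  # built-in types are implied, so they are not written out
        desc = ET.SubElement(types_el, "typeDescription")
        ET.SubElement(desc, "name").text = name
        ET.SubElement(desc, "supertypeName").text = types[name]
    return ET.tostring(root, encoding="unicode")


# Hypothetical type system: two built-ins plus one user-defined type.
example_types = {
    "uima.cas.TOP": None,
    "uima.tcas.Annotation": "uima.cas.AnnotationBase",
    "example.Token": "uima.tcas.Annotation",
}
print(serialize_user_types(example_types))
```

In this sketch, only example.Token gets a typeDescription element; the built-in types appear solely as a referenced supertype name.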

jcklie avatar Oct 15 '18 17:10 jcklie

I added the type system support. I am waiting for @reckart to give me a CAS XMI that contains one of each DKPro annotation (or as many different types as possible).

jcklie avatar Oct 16 '18 17:10 jcklie

Is there an example anywhere of how to use the DKPro type system?

zesch avatar Nov 20 '19 19:11 zesch

It is in Pull Request #86 and not merged yet.

jcklie avatar Nov 20 '19 20:11 jcklie

I just played around with this module (v0.2.7) and tried to use load_dkpro_core_typesystem(), but it seems that "dkpro-core-types.xml" is absent from the PyPI package (see https://pypi.org/project/dkpro-cassis/#files), causing the function to fail. Adding the XML file from the repo fixed it, but you might want to look into this so that the released version works.

Thanks for this neat module!
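
For anyone who wants to check their own installation, here is a minimal sketch that tests whether a data file is shipped inside an installed package, using the standard library's `importlib.resources` (Python 3.9+). The helper name `package_has_resource` is made up for illustration, and the import name "cassis" and the file's location at the top level of the package are assumptions; the file may live in a subpackage instead.

```python
from importlib import resources


def package_has_resource(package, filename):
    """Return True if `filename` is shipped inside the installed `package`."""
    try:
        return resources.files(package).joinpath(filename).is_file()
    except ModuleNotFoundError:
        # The package itself is not installed in this environment.
        return False


# Assumption: import name "cassis", file at the package's top level.
print(package_has_resource("cassis", "dkpro-core-types.xml"))

# Sanity check against a standard library package.
print(package_has_resource("json", "__init__.py"))
```

If the first check prints False even though the package is installed, the file is either missing from the release or stored under a different subpackage than assumed here.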

ramonziai avatar Jan 27 '20 18:01 ramonziai

Also running into the missing "dkpro-core-types.xml" problem.

triclops200 avatar Apr 16 '20 15:04 triclops200

@triclops200 Which cassis version are you using? I thought that I fixed it.

jcklie avatar Apr 16 '20 17:04 jcklie

You're right, was behind a version or two in my pyenv, apologies.

triclops200 avatar Apr 16 '20 19:04 triclops200