
German OpenNLP chunker model

Open andreahorbach opened this issue 6 years ago • 6 comments

We trained a German chunker model for OpenNLP that we would like to contribute. The model was trained on the TIGER corpus, using its annotations with a number of systematic modifications so that they match TreeTagger chunks. In a 10-fold cross-validation experiment, as provided by the OpenNLP model trainer, we reach an average F-score of 96%. If such a model would be of interest for DKPro, could you guide us on how to proceed in providing it?
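For readers unfamiliar with the metric: an F-score for chunking is usually computed over whole chunks rather than single tokens, and the cross-validation figure is the mean over the folds. Below is a minimal Python sketch under that assumption. The function names and the NC/VC/PC tags in the example are illustrative only, and this is not the OpenNLP trainer's own code.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into a set of (start, end, type) chunk spans."""
    spans = set()
    start, kind = None, None
    for i, tag in enumerate(tags):
        # A chunk ends at "O", at a new "B-", or at an "I-" of a different type.
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != kind):
            if start is not None:
                spans.add((start, i, kind))
            start, kind = None, None
        # A chunk starts at "B-", or at an "I-" that has no open chunk to continue.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, kind = i, tag[2:]
    if start is not None:
        spans.add((start, len(tags), kind))
    return spans


def chunk_f_score(gold_tags, pred_tags):
    """Chunk-level F1: a predicted chunk counts only if span and type both match."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_f_score(folds):
    """Average the per-fold F-scores, as in k-fold cross-validation."""
    scores = [chunk_f_score(gold, pred) for gold, pred in folds]
    return sum(scores) / len(scores)


# Illustrative example: the last chunk is predicted one token too short.
gold = ["B-NC", "I-NC", "B-VC", "O", "B-PC", "I-PC"]
pred = ["B-NC", "I-NC", "B-VC", "O", "B-PC", "O"]
score = chunk_f_score(gold, pred)
```

In this example two of the three predicted chunks match the gold chunks exactly, so precision, recall, and F-score are all 2/3.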

andreahorbach avatar Jan 09 '20 15:01 andreahorbach

The way it normally works is that the person or group creating the model puts it up somewhere on the web (their website, GitHub, some language resource repository, etc.). Then we have a set of scripts in DKPro Core which download such models, package them up as a JAR, and then deploy them to our Maven repository.

So the first step would be that you put the model up somewhere. Please make sure that you are legally allowed to share the model.

Then, you or we could extend the OpenNLP model packaging script to include your model:

https://github.com/dkpro/dkpro-core/blob/master/dkpro-core-opennlp-asl/src/scripts/build.xml

Then we'd use the script to build the model JAR and to upload it to our Maven repo.

Finally, the DKPro Core OpenNLP Maven module pom.xml file would be extended to include the new model.
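To illustrate what "packaging a model as a JAR" amounts to: a JAR is a ZIP archive, conventionally with a `META-INF/MANIFEST.MF` entry. The Python sketch below shows only that idea. It is not the Ant build script linked above; the function name and the entry path inside the archive are hypothetical, and the real script also handles metadata and deployment.

```python
import tempfile
import zipfile
from pathlib import Path


def package_model_as_jar(model_path, jar_path, entry_name):
    """Write a single model file into a JAR (a ZIP archive with a manifest)."""
    manifest = "Manifest-Version: 1.0\r\n\r\n"
    with zipfile.ZipFile(jar_path, "w", compression=zipfile.ZIP_DEFLATED) as jar:
        # The manifest conventionally comes first in a JAR.
        jar.writestr("META-INF/MANIFEST.MF", manifest)
        # Store the model under the resource path it should be loaded from.
        jar.write(model_path, arcname=entry_name)
    return jar_path


# Demo with a placeholder file standing in for a downloaded model.
with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp) / "de-chunker-opennlp.bin"
    model.write_bytes(b"placeholder bytes, not a real model")
    jar_file = Path(tmp) / "model.jar"
    # Hypothetical entry path; the real layout is defined by the build script.
    package_model_as_jar(model, jar_file, "hypothetical/path/de-chunker-opennlp.bin")
    with zipfile.ZipFile(jar_file) as jar:
        names = jar.namelist()
        payload = jar.read("hypothetical/path/de-chunker-opennlp.bin")
```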

reckart avatar Jan 14 '20 08:01 reckart

Thank you for explaining the process.

We uploaded the model at https://github.com/ltl-ude/opennlpChunkerGerman/blob/master/de-chunker-opennlp.bin
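One practical note for anyone scripting the download: a GitHub `/blob/` URL returns an HTML page, while the file itself is served from `raw.githubusercontent.com`. Below is a small Python sketch of the rewrite. The function name is made up for illustration, and it assumes the branch name contains no slash.

```python
from urllib.parse import urlsplit


def github_blob_to_raw(url):
    """Turn a github.com 'blob' page URL into the matching raw-file URL."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<ref>/<path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError("not a GitHub blob URL: " + url)
    owner, repo, _, ref, *path = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, ref, *path])


raw_url = github_blob_to_raw(
    "https://github.com/ltl-ude/opennlpChunkerGerman/blob/master/de-chunker-opennlp.bin"
)
```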

It would be great if you could take care of including it in the build.xml, but if not, we can of course look into doing that ourselves.

mariebexte avatar Jan 17 '20 11:01 mariebexte

@aggarwalpiush would you like to try this?

There is some documentation on implementing/extending the model building scripts here.

The model script for OpenNLP models is here.

You couldn't deploy it to the UKP Maven server at the moment though. Either I'd need to do that or we need to give you proper permissions.

reckart avatar Jan 17 '20 15:01 reckart

@reckart Before extending the model script, I tried to run the existing OpenNLP ASL build.xml, but I got a build error at line 675.

I found that the IXA POS model is not available at the URL provided in the script. I think we also need to fix this issue.

aggarwalpiush avatar Jan 17 '20 17:01 aggarwalpiush

@aggarwalpiush good idea :)

reckart avatar Jan 18 '20 11:01 reckart

The build.xml is fixed and PR-1459 has been raised. If it looks good, kindly merge it and build the new model JARs on the UKP Maven server.

Once the JARs are available on the server, I'll add them to the OpenNLP pom.

aggarwalpiush avatar Mar 03 '20 00:03 aggarwalpiush