Alpha support for German

Open cainesap opened this issue 1 year ago • 3 comments

Hi Chris!

  • Add a "lang" (language) argument to argparse, with "en" as the default.
  • This enables multilingual use of ERRANT.
  • Prompted by the "MultiGEC" shared task for NLP4CALL 2025; I will add more language options if you approve the idea.
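
For illustration, the argparse change described above might look roughly like the following. This is a minimal sketch, not the actual patch: the `-lang` flag spelling, the `build_parser` helper, and the help text are assumptions.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the "lang" argument with default "en"
    # comes from the description above, everything else is illustrative.
    parser = argparse.ArgumentParser(
        description="Sketch of a command-line interface with a language option."
    )
    parser.add_argument(
        "-lang",
        default="en",
        help="Language code to pass on to the annotator (default: en).",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("Selected language:", args.lang)
```

Calling the script with no arguments keeps the current English behaviour, while `-lang de` would request German.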

best, Andrew

cainesap, Sep 27 '24 09:09

Hey Andrew!

The above won't actually work because errant.load officially accepts only en as a valid language (see the __init__.py file). Additionally, the lang parameter controls where the merger and classifier are loaded from, so you'd need an errant.de directory that contains something similar to the errant.en directory.
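
To make the two constraints concrete, here is a simplified, standalone model of that kind of language dispatch. It is not errant's actual code: `SUPPORTED_LANGS` and `module_paths` are hypothetical names, and the dotted paths simply mirror the errant.en / errant.de layout mentioned above.

```python
# Hypothetical stand-in for the set of officially supported languages.
SUPPORTED_LANGS = {"en"}


def module_paths(lang, supported=SUPPORTED_LANGS):
    """Return the dotted module paths a language code would resolve to."""
    # Constraint 1: the language code has to be on the supported list.
    if lang not in supported:
        raise ValueError(f"{lang} is an unsupported or unknown language")
    # Constraint 2: the code also picks the package that holds the merger
    # and classifier, so that package has to exist for the new language.
    return {
        "merger": f"errant.{lang}.merger",
        "classifier": f"errant.{lang}.classifier",
    }
```

With the defaults, `module_paths("de")` raises an error. Passing `supported={"en", "de"}` gets past the check, but the resulting errant.de.merger path is only useful if such a package is actually provided.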

You can fudge it and use the English merger/classifier with German spaCy, and most of errant should still work from coarse POS tags, lemmas, etc., but other aspects, such as the rules for spelling errors and fine-grained verb errors (tense/SVA/form), are unlikely to work well.
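
A rough, untested sketch of that workaround follows. It assumes that errant and the spaCy German model de_core_news_sm are installed, and that errant.load takes a preloaded spaCy pipeline as its optional second argument; the helper name `annotate_with_german_spacy` is hypothetical.

```python
def annotate_with_german_spacy(orig_text, cor_text, model="de_core_news_sm"):
    """Annotate a German sentence pair with the English merger/classifier."""
    # Imports are kept inside the function so the sketch can be read and
    # loaded without errant or spaCy installed (assumption: both are
    # available when the function is actually called).
    import errant
    import spacy

    # German pipeline; the model name is an assumption, any German model may do.
    nlp = spacy.load(model)
    # Keep lang="en" so the English merger and classifier are loaded,
    # but hand over the German pipeline for tokenisation and tagging.
    annotator = errant.load("en", nlp)
    orig = annotator.parse(orig_text)
    cor = annotator.parse(cor_text)
    # Each edit carries the original string, the correction and an error type.
    return [(e.o_str, e.c_str, e.type) for e in annotator.annotate(orig, cor)]
```

Error types that rely on English-specific rules (spelling, verb tense/SVA/form) should be treated with suspicion in the output.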

chrisjbryant, Sep 28 '24 10:09

I just updated the __init__.py file so it should work as you intended now!

I don't want to make it an official addition to errant, however, because the German support is untested. It should certainly suffice for MultiGEC as a custom extension, though!

Let me know how well it works for German!

chrisjbryant, Sep 28 '24 10:09

awesome, thanks Chris, will do!

cainesap, Sep 30 '24 09:09