Broken import for Icelandic language data

Open elisno opened this issue 4 years ago • 4 comments

How to reproduce the behaviour

It looks like importing language data for Icelandic is broken. E.g. to get stop words:

# This works
import spacy.lang.en.stop_words
spacy.lang.en.stop_words.STOP_WORDS

# Syntax error in import statement
import spacy.lang.is.stop_words
spacy.lang.is.stop_words.STOP_WORDS

Error:

>>> import spacy.lang.is.stop_words
  File "<stdin>", line 1
    import spacy.lang.is.stop_words
                      ^
SyntaxError: invalid syntax
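The likely cause is that `is` is a reserved Python keyword, so the parser rejects it as a component of a dotted module name before spaCy is even consulted. A minimal sketch for checking which language codes would hit the same problem, using only the standard library (the helper name `is_importable_name` is hypothetical, not part of spaCy):

```python
import keyword


def is_importable_name(name: str) -> bool:
    """Return True if `name` can appear in a plain `import a.b.name` statement.

    A path component must be a valid identifier and must not be a
    reserved keyword such as `is`, `as`, `or`, or `in`.
    """
    return name.isidentifier() and not keyword.iskeyword(name)


# Hypothetical sample of two-letter codes, chosen only for illustration.
for code in ["en", "is", "as", "or"]:
    status = "ok" if is_importable_name(code) else "needs importlib"
    print(f"{code}: {status}")
```

Any code for which this returns `False` cannot be imported with a literal `import` statement, regardless of whether the package exists on disk.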

I have not yet tested this on spaCy v3.0.

Your Environment

  • spaCy version: 2.3.5
  • Platform: Linux-5.8.0-53-generic-x86_64-with-glibc2.29
  • Python version: 3.8.5

elisno avatar May 27 '21 10:05 elisno

Could this be resolved by naming the language data directory with the three-letter language code instead?

spacy/lang/is -> spacy/lang/isl

elisno avatar May 27 '21 11:05 elisno

Thanks for the report! We'll have to find a workaround, indeed.

I'm a little surprised nobody's run into this before!

svlandeg avatar May 27 '21 21:05 svlandeg

Another workaround for this case is to use importlib:

import importlib
lang_is = importlib.import_module("spacy.lang.is")
lang_is.stop_words.STOP_WORDS

elisno avatar May 31 '21 11:05 elisno

In my project, I need to fetch the stop words for every language that spaCy provides, so I already import the modules through importlib with an f-string and never ran into this issue. Using a three-letter code only for Icelandic (which has a two-letter ISO 639-1 code) would be inconsistent; alternatively, spaCy could use three-letter codes for all languages.
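A rough sketch of that importlib-plus-f-string approach, for anyone who wants to do the same. The function names and the `package` parameter are my own, not spaCy API, and the code assumes each language package exposes a `stop_words` submodule with a `STOP_WORDS` attribute, as the snippets earlier in this thread suggest:

```python
import importlib


def load_stop_words(lang_code, package="spacy.lang"):
    """Import `<package>.<lang_code>.stop_words` by name and return STOP_WORDS.

    The module path is passed as a string, so codes that collide with
    Python keywords (such as "is") do not cause a SyntaxError.
    """
    module = importlib.import_module(f"{package}.{lang_code}.stop_words")
    return module.STOP_WORDS


def collect_stop_words(lang_codes, package="spacy.lang"):
    """Map each language code to its stop words.

    Codes whose stop_words module is missing are skipped; this also
    hides a missing top-level package, which is an assumption made
    here for brevity.
    """
    result = {}
    for code in lang_codes:
        try:
            result[code] = load_stop_words(code, package=package)
        except ModuleNotFoundError:
            continue
    return result
```

With spaCy installed, `load_stop_words("is")` should return the Icelandic stop words in the same way as `load_stop_words("en")` returns the English ones.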

BLKSerene avatar Jul 08 '21 10:07 BLKSerene