guess-language
Exception for Unicode chars > 0xFFFF
What steps will reproduce the problem?
Passing text that contains Unicode characters outside the Basic Multilingual Plane (ord(c) > 0xffff) to guessLanguage raises an IndexError:
Traceback (most recent call last):
  File "describe-channels.py", line 20, in <module>
    lang = guess_language.guessLanguage(" ".join(row.get('text', [])))
  File "/usr/local/lib/python2.6/dist-packages/guess_language/guess_language.py", line 300, in guessLanguage
    return _identify(text, find_runs(text))
  File "/usr/local/lib/python2.6/dist-packages/guess_language/guess_language.py", line 352, in find_runs
    block = unicodeBlock(c)
  File "/usr/local/lib/python2.6/dist-packages/guess_language/blocks.py", line 64, in unicodeBlock
    return _names[ix]
IndexError: list index out of range
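The IndexError at `_names[ix]` is consistent with a block lookup whose table stops at 0xFFFF, so any higher code point yields an index one past the end of the list. The sketch below is not the code from blocks.py. It is a minimal illustration, assuming a bisect-style lookup over a hypothetical, heavily abbreviated table (`_BLOCKS`, `_ENDPOINTS`, `_NAMES` and `unicode_block` are made-up names), of how a bounds check avoids the crash. It assumes Python 3 or a wide Python 2 build, where `ord(c)` can exceed 0xFFFF.

```python
import bisect

# Hypothetical, abbreviated table: (last code point of the range, label).
# The real table in blocks.py is not reproduced here.
_BLOCKS = [
    (0x007F, "Basic Latin"),
    (0x00FF, "Latin-1 Supplement"),
    (0xFFFF, "Other BMP"),
]
_ENDPOINTS = [end for end, _ in _BLOCKS]
_NAMES = [name for _, name in _BLOCKS]


def unicode_block(c):
    """Return the label for a single character, or None if it is not covered."""
    # Index of the first range whose endpoint is >= the code point.
    ix = bisect.bisect_left(_ENDPOINTS, ord(c))
    # Code points above the last endpoint give ix == len(_NAMES);
    # without this check, _NAMES[ix] raises IndexError.
    if ix >= len(_NAMES):
        return None
    return _NAMES[ix]


print(unicode_block(u"a"))           # Basic Latin
print(unicode_block(u"\U0001F600"))  # None (code point 0x1F600 is above 0xFFFF)
```

A caller such as find_runs could then skip characters for which the lookup returns None instead of failing on the whole input.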
Original issue reported on code.google.com by [email protected] on 17 Sep 2012 at 8:02
Hi,
As mentioned on the main page, this package is no longer maintained. Please
report any issues against my fork instead:
https://bitbucket.org/spirit/guess_language
Although my version is a Python 3 port, I try to also support Python 2 if it's
not too hard.
That being said, I believe my version is not affected by this issue.
Original comment by [email protected] on 25 Sep 2012 at 9:32