The text "Hello China" is detected as 'it'
When I run print(langid.classify("Hello China")), the result is ('it', -37.309250354766846). @Paczesiowa @pquentin @martinth @jnothman @saffsd
This can happen with short texts. Try a longer one.
thanks
I tried a sentence containing five words and it is still detected wrongly, for example "hello China you are great":
('it', -31.29085063934326)
With six words, for example "hello China you are my sunshine", the result is right:
('en', -49.038776874542236)
Another example, "hello China hello China hello China ", is wrong:
('it', -27.979979038238525)
I would like to know how many words a sentence needs at minimum for reliable detection. @pquentin @martinth @jnothman
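One way to probe the "how many words" question empirically is to classify growing prefixes of a sentence and record the point from which the result stays correct. The sketch below is only an illustration: `stable_word_count` is a hypothetical helper, not part of langid, and it accepts any callable that returns a `(language, score)` tuple, such as `langid.classify` if langid is installed.

```python
from typing import Callable, Optional, Tuple

# Any function that maps text to (language_code, score), e.g. langid.classify.
Classifier = Callable[[str], Tuple[str, float]]


def stable_word_count(text: str, expected: str, classify: Classifier) -> Optional[int]:
    """Return the smallest n such that every prefix of `text` with at least
    n words is classified as `expected`, or None if the full text is wrong."""
    words = text.split()
    stable_from = None
    for n in range(1, len(words) + 1):
        lang, _score = classify(" ".join(words[:n]))
        if lang == expected:
            # Remember where the current run of correct results started.
            if stable_from is None:
                stable_from = n
        else:
            # A wrong result resets the run.
            stable_from = None
    return stable_from


# Usage (assumes langid is installed; not executed here):
#   import langid
#   stable_word_count("hello China you are my sunshine", "en", langid.classify)
```

The answer will differ from sentence to sentence, so a single threshold is unlikely to hold in general; running this over a sample of your own texts gives a rough feel for where detection settles.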
I am dealing with the same issue. In my case, supplying longer pieces of text is no problem, but I want to know how much extra text is needed and how much it improves reliability. Moreover, does it have to be real running text, or is a bunch of words from the language also fine? Lastly, I wonder what the returned negative number says about the reliability of the detection. I couldn't find any information about what this number actually means.
Many thanks in advance.
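On the meaning of the negative number: as far as I can tell from the langid README, the second value is an unnormalized log-probability, so it is negative, it tends to grow in magnitude with text length, and it is not comparable across inputs. The README also describes a `norm_probs=True` option on `LanguageIdentifier.from_modelstring` that returns a value between 0 and 1 instead. The sketch below shows the general idea of such a normalization (a softmax over per-language log-scores). It does not call langid, and apart from the rounded 'it' score, the numbers are made up for illustration.

```python
import math
from typing import Dict


def normalize_log_scores(log_scores: Dict[str, float]) -> Dict[str, float]:
    """Convert unnormalized log-scores into probabilities that sum to 1."""
    top = max(log_scores.values())
    # Subtract the maximum before exponentiating for numerical stability.
    weights = {lang: math.exp(score - top) for lang, score in log_scores.items()}
    total = sum(weights.values())
    return {lang: weight / total for lang, weight in weights.items()}


# Hypothetical per-language scores for a short text. Only the 'it' value
# resembles the thread; the 'en' and 'fr' values are invented.
scores = {"it": -37.3, "en": -37.9, "fr": -41.0}
probs = normalize_log_scores(scores)
best = max(probs, key=probs.get)
print(best, round(probs[best], 2))
```

With scores this close together, the winning language gets only a modest share of the probability, which is a more useful warning sign than the raw log-score alone.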
Try my fastlid: pip install fastlid
It is fast and accurate, though it depends on fasttext (on Windows systems without a C compiler, a prebuilt fasttext*.whl is available at https://www.lfd.uci.edu/~gohlke/pythonlibs/).
fastlid also tries to imitate two of langid's functionalities.
Having the same issue.
The text "Our fifth module explains some key calculus skills" is detected as 'no' even though it has 8 words.
In another example, the 4-word text "Discover some angle relationships" is detected as 'sw', but when I changed it to "Discover some angle relationships between them" (6 words), it is detected as 'en', as expected.
So what is the minimum number of words needed for reliable detection?
+1