docspell: Please consider adding support for Mandarin

I also have a lot of documents written in Mandarin. Could you add support for it too?

Apr 07 '23 05:04 iszhi

I'm not against it at all, but it is not really doable for me, since I have zero knowledge of Mandarin. The NLP processors don't support it afaik, but tesseract (the tool doing the OCR) supports both traditional and simplified Chinese. I don't know if that would help?

For date recognition I would need a PR or at the very least all the info from here

Apr 07 '23 08:04 eikek
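
For reference, Chinese dates are usually written as year 年, month 月, day 日 (for example 2023年4月7日), and less formal text often uses 号 instead of 日. Below is a minimal Python sketch of recognizing that pattern. It is an illustration only, not how docspell's date recognition works. The function name `find_chinese_dates` is hypothetical, and the sketch handles only Arabic digits, not dates written with Chinese numerals.

```python
import re
from datetime import date
from typing import List

# Hypothetical sketch: matches year 年 month 月 day 日/号, e.g. 2023年4月7日.
# Dates written with Chinese numerals (二〇二三年四月七日) are not handled.
CN_DATE = re.compile(r"(?<!\d)(\d{4})\s*年\s*(\d{1,2})\s*月\s*(\d{1,2})\s*[日号]")


def find_chinese_dates(text: str) -> List[date]:
    """Return all valid dates found in `text`, in order of appearance."""
    found = []
    for m in CN_DATE.finditer(text):
        year, month, day = (int(g) for g in m.groups())
        try:
            found.append(date(year, month, day))
        except ValueError:
            # Skip impossible dates such as 2023年2月30日.
            continue
    return found
```

For example, `find_chinese_dates("发票日期：2023年4月7日，到期日：2023年5月7号")` yields the two dates 2023-04-07 and 2023-05-07.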

@eikek Since the NLP processors don't support Mandarin, could you add it via tesseract? (PS: I'm not very familiar with either NLP or tesseract.)

Apr 07 '23 10:04 iszhi

I think tesseract supports both simplified and traditional Chinese. Which one is better? It would be possible to add it to the Docker image and add a language option to the UI.

Apr 10 '23 18:04 eikek
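
A minimal sketch of how such a UI language option could map to tesseract's language codes. `chi_sim` and `chi_tra` are tesseract's traineddata names for simplified and traditional Chinese, and `-l` with `+`-joined codes is tesseract's syntax for combining languages. The `OcrLanguage` enum and `tesseract_args` helper are hypothetical and not part of docspell.

```python
from enum import Enum
from typing import List


class OcrLanguage(Enum):
    """Hypothetical UI language options mapped to tesseract traineddata codes."""
    ENGLISH = "eng"
    CHINESE_SIMPLIFIED = "chi_sim"
    CHINESE_TRADITIONAL = "chi_tra"


def tesseract_args(image: str, out_base: str, langs: List[OcrLanguage]) -> List[str]:
    """Build a tesseract command line; several languages are joined with '+'."""
    if not langs:
        raise ValueError("at least one OCR language is required")
    lang_spec = "+".join(lang.value for lang in langs)
    return ["tesseract", image, out_base, "-l", lang_spec]
```

For example, `subprocess.run(tesseract_args("scan.png", "scan", [OcrLanguage.CHINESE_SIMPLIFIED]), check=True)` would write the recognized text to `scan.txt`, assuming tesseract and the `chi_sim` data are installed.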

Simplified Chinese is used in mainland China, while traditional Chinese is used in Taiwan and Hong Kong, so simplified Chinese has the larger user base. But if possible, I recommend installing both.

Apr 10 '23 18:04 iszhi
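
If both are installed in the Docker image, one way to verify it is to check that `tesseract --list-langs` reports `chi_sim` and `chi_tra`. The sketch below assumes the output is a header line (whose exact text varies between tesseract versions) followed by one language code per line. The helper names are hypothetical.

```python
import subprocess
from typing import Iterable, Set


def parse_list_langs(output: str) -> Set[str]:
    """Extract language codes from `tesseract --list-langs` output.

    The header line contains spaces while codes such as chi_sim do not,
    so only lines consisting of a single token are kept.
    """
    return {line.strip() for line in output.splitlines() if len(line.split()) == 1}


def missing_languages(required: Iterable[str], output: str) -> Set[str]:
    """Return the required codes that tesseract does not report as installed."""
    return set(required) - parse_list_langs(output)


def list_langs_output() -> str:
    """Run tesseract, merging stderr into stdout in case a version writes the list there."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=True,
    )
    return result.stdout
```

`missing_languages(["chi_sim", "chi_tra"], list_langs_output())` should return an empty set when both Chinese variants are available.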

Stanford CoreNLP supports (mainland) Chinese.

Stanford CoreNLP: "An integrated suite of natural language processing tools for English, Spanish, and (mainland) Chinese in Java, including tokenization, part-of-speech tagging, named entity recognition, parsing, and coreference."

Mar 12 '24 14:03 kxu1988