langdata
Source training data for Tesseract for lots of languages
https://code.google.com/p/tesseract-ocr/issues/detail?id=1392 What steps will reproduce the problem? 1. Unpack vie.traineddata downloaded from Tesseract repository 2. Run dawg2wordlist on vie.freq-dawg & vie.word-dawg to recover original lists 3. Examine the content What...
There's already a trained data file for the Latin dialect of the Kurdish language. The Sorani dialect is the second most used dialect of the language and it'd be amazing to...
Here, I'm going to raise some issues related to Tesseract's Hebrew support. Dear participants who have an interest in Arabic support, I suggest raising Arabic issues in a separate 'issue',...
When running unicharset_extractor on the Spanish langdata, it warns that capital "Ñ", capital "É" and "«" are absent from the training text (while their counterparts, "ñ", "é" and "»", are...
Updated around 60k words from the dictionary.
Added a few appropriate contexts after reviewing the comment given by Shreesrii.
This is the correct list of numbers found in any native variety of Odia texts/images.
Hello, I would like to help. I've already cloned the whole repository. How do I start?