theraysmith comments

Results: 38 comments of theraysmith

Would like to help for Burmese/Myanmar language training?

Please take a look at this reference: http://www.unicode.org/versions/Unicode9.0.0/ch16.pdf Table 16-3. The text says "Characters occur in the relative order shown in Table 16-3" which I do not believe to be...

Would like to help for Burmese/Myanmar language training?

Please see code at: https://github.com/tesseract-ocr/tesseract/blob/master/training/validate_myanmar.cpp

On Thu, Jul 13, 2017 at 10:21 PM, Shreeshrii wrote:
> Please see tesseract-ocr/tesseract#995 (comment)
>
> > For instance, there is a big table...

Add Filipino lang

My training text corpus does not distinguish between fil and tgl, although they appear as distinct codes in ISO 639-2/T. For some reason that I can't remember now, the language code...

Add Half-width Katakana for Japanese

ff00-ffef are in the forbidden_characters list for jpn. See langdata/jpn/forbidden_characters. This means they are not present in any of the Google-trained models. I don't remember how/who recommended that they should...
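To make the range concrete: half-width katakana sit inside the Unicode block U+FF00 to U+FFEF (Halfwidth and Fullwidth Forms), so forbidding ff00-ffef excludes them. The snippet below is only a minimal sketch of that check. The helper names `is_forbidden` and `forbidden_chars` are hypothetical, and the range is hard-coded from the comment rather than parsed from langdata/jpn/forbidden_characters, whose exact file format is not assumed here.

```python
# Range quoted in the comment above (assumption: inclusive on both ends).
FORBIDDEN_RANGE = (0xFF00, 0xFFEF)


def is_forbidden(ch: str) -> bool:
    """Return True if the single character ch lies in the forbidden range."""
    lo, hi = FORBIDDEN_RANGE
    return lo <= ord(ch) <= hi


def forbidden_chars(text: str) -> list:
    """Collect, in order, the characters of text that lie in the forbidden range."""
    return [ch for ch in text if is_forbidden(ch)]


# Half-width "katakana" (U+FF76 U+FF80 U+FF76 U+FF85), a space,
# then the ordinary full-width katakana (U+30AB U+30BF U+30AB U+30CA).
sample = "\uff76\uff80\uff76\uff85 \u30ab\u30bf\u30ab\u30ca"
print(forbidden_chars(sample))
```

Only the half-width characters are reported; the ordinary katakana in U+30A0 to U+30FF fall outside the range.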

MICR fonts

It could be used as fine tuning training and should work.

On Fri, Mar 31, 2017 at 11:34 PM, Shreeshrii wrote:
> https://github.com/tesseract-ocr/tesseract/wiki/AddOns lists
> Tesseract-MICR-OCR:
> for 3.0x
>...

Updated langdata

Point taken. It needs updating. I was going to push until I discovered a bug with the RTL word lists. Then I also need to integrate this issues list, that...

Updated langdata

Hmm. Sorry. I thought I had done this in September. The Google repo is up-to-date apart from the redundant files that need to be deleted. I'll work with Jeff to...

Updated langdata

On Wed, Mar 21, 2018 at 1:28 AM Shreeshrii wrote:
> @theraysmith
>
> 1.
>
> Since training depends on the fonts used, I suggest loading a file
>...

German Fraktur

I found a problem with the synthetic training pipeline. The fraktur fonts were only about 1% of the training data, even for the frk language. This will be fixed in...

German Fraktur

After a *lot* of work, and a very long delay, the new training is almost ready to go. Just waiting for rendering to finish... Fixes in this round: Utilizes a...