naourass

Results: 18 comments by naourass

@bsekachev Thanks for letting me know that it's a dev decision. I may take some time and inspect the code architecture further to see if I can find any reliable...

Has anyone found a workaround to fix this?

I'm running into the same issue. All my target text is in joined form. Is it possible to isolate the letters when they're joined?

@pubpub-zz From my first analysis, I think that the concatenation flow should be changed to handle more cases. I'm also inspecting whether it would be possible to fix this using...

There's also a decoding issue for some characters. To focus on inspecting the concatenation order issue, I'm manually overriding them by adding a temporary `cmap_override` argument to `extract_text()`: ```...

@pubpub-zz I have an update regarding this issue. I'm not a BiDi expert (yet), but after further inspection, here's my humble conclusion so far: - How to handle bidi concatenation...

@pubpub-zz @MartinThoma After more experimentation, it looks like it's much simpler to just drop the RTL dir checks, process everything as LTR to provide the "logical" version of the text...

Is there any workaround for the "RECITATION" issue?

@amitdo I tried that and unfortunately it still doesn't work. I tried different crops / margins (both odd and even numbers), but it always fails to detect the...

You need to preprocess the image for the OCR to work properly, especially by binarizing (thresholding) it: https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#binarisation
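
To make the binarization step concrete, here's a minimal sketch of Otsu thresholding in plain Python. It assumes the image is already grayscale and given as rows of 8-bit values (0-255); the helper names `otsu_threshold` and `binarize` are mine, for illustration only. In practice you would more likely use an image library's built-in thresholding than this hand-rolled version.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold (0-255) for a flat iterable of 8-bit gray values."""
    # Build a 256-bin histogram of the gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of gray values at or below the candidate threshold
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # everything is background from here on
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor); Otsu maximizes it.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows) to pure black (0) / white (255)."""
    t = otsu_threshold(p for row in image for p in row)
    return [[255 if p > t else 0 for p in row] for row in image]


# Toy example: dark values around 10, bright values around 200.
image = [
    [10, 12, 200],
    [11, 198, 202],
]
print(binarize(image))
```

The threshold is chosen per image from its own histogram, so it adapts to overall brightness, but it is still a single global value. For scans with uneven lighting, an adaptive (local) threshold is usually the better fit.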