Language Request: Kurdish Sorani (Central Kurdish)

Open: makwanbarzan opened this issue 3 years ago • 1 comment

There's already a trained data file for the Latin-script dialect of Kurdish. Sorani is the second most used dialect of the language, and it'd be amazing to have a trained data file for it in Tesseract.

The script is Persian-like, apart from a few different letters such as ژ، گ، ڤ، چ، ۆ, so it shouldn't take much effort to develop.

Thank you and I'm looking forward to getting a response.

makwanbarzan, Apr 29 '22 22:04

All those characters are included in the script/Arabic model. Maybe that already works for Sorani text?

stweil, Apr 30 '22 06:04
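
For anyone who wants to try the suggestion above, here is a minimal sketch in Python. It assumes the `tesseract` command-line tool is installed and that `script/Arabic.traineddata` is present in the tessdata directory. The file names `sorani_sample.png` and `sorani_out` are hypothetical placeholders, and `describe` and `build_command` are illustrative helper names, not part of Tesseract. The first helper only lists the Unicode code points of the five letters mentioned in the request; it does not by itself prove that the model recognizes them well.

```python
import shutil
import subprocess
import unicodedata

# The five letters named in the request, written as escapes to avoid
# right-to-left rendering issues in editors: ژ گ ڤ چ ۆ
SORANI_LETTERS = "\u0698\u06AF\u06A4\u0686\u06C6"


def describe(letters=SORANI_LETTERS):
    """Return (code point, Unicode name) pairs for each letter."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in letters]


def build_command(image_path, output_base, lang="script/Arabic"):
    """Build the tesseract CLI call; -l selects the traineddata to use."""
    return ["tesseract", image_path, output_base, "-l", lang]


if __name__ == "__main__":
    for code_point, name in describe():
        print(code_point, name)

    # Hypothetical input image and output base name.
    cmd = build_command("sorani_sample.png", "sorani_out")
    if shutil.which("tesseract") is None:
        print("tesseract not found on PATH; would run:", " ".join(cmd))
    else:
        # Writes the recognized text to sorani_out.txt if the run succeeds.
        subprocess.run(cmd, check=True)
```

Comparing the resulting text file against a known Sorani sample would give a rough idea of whether the existing script model is good enough or a dedicated trained data file is still needed.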