Chocolatey package for Windows
I would appreciate it if someone could contribute a Chocolatey package for ocrmypdf for Windows.
I installed from Chocolatey in March following the steps on the readthedocs site and it went pretty well. I guess you're looking for someone to update the packages for the latest build? Unfortunately, I don't know how to do that.
I will ask: the build I got in March (9.6.1) for Windows only has the "eng" and "osd" languages, not the 100 languages the docs say are included by default. So I'm wondering how to get those other 98 languages installed! :)
I'm looking for someone to make choco install ocrmypdf do all of the steps, instead of the longer set of instructions on the site.
You can manually download more languages from https://github.com/tesseract-ocr/tessdata_best. Find the folder where eng.traineddata lives and copy the ones you want into it. There is also tessdata_fast if you want more speed at the cost of some accuracy.
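The copy step can be sketched in Python. This is a minimal sketch, not an official installer: the helper names find_tessdata_dir and copy_languages are hypothetical, and it assumes the .traineddata files have already been downloaded by hand into some local folder. It only relies on the tip above, that the right folder is the one containing eng.traineddata.

```python
import shutil
from pathlib import Path


def find_tessdata_dir(candidates):
    """Return the first candidate folder that contains eng.traineddata, else None."""
    for folder in candidates:
        folder = Path(folder)
        if (folder / "eng.traineddata").is_file():
            return folder
    return None


def copy_languages(download_dir, tessdata_dir, langs):
    """Copy <lang>.traineddata for each requested language.

    Returns the list of language codes that were actually copied;
    languages with no downloaded file are skipped.
    """
    copied = []
    for lang in langs:
        src = Path(download_dir) / f"{lang}.traineddata"
        if not src.is_file():
            continue  # this language was not downloaded, so skip it
        shutil.copy2(src, Path(tessdata_dir) / src.name)
        copied.append(lang)
    return copied
```

To use it, pass find_tessdata_dir a list of folders to check (the install location differs between machines, so any path you list is a guess until it is confirmed to contain eng.traineddata), then call copy_languages with your download folder and the codes you want, for example ["jpn", "deu"]. Writing into a folder under Program Files may require an elevated prompt.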
Ah, understood. I would appreciate someone doing that too!!!
(and maybe figure out how to package unpaper while they're at it :) )
Thanks for the tip about the raw language files. choco was not helping me!
FYI: gs throws an error if I try to use ocrmypdf with -l eng+jpn (after copying the files to the tessdata folder). Do you want any logs?
FYI: the choco instructions appear to have installed tesseract 5.0 (alpha). Apparently the traineddata files are different (my testing shows they work with ocrmypdf on my tesseract 4.0 install, but not on 5.0).
If/when additional work is done for choco and direct Windows installs, this should be standardized. I'm not sure whether language files for the 5.0 version exist anywhere yet.
Maybe they are in a branch?
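When debugging a mismatch like this, it helps to confirm which Tesseract major version is actually on the PATH. A minimal sketch, assuming the tesseract executable is on the PATH and that its --version banner starts with "tesseract" followed by a version number (optionally prefixed with "v"); the function names are hypothetical:

```python
import re
import subprocess


def parse_major_version(version_output):
    """Extract the major version from `tesseract --version` output, or None."""
    match = re.search(r"tesseract\s+v?(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def installed_tesseract_major():
    """Run `tesseract --version` and return the major version, or None if unparseable."""
    result = subprocess.run(
        ["tesseract", "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some builds print the banner to stderr, so look at both streams.
    return parse_major_version(result.stdout + result.stderr)
```

Running tesseract --list-langs afterwards shows which language files that same binary can see.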