Oneiricer
Hi Jeroen, sure, here is the page/TIFF file that it failed on. I suspect that when you run the ocr_data function it will work - it seems to fail randomly. [ag2018_5591_21.zip](https://github.com/ropensci/tesseract/files/2743702/ag2018_5591_21.zip)
Hi Jeroen, sorry for taking so long to get back to you. I use ocr_data instead of ocr; maybe that produces a different result? This is what happens when...
Hi Jeroen, here's all my code and all the PDFs. Thankfully there are no privacy concerns from my company around sharing these PDFs - they are already publicly available. If...
Hi Jeroen, I tried to re-run the same script using the same PDF files on my beefier home PC and got the same issue. I hope you can reproduce the...
I don't think it is that simple. I have 30 PDF documents - each time I run it, the error comes up for a different PDF, on a different page....
After working on this a bit more, I've been a little more successful. I included a garbage collection call, gc(), in my loop and it ran significantly longer - from...
Thank you Jeroen. I just saw you're the maintainer/author of a few other packages. Thanks a lot for your work, I definitely appreciate it. Will the binaries be available on CRAN...
Hi Jeroen, I've updated pdftools but the problem still persists. Again, it runs for about an hour or two before throwing that error. I also note that the system processing...
Just wondering, is there a way to use ocr and filter out the words that have a confidence below 90?
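One possible approach, sketched here rather than taken from the thread: ocr_data returns one row per recognised word along with a confidence score, so the filtering can be done on that data frame after the OCR step. The example below uses a small hand-made data frame in place of real OCR output so that it runs without tesseract installed; the column names (word, confidence, bbox) are assumed to match what ocr_data returns, and keep_confident is a hypothetical helper name.

```r
# Stand-in for the result of tesseract::ocr_data(); the values are made up
# and the column layout (word, confidence, bbox) is an assumption.
words <- data.frame(
  word = c("Annual", "Rep0rt", "2018"),
  confidence = c(96.4, 71.2, 93.0),
  bbox = c("10,10,80,30", "90,10,150,30", "160,10,200,30"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only rows at or above the confidence threshold.
# Rows with a missing confidence are dropped as well.
keep_confident <- function(ocr_words, min_confidence = 90) {
  ok <- !is.na(ocr_words$confidence) & ocr_words$confidence >= min_confidence
  ocr_words[ok, , drop = FALSE]
}

confident <- keep_confident(words)

# Rebuild plain text from the surviving words.
text <- paste(confident$word, collapse = " ")
print(text)
```

With real data, the words data frame would be replaced by the output of ocr_data for a page image; the threshold of 90 is simply the value mentioned in the question.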
Hi @JoelSPendery, thanks for the excellent tip. I will look into using a batch file; I haven't had any experience doing this. It will be a good learning exercise. I'm aware...