Feature Request: Custom Tesseract Command / Options For OCR

Open EmperorArthur opened this issue 2 years ago • 0 comments

Reason For Feature

I recently ran into an issue where OCR on a high-quality scan of my W2 was not working correctly.

In trying to debug the issue, I found that the only "knobs" I could tune without re-compiling the whole project were the language and "Fast" or "Best". This makes it extremely difficult to figure out what is going on. Additional options, even if only exposed through the config file, would allow me to try things without having to install a compiler and re-compile over and over again.

Main Feature Details

Allow overriding the following parameters/strings in https://github.com/cyanfish/naps2/blob/master/NAPS2.Sdk/Ocr/TesseractOcrEngine.cs

  • startInfo.FileName
  • startInfo.Arguments
  • startInfo.CreateNoWindow
  • languageDataPath

Also, add something like logger.LogDebug(...) for all of those items, so that debugging exactly what the system is doing becomes easier.
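
A rough sketch of how the overrides and the debug logging could fit together, written in Python for brevity rather than in NAPS2's C#. Everything in it is hypothetical: the `TesseractOverrides` and `LaunchSettings` types, the `resolve_settings` helper, and the default values are illustrative assumptions, not NAPS2's actual code or config format. The idea is only that each value falls back to the built-in default unless the config file supplies one, and that the final values are logged before Tesseract is launched.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("ocr")


@dataclass
class TesseractOverrides:
    """Hypothetical values read from a user config file.

    None means "not set, keep the built-in default".
    """
    file_name: Optional[str] = None
    arguments: Optional[str] = None
    create_no_window: Optional[bool] = None
    language_data_path: Optional[str] = None


@dataclass
class LaunchSettings:
    """The values that would end up on the process start info."""
    file_name: str
    arguments: str
    create_no_window: bool
    language_data_path: str


def _pick(override, default):
    # Use the override only when the user actually supplied one.
    return default if override is None else override


def resolve_settings(defaults: LaunchSettings,
                     overrides: TesseractOverrides) -> LaunchSettings:
    """Merge user overrides over the defaults and log the result."""
    settings = LaunchSettings(
        file_name=_pick(overrides.file_name, defaults.file_name),
        arguments=_pick(overrides.arguments, defaults.arguments),
        create_no_window=_pick(overrides.create_no_window,
                               defaults.create_no_window),
        language_data_path=_pick(overrides.language_data_path,
                                 defaults.language_data_path),
    )
    # Log every effective value so a debug log shows exactly what was run.
    logger.debug("tesseract file_name=%s", settings.file_name)
    logger.debug("tesseract arguments=%s", settings.arguments)
    logger.debug("tesseract create_no_window=%s", settings.create_no_window)
    logger.debug("tesseract language_data_path=%s",
                 settings.language_data_path)
    return settings


# Made-up defaults for illustration; the real ones live in the engine code.
defaults = LaunchSettings(
    file_name="tesseract",
    arguments="in.png out -l eng hocr",
    create_no_window=True,
    language_data_path="/usr/share/tessdata",
)

# Example: try a different page segmentation mode and show the console window.
overrides = TesseractOverrides(
    arguments="in.png out -l eng --psm 6 hocr",
    create_no_window=False,
)
effective = resolve_settings(defaults, overrides)
```

With something like this, trying a different Tesseract option would be a config edit plus a look at the debug log, instead of a rebuild.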

Additional Feature(s)

This could be a separate request, but I am including it here for now.

Add an option to either show the OCR text in an extremely visible manner over the scanned image, or to output a PDF with only visible OCR text and no image.

Currently I have to highlight part of the PDF and copy/paste it to see if the OCR software worked correctly or not!

EmperorArthur, Oct 18 '23 06:10