
[BUG] ghostscript fails due to small resolution value

Open: neurolabs opened this issue 2 years ago • 7 comments

Describe the bug

When ocrmypdf 14.2.0 is run on the example file, it calls Ghostscript with the resolution parameter set to -r1.209464x1.209464, which fails with "Unrecoverable error: rangecheck in setscreen". If I call Ghostscript manually with a higher resolution (e.g. 100x100), it succeeds.

To Reproduce

docker run --rm -i ocrmypdf:v14.2.0 -v1 -k - - <blank.pdf

Example file: blank.pdf

Expected behavior

ocrmypdf should not call Ghostscript with resolution parameters that make Ghostscript fail.

System

  • OS: Linux
  • OCRmyPDF Version: 14.2.0
  • using https://hub.docker.com/layers/jbarlow83/ocrmypdf/v14.2.0/images/sha256-7ce119676031c5efb057c150551536a9c63d0468d63f44a496f1a863970af81b?context=explore

neurolabs, May 14 '23 10:05

Excuse me for asking: do you have any idea yet whether this should be fixed in the codebase, or is it a won't-fix in your opinion?

Some more background: I discovered this issue while feeding a real-world PDF to https://github.com/paperless-ngx/paperless-ngx, and from my point of view, tackling it in OCRmyPDF makes the most sense.

neurolabs, Jul 01 '23 09:07

It can and should be fixed in ocrmypdf, but I'm short on time.

This is a superficially easy fix. It's not hard to force a lower limit on resolution.
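A minimal sketch of what "forcing a lower limit" could look like, written in Python since OCRmyPDF is a Python project. The names `clamp_dpi`, `gs_raster_args`, and `MIN_DPI`, and the 72 DPI floor, are hypothetical and do not come from OCRmyPDF's codebase; only the Ghostscript flags (`-dSAFER`, `-dBATCH`, `-dNOPAUSE`, `-sDEVICE`, `-r`, `-sOutputFile`) are standard Ghostscript options. The `-r` value is formatted with six decimals to match the `-r1.209464x1.209464` seen in the report.

```python
"""Hypothetical sketch: clamp a computed rasterization resolution before
building a Ghostscript command line. Not OCRmyPDF's actual code."""

MIN_DPI = 72.0  # assumed floor; choosing the right value is a judgment call


def clamp_dpi(xres: float, yres: float, floor: float = MIN_DPI) -> tuple[float, float]:
    """Raise each axis resolution to at least `floor` DPI; higher values pass through."""
    return max(xres, floor), max(yres, floor)


def gs_raster_args(pdf_path: str, out_pattern: str, xres: float, yres: float) -> list[str]:
    """Build a Ghostscript argv that rasterizes pdf_path at a clamped resolution."""
    x, y = clamp_dpi(xres, yres)
    return [
        "gs",
        "-dSAFER",
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=png16m",
        f"-r{x:f}x{y:f}",  # e.g. -r72.000000x72.000000
        f"-sOutputFile={out_pattern}",
        pdf_path,
    ]


if __name__ == "__main__":
    # The low value from the report gets raised to the assumed floor.
    print(gs_raster_args("blank.pdf", "page-%d.png", 1.209464, 1.209464))
```

If Ghostscript is installed, the list can be passed to `subprocess.run(args, check=True)`. Note that a clamp like this only hides the symptom; it does not answer whether the low resolution was computed correctly in the first place.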

It's more difficult to find out why the resolution comes out so low for that PDF: whether our calculation of resolution is wrong, whether the PDF is malformed, or whether there are cases where the resolution is legitimately low and keeping it low is the right decision.
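To illustrate how a very low value can arise legitimately, here is the general arithmetic for the effective resolution of an image placed on a PDF page: pixels along an axis divided by the drawn size in inches, where PDF user space defaults to 72 points per inch. This is a hypothetical illustration (the function `effective_dpi` is invented here) and is not OCRmyPDF's resolution-estimation code; it is also not a claim about what blank.pdf actually contains.

```python
"""Hypothetical illustration: effective resolution of an image drawn on a PDF
page. General points-to-inches arithmetic, not OCRmyPDF's implementation."""

POINTS_PER_INCH = 72.0  # default PDF user space unit is 1/72 inch


def effective_dpi(pixels: int, drawn_size_pt: float) -> float:
    """Pixels along one axis divided by the drawn size of that axis in inches."""
    if drawn_size_pt <= 0:
        raise ValueError("drawn size must be positive")
    return pixels / (drawn_size_pt / POINTS_PER_INCH)


if __name__ == "__main__":
    # A 2550 px wide scan drawn across an 8.5 in (612 pt) page: a normal value.
    print(effective_dpi(2550, 612))
    # A 10 px wide image stretched across the same page: roughly 1.18 DPI.
    print(effective_dpi(10, 612))
```

A tiny image stretched over a full page yields a value in the same range as the one in the report, so a low number is not by itself proof that the calculation is wrong; an implausibly large page size would produce a similar result for different reasons.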

You're welcome to take a stab at it.

jbarlow83, Jul 03 '23 06:07

Thanks for the clarification. If the moons align, I might poke at it, but I'm also short on time.

neurolabs, Jul 03 '23 16:07