Checkbox values not visible in flattened PDF
Describe the bug
convert_from_path() does not flatten tick boxes (and changes the appearance of radio buttons).
To Reproduce
from pdf2image import convert_from_path
images = convert_from_path("in.pdf")
im1 = images[0]
images.pop(0)
im1.save("out.pdf", "PDF", resolution=100.0, save_all=True, append_images=images)
Here is my test input pdf and the flattened output I generated:
Expected behavior
The flattened output should look the same as the input. Instead, the checkbox that was ticked in my input file is no longer ticked in the flattened output, and the round radio buttons have turned into boxes.
It could be because a particular font used by your PDF is not embedded in the PDF itself. The most likely candidate is ZaDb (Zapf Dingbats, https://fontsgeek.com/fonts/zapf-dingbats-regular), which is used to draw things like radio buttons and checkboxes.
If the font is not embedded, pdf2image most probably looks for it at some location on your system and fails to find it.
You can run this command and look at the warnings it prints:
pdftoppm -r 200 -jpeg in.pdf out
For me it gave something like this:
Syntax Error: Unknown font tag 'ZaDb'
Syntax Error: Unknown font tag 'ZaDb'
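If you want to run the same check from Python, here is a minimal sketch. It assumes pdftoppm (from poppler-utils) is on your PATH and that the warning text looks like the lines above. The names `find_missing_font_tags` and `check_pdf` are made up for this example and are not part of pdf2image.

```python
import re
import subprocess
import tempfile
from pathlib import Path

# Matches poppler warnings such as: Syntax Error: Unknown font tag 'ZaDb'
FONT_WARNING = re.compile(r"Unknown font tag '([^']+)'")


def find_missing_font_tags(stderr_text):
    """Return the sorted, de-duplicated font tags named in the warnings."""
    return sorted(set(FONT_WARNING.findall(stderr_text)))


def check_pdf(pdf_path, dpi=200):
    """Render the PDF with pdftoppm and report unknown font tags.

    Assumption: pdftoppm is installed and writes its warnings to stderr.
    """
    with tempfile.TemporaryDirectory() as tmp:
        result = subprocess.run(
            ["pdftoppm", "-r", str(dpi), "-jpeg", str(pdf_path),
             str(Path(tmp) / "out")],
            capture_output=True,
            text=True,
            check=False,  # we only care about the warnings here
        )
    return find_missing_font_tags(result.stderr)
```

Calling `check_pdf("in.pdf")` returns a list of font tags; an empty list means pdftoppm did not complain about any unknown font.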
So now you can either add this font to your system (as suggested here), or run the following command to embed the font in the PDF itself. This creates an intermediate PDF, which can then be converted to images using pdf2image:
gs -o intermediate.pdf -sDEVICE=pdfwrite -dEmbedAllFonts=true -dSubsetFonts=true -dNOPAUSE -dBATCH -f in.pdf
>>> from pdf2image import convert_from_path
>>> images = convert_from_path("intermediate.pdf")
>>> im1 = images[0]
>>> im1.save("out.pdf", "PDF", resolution=100.0, save_all=True, append_images=images[1:])
>>> exit()
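The two steps can also be combined into one script. This is only a sketch: it assumes gs and pdf2image are installed, it reuses the Ghostscript flags from the command above unchanged, and the helper names `build_gs_command` and `flatten_pdf` are invented for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def build_gs_command(src, dst):
    """Build the Ghostscript call that rewrites src with fonts embedded."""
    return [
        "gs", "-o", str(dst), "-sDEVICE=pdfwrite",
        "-dEmbedAllFonts=true", "-dSubsetFonts=true",
        "-dNOPAUSE", "-dBATCH", "-f", str(src),
    ]


def flatten_pdf(src, dst, resolution=100.0):
    """Embed fonts with gs, rasterize with pdf2image, save an image-only PDF."""
    # Imported here so build_gs_command works without pdf2image installed.
    from pdf2image import convert_from_path

    with tempfile.TemporaryDirectory() as tmp:
        intermediate = Path(tmp) / "intermediate.pdf"
        # check=True raises CalledProcessError if gs exits with an error.
        subprocess.run(build_gs_command(src, intermediate), check=True)
        images = convert_from_path(str(intermediate))
        if not images:
            raise ValueError(f"no pages rendered from {src}")
        # The first page is the base image; the rest are appended to it.
        first, rest = images[0], images[1:]
        first.save(dst, "PDF", resolution=resolution,
                   save_all=True, append_images=rest)
```

With this in place, `flatten_pdf("in.pdf", "out.pdf")` should produce the flattened file without leaving intermediate.pdf behind.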
Hope this helps.
Thank you so much for your answer. This helps a lot! 👍