The image generated by get_pixmap() is abnormal, but the text result is correct

Open 1339503169 opened this issue 1 year ago • 1 comments

Description of the bug

Here is the original PDF: 1832786.pdf
Image generated by get_pixmap(): 1832786 pdf_0
What it looks like in WPS: image

I opened this file in WPS and it renders correctly, and text extraction is also correct. However, the image generated by get_pixmap() looks wrong: the Chinese text appears garbled.

How to reproduce the bug

import fitz
document = fitz.open('path/to/original pdf')
page = document.load_page(0)
page.get_pixmap()

PyMuPDF version

1.24.5

Operating system

Windows

Python version

3.8

1339503169 avatar Sep 09 '24 10:09 1339503169

This is a bug in the base library, MuPDF. I have filed a bug report there: https://bugs.ghostscript.com/show_bug.cgi?id=708019.

JorjMcKie avatar Sep 09 '24 11:09 JorjMcKie

The upstream bug has been closed as resolved/invalid, so perhaps this issue can be closed as invalid after an explanation has been added?

sebras avatar Jan 24 '25 15:01 sebras

The PDF contains no embedded font and has no valid or reliable information from which to guess a suitable visual appearance. The behavior of other PDF viewers reflects this as well: some guess more successfully than others.

In summary, we see no way to behave differently than we currently do.
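For readers who want to check a file of their own for the same situation, here is a minimal sketch. It assumes the tuple layout documented for `Document.get_page_fonts()`, namely `(xref, ext, type, basefont, name, encoding)`, and it assumes that an `ext` of `"n/a"` means no font file can be extracted. Note that `"n/a"` is also reported for the PDF Base 14 fonts and for Type 3 fonts, so treat the result as a hint rather than proof. The helper names `fonts_without_font_file` and `report_page_fonts` are hypothetical and not part of PyMuPDF.

```python
# Assumption: PyMuPDF reports this extension when no font file is extractable
# (not embedded, Base 14, Type 3, and similar cases).
NOT_EXTRACTABLE = "n/a"


def fonts_without_font_file(font_entries):
    """Return the basefont names of entries whose extension is 'n/a'.

    Each entry is expected to look like the tuples returned by
    Document.get_page_fonts(): (xref, ext, type, basefont, name, encoding).
    """
    return [entry[3] for entry in font_entries if entry[1] == NOT_EXTRACTABLE]


def report_page_fonts(pdf_path, page_number=0):
    """Hypothetical convenience wrapper: list such fonts for one page."""
    import fitz  # PyMuPDF; imported lazily so the helper above has no dependency

    with fitz.open(pdf_path) as doc:
        return fonts_without_font_file(doc.get_page_fonts(page_number))
```

If `report_page_fonts('path/to/original pdf')` lists the fonts used for the Chinese text, the renderer has to substitute a fallback font, which would be consistent with text extraction working while the rendered image looks wrong.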

JorjMcKie avatar Feb 25 '25 10:02 JorjMcKie