The image generated by get_pixmap() is abnormal, but the text result is correct
Description of the bug
here is original pdf
1832786.pdf
image generated by get_pixmap()
what it looks like in WPS
I opened this file in WPS and it displays correctly, and the text extraction is also correct. However, the image generated by get_pixmap() looks wrong: the Chinese text appears garbled.
How to reproduce the bug
import fitz

document = fitz.open('path/to/original pdf')
page = document.load_page(0)
page.get_pixmap()
PyMuPDF version
1.24.5
Operating system
Windows
Python version
3.8
This is a bug in the base library, MuPDF. I have filed a bug report there, here is the link: https://bugs.ghostscript.com/show_bug.cgi?id=708019.
The upstream bug has been closed as resolved/invalid, so perhaps this issue can be closed as invalid after an explanation has been added?
The PDF contains no embedded font and carries no valid or reliable information from which a suitable visual appearance could be guessed. Other PDF viewers show the same thing: some guess more successfully than others.
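For anyone who wants to check this on their own file, here is a minimal sketch that lists the fonts a page references without embedding them. It assumes that page.get_fonts() returns tuples of the form (xref, ext, type, basefont, name, encoding) and that an ext value of "n/a" marks a font with no extractable font file, which usually means it is not embedded. The helper names non_embedded_fonts and report are made up for this example and are not part of PyMuPDF.

```python
def non_embedded_fonts(font_entries):
    """Return the base font names of entries that have no font file.

    Assumption: each entry is a tuple as returned by page.get_fonts(),
    i.e. (xref, ext, type, basefont, name, encoding), and ext == "n/a"
    means the font file is not extractable (typically: not embedded).
    """
    return [entry[3] for entry in font_entries if entry[1] == "n/a"]


def report(path, page_number=0):
    """Open a PDF and list the non-embedded fonts used on one page."""
    # PyMuPDF is imported here so the helper above has no dependency on it.
    import fitz

    with fitz.open(path) as document:
        page = document.load_page(page_number)
        return non_embedded_fonts(page.get_fonts())
```

Calling report('path/to/original pdf') returns a list of base font names. If a font that carries the Chinese text shows up in that list, the renderer has to substitute a fallback font, while text extraction can still succeed because it does not need the glyph shapes.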
In summary, we see no way to behave differently than we currently do.