PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
### Description of the bug When using the page.get_pixmap() method, the program exits without any error message (on both Windows and Ubuntu), and the exception cannot be caught. ### How to reproduce the...
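A `try`/`except` cannot catch a crash in native code such as a segfault, because the whole interpreter dies. One common workaround, sketched here under the assumption that the report describes such a native crash, is to run the risky call in a child interpreter and inspect its exit code. The helper name `run_isolated` is hypothetical. It uses only the standard library: `subprocess.run` and the interpreter's `-X faulthandler` option, which prints a Python traceback when the child receives a fatal signal.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 60.0) -> int:
    """Run a Python snippet in a child interpreter and return its exit code.

    Hypothetical helper: a crash in native code (e.g. a segfault) kills only
    the child, so the parent sees a non-zero return code instead of exiting
    silently itself.
    """
    # "-X faulthandler" makes the child dump a traceback on fatal signals.
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        # Forward the child's stderr (including any faulthandler traceback).
        sys.stderr.write(result.stderr)
    return result.returncode
```

For example, `run_isolated("import pymupdf; pymupdf.open('file.pdf')[0].get_pixmap()")` (with `file.pdf` as a placeholder path) returns 0 on success. On POSIX systems, a crash by a signal shows up as a negative return code.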
**Problem** The type annotation of `Document.__getitem__` is wrong:
```python
def __getitem__(self, i: int =0):
    if isinstance(i, slice):
```
The type annotation of `i` requires it to be an `int`, but...
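A signature that accepts both integers and slices is usually annotated with `typing.overload`, so type checkers know that `obj[3]` returns one item and `obj[1:3]` returns a list. The sketch below uses a hypothetical `PageList` class with string "pages". It illustrates the annotation pattern only and is not PyMuPDF's actual fix.

```python
from typing import List, Union, overload


class PageList:
    """Hypothetical container showing an int-or-slice __getitem__ annotation."""

    def __init__(self, pages: List[str]) -> None:
        self._pages = pages

    @overload
    def __getitem__(self, i: int) -> str: ...

    @overload
    def __getitem__(self, i: slice) -> List[str]: ...

    def __getitem__(self, i: Union[int, slice]) -> Union[str, List[str]]:
        if isinstance(i, slice):
            # slice.indices() normalizes start/stop/step against the length.
            return [self._pages[j] for j in range(*i.indices(len(self._pages)))]
        # Accept negative indices the same way a list does.
        if i < 0:
            i += len(self._pages)
        if not 0 <= i < len(self._pages):
            raise IndexError("page index out of range")
        return self._pages[i]
```

The runtime body checks `isinstance(i, slice)` just like the original. Only the annotations change, so existing callers are unaffected.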
### Description of the bug In some cases PyMuPDF adds newline characters in the middle of words, which do not appear if you simply copy and paste the text from the...
### Description of the bug Document.select() does not work on some particular kinds of PDF files. I want to extract text from PDF files. If a PDF has more than 30 pages, then...
The issue arises because some PDFs return `/XYZ` coordinates in the format `/XYZ x y` instead of `/XYZ x y z`. This discrepancy causes the code to fail when attempting...
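A tolerant parser can absorb both forms by padding missing operands. The sketch below is hypothetical (`parse_xyz` is not a PyMuPDF function). It assumes the destination arrives as a whitespace-separated string such as `/XYZ 72 720`. In the PDF specification the operands are `left top zoom`, and a `null` operand means "keep the current value". The sketch's own assumption is to treat a *missing* operand the same way, as `None`.

```python
from typing import List, Optional, Tuple


def parse_xyz(dest: str) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Parse '/XYZ left top [zoom]', tolerating missing operands.

    Hypothetical helper: both missing operands and the PDF keyword 'null'
    become None, i.e. "leave this value unchanged".
    """
    tokens = dest.split()
    if not tokens or tokens[0] != "/XYZ":
        raise ValueError(f"not an /XYZ destination: {dest!r}")
    operands: List[Optional[float]] = []
    for tok in tokens[1:4]:
        operands.append(None if tok == "null" else float(tok))
    # Pad to exactly three values so callers can always unpack left, top, zoom.
    while len(operands) < 3:
        operands.append(None)
    left, top, zoom = operands
    return left, top, zoom
```

With this shape, the two-operand form `/XYZ x y` simply yields a `None` zoom instead of failing on an unpack or index error.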
### Description of the bug When I read from the PDF, I use the rectangle information to collect color data. Recently, however, I encountered an...
…ixmap. Pixmap.color_count(): don't raise an exception if JM_color_count() returns an empty dict. _read_samples(): return an empty list if the pixmap has no samples; this avoids a segfault in fz_samples_get(). Addresses #3848.
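The defensive pattern in this change (return an empty result instead of raising or touching an empty buffer) can be illustrated in pure Python. The sketch below is a hypothetical stand-in, not PyMuPDF's implementation. It assumes a raw, tightly packed sample buffer with `n` bytes per pixel and no row padding, and it counts each distinct pixel value.

```python
from collections import Counter
from typing import Dict


def count_colors(samples: bytes, n: int) -> Dict[bytes, int]:
    """Count distinct pixel values in a tightly packed sample buffer.

    Hypothetical helper: returns {} (rather than raising) when there are no
    samples, mirroring the behaviour described for Pixmap.color_count().
    """
    if n <= 0:
        raise ValueError("n (bytes per pixel) must be positive")
    if not samples:
        # Empty pixmap: nothing to count, and nothing to read.
        return {}
    if len(samples) % n:
        raise ValueError("buffer length is not a multiple of n")
    counts = Counter(samples[i:i + n] for i in range(0, len(samples), n))
    return dict(counts)
```

Checking for the empty case before slicing is what keeps callers from ever reading a buffer that does not exist. In the C-backed original, that read is the source of the segfault.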
### Description of the bug The module can only extract numeric or English content and does not support Chinese. ### How to reproduce the bug Code Sample ``` import pymupdf.pro...
### Description of the bug [font.valid_codepoints()](https://pymupdf.readthedocs.io/en/latest/font.html#Font.valid_codepoints) has stopped working correctly on the latest version. ### How to reproduce the bug #### code + sample pdf [font_valid_codepoints.zip](https://github.com/user-attachments/files/17328819/font_valid_codepoints.zip) #### latest version -...
_delXmlMetadata no longer exists and appears to have been replaced by del_xml_metadata.
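Code that must run against both older and newer releases can look the method up by name instead of hard-coding one spelling. The sketch below is a hypothetical compatibility shim (`delete_xml_metadata` is not a PyMuPDF function). It prefers `del_xml_metadata` and falls back to `_delXmlMetadata`, using only `getattr`.

```python
from typing import Any


def delete_xml_metadata(doc: Any) -> None:
    """Call whichever XML-metadata deletion method this document object has.

    Hypothetical shim: prefers the snake_case name and falls back to the
    older underscore-prefixed camelCase name.
    """
    method = getattr(doc, "del_xml_metadata", None)
    if method is None:
        # Older releases exposed the private camelCase name instead.
        method = getattr(doc, "_delXmlMetadata", None)
    if method is None:
        raise AttributeError(
            "document has neither del_xml_metadata nor _delXmlMetadata"
        )
    method()
```

Feature detection with `getattr` avoids parsing version strings, and it keeps working if the old alias is later removed entirely.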