PyMuPDF
PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
**Is your feature request related to a problem? Please describe.** When using **find_tables**, watermark text may be included in table cells. **Describe the solution you'd like** 1. For greater flexibility in...
### Description of the bug I noticed that memory usage in my application keeps increasing and traced this to the `insert_htmlbox` call; if I remove it, everything is fine. ###...
### Description of the bug I have a function that extracts the clustered drawings from a PDF. This function takes much longer after upgrading to 1.26.0 (and 1.26.3)...
linkDest in the format '#page=1&view=Fit' fails for the subject code
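A destination such as '#page=1&view=Fit' is a fragment of `key=value` pairs joined by `&`. As a minimal sketch, assuming that format, the standard library can split it into its parts; `parse_pdf_open_params` is a hypothetical helper for illustration, not a PyMuPDF function, and it does not show how PyMuPDF itself handles such links.

```python
from __future__ import annotations

from urllib.parse import parse_qs


def parse_pdf_open_params(link_dest: str) -> dict[str, str]:
    """Split a '#key=value&key=value' fragment into a dict (hypothetical helper)."""
    fragment = link_dest.lstrip("#")
    # keep_blank_values=True so an entry like 'view=' is not silently dropped
    parsed = parse_qs(fragment, keep_blank_values=True)
    # parse_qs returns a list per key; keep the last value for each key
    return {key: values[-1] for key, values in parsed.items()}


params = parse_pdf_open_params("#page=1&view=Fit")
# Assumption: page numbers in this fragment style are 1-based
page_number = int(params["page"])
```

Commas are not separators for `parse_qs`, so a view with arguments such as `FitH,100` stays a single value.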
This commit attempts to clean out old cruft, to make room for new cruft. This is clearly not the most pressing issue in the world; however, I think it's far...
### Description of the bug I understand that it's expected for `clean_contents()` to no longer generate line breaks. However, `scrub` calls `clean_contents` and then passes `cont.splitlines()` (which is a...
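If code walks a content stream line by line, a stream emitted without line breaks gives it only one "line" that holds every operator. The sketch below illustrates that and shows one hypothetical way to recover per-operator units by tokenizing instead of splitting on newlines. `split_operations` is illustrative only and is not how PyMuPDF's `scrub` works. It assumes a simplified stream with no dictionaries, inline images, or nested parentheses inside strings.

```python
from __future__ import annotations

import re

# Simplified token pattern (assumption): literal strings without nested parens,
# hex strings, names, array brackets, and bare words (numbers or operators)
TOKEN = re.compile(
    rb"\((?:\\.|[^\\()])*\)|<[0-9A-Fa-f\s]*>|/[^\s/\[\]()<>]+|\[|\]|[^\s/\[\]()<>]+"
)
NUMBER = re.compile(rb"[+-]?(?:\d+\.?\d*|\.\d+)")


def split_operations(stream: bytes) -> list[bytes]:
    """Group tokens so each item ends with one operator (hypothetical helper)."""
    ops: list[bytes] = []
    pending: list[bytes] = []
    depth = 0  # nesting level of [ ... ] arrays, e.g. TJ operands
    for tok in TOKEN.findall(stream):
        pending.append(tok)
        if tok == b"[":
            depth += 1
        elif tok == b"]":
            depth -= 1
        elif depth == 0 and not tok.startswith((b"(", b"<", b"/")) and not NUMBER.fullmatch(tok):
            # A bare word outside an array that is not a number is treated as an operator
            ops.append(b" ".join(pending))
            pending = []
    return ops


# A content stream with every operator on one line, as a cleaner might emit it
single_line = b"q 1 0 0 1 72 720 cm BT /F1 12 Tf (Hi there) Tj ET Q"
lines = single_line.splitlines()  # only one element: line-based logic sees one "line"
operations = split_operations(single_line)
```

Tokenizing respects string boundaries, so `(Hi there)` stays intact even though it contains a space. A production tokenizer would also need to handle dictionaries, inline images, and nested strings.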
When using the setup.py buildtest command, the git repository accumulates many untracked files, especially the whole mupdf source code as well as many test artifacts. To...
Hello, all this does is add a `__slots__` entry to a few classes. This small change has an outsized impact, dramatically reducing the size of instances, and leads the way...
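To illustrate the general `__slots__` effect, here is a minimal sketch using two hypothetical classes, `PlainPoint` and `SlottedPoint`, not PyMuPDF's own classes. A slotted class stores its declared attributes in fixed slots instead of a per-instance `__dict__`. Exact byte counts vary across CPython versions, so the sketch compares sizes instead of asserting specific numbers.

```python
import sys


class PlainPoint:
    """Regular class: each instance carries a per-instance __dict__ (hypothetical)."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


class SlottedPoint:
    """Same fields, but __slots__ reserves fixed storage and drops __dict__ (hypothetical)."""

    __slots__ = ("x", "y")

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


plain = PlainPoint(1.0, 2.0)
slotted = SlottedPoint(1.0, 2.0)

# Shallow sizes: the plain instance also owns a separate attribute dict
plain_total = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_total = sys.getsizeof(slotted)

# Slotted instances reject attributes that are not declared in __slots__
try:
    slotted.z = 3.0
    accepts_new_attrs = True
except AttributeError:
    accepts_new_attrs = False
```

The trade-offs: slotted instances cannot gain ad-hoc attributes, they do not support weak references unless `"__weakref__"` is listed in `__slots__`, and a subclass that omits `__slots__` gets a `__dict__` again.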
We previously refreshed a Page in memory by simulating navigating away to another page and returning. This has been necessary after certain updates or insertions of annotations and links. This...