PyMuPDF
PyMuPDF Pro cannot extract Chinese content from DOC and DOCX files
Description of the bug
When extracting text from DOC and DOCX files, PyMuPDF Pro returns only the digits and English text. Chinese characters are missing from the output.
How to reproduce the bug
Code Sample
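import pymupdf.pro

# Called without a key, so the library runs in restricted mode (see Output).
pymupdf.pro.unlock()

doc = pymupdf.open("/Users/maxyou/Downloads/demo.docx")
for page in doc:
    print(page.get_text())
    break  # only the first page is needed to show the problem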
Output
PyMuPDFPro: Restricted Mode. Please visit https://pymupdf.io/try-pro to request your trial key.
hello,,123456789
DOCX Content
hello,中文示例,123456789
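To make the difference between the DOCX content and the extracted output explicit, a small check like the one below could be used. This is only a sketch using the Python standard library; `missing_cjk` is a hypothetical helper name, not part of PyMuPDF, and the two strings are copied from the report above.

```python
import unicodedata


def missing_cjk(expected: str, extracted: str) -> list[str]:
    """Return CJK characters that occur in `expected` but not in `extracted`.

    Hypothetical helper for illustration; not a PyMuPDF API.
    """
    return [
        ch
        for ch in expected
        # Unicode names of Chinese ideographs start with "CJK ..."
        if "CJK" in unicodedata.name(ch, "") and ch not in extracted
    ]


expected = "hello,中文示例,123456789"   # what the DOCX file contains
extracted = "hello,,123456789"          # what page.get_text() returned
print(missing_cjk(expected, extracted))
```

For the strings in this report, the check lists all four Chinese characters as missing, while the ASCII text and punctuation come through unchanged.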
DOCX File demo.docx
PyMuPDF version
1.24.11
Operating system
macOS
Python version
3.12