PyMuPDF Pro cannot extract Chinese content from DOC and DOCX files

Open maxyou2090 opened this issue 1 year ago • 0 comments

Description of the bug

When extracting text from DOC and DOCX files, PyMuPDF Pro returns only the digits and English text; the Chinese characters are missing from the output.

How to reproduce the bug

Code Sample

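import pymupdf.pro

# No key is passed to unlock(), hence the "Restricted Mode" notice in the output.
pymupdf.pro.unlock()
doc = pymupdf.open("/Users/maxyou/Downloads/demo.docx")
for page in doc:
    # Print the extracted text of the first page only.
    print(page.get_text())
    break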

Output

PyMuPDFPro: Restricted Mode. Please visit https://pymupdf.io/try-pro to request your trial key.
hello,,123456789

DOCX Content

hello,中文示例,123456789
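
To make the difference between the DOCX content and the extracted output explicit, here is a minimal sketch that lists the characters that went missing. It does not call PyMuPDF at all: the two strings are copied from this report, and `missing_characters` is a hypothetical helper introduced only for illustration.

```python
import unicodedata


def missing_characters(expected: str, extracted: str) -> list[str]:
    """Return the characters of `expected` that never occur in `extracted`, in order."""
    present = set(extracted)
    return [ch for ch in expected if ch not in present]


# Strings copied from this report (hypothetical variable names).
expected = "hello,中文示例,123456789"   # text typed into demo.docx
extracted = "hello,,123456789"          # what page.get_text() printed

dropped = missing_characters(expected, extracted)
for ch in dropped:
    # Show the code point and Unicode name of each dropped character.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}")
```

With the strings above, `dropped` contains exactly the four Chinese characters 中, 文, 示 and 例. The fullwidth commas survive extraction, so the loss appears to be limited to the CJK ideographs rather than to all non-ASCII text.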

DOCX File demo.docx

PyMuPDF version

1.24.11

Operating system

macOS

Python version

3.12

Oct 22 '24 03:10 maxyou2090