Ali Sufian
`extractText()` uses an extreme amount of CPU and memory on the following 1-page, 3 MB file. The extraction never completes, and the process has to be killed. http://www.dora.state.co.us/pls/efi/efi_p2_v2_demo.show_document?p_dms_document_id=105933&p_session_id=
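Until the root cause is fixed, one way to keep a runaway extraction from taking down the calling program is to run it in a separate interpreter with a time limit. The following is only a sketch under stated assumptions: `run_snippet_with_timeout` and `EXTRACT_SNIPPET` are hypothetical names introduced here, and the snippet assumes a PyPDF2 2.x install that provides `PdfReader` and `extract_text()`. It does not address the memory growth itself; it only bounds how long the child process may run.

```python
import subprocess
import sys

# Script executed in the child interpreter. It reads the PDF path from argv[1]
# and writes the text of every page to stdout.
# Assumption: PyPDF2 2.x (PdfReader / extract_text) is installed for sys.executable.
EXTRACT_SNIPPET = """
import sys
from PyPDF2 import PdfReader

reader = PdfReader(sys.argv[1])
for page in reader.pages:
    sys.stdout.write(page.extract_text() or "")
"""


def run_snippet_with_timeout(snippet, args=(), timeout_s=60.0):
    """Run `snippet` in a fresh Python process.

    Returns the child's stdout as a string, or None if it did not finish
    within `timeout_s` seconds. Raises CalledProcessError if the child
    exits with a non-zero status (for example, on a PdfReadError).
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet, *args],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising, so nothing
        # is left running in the background at this point.
        return None
    result.check_returncode()
    return result.stdout
```

A call such as `run_snippet_with_timeout(EXTRACT_SNIPPET, ["document.pdf"], timeout_s=30)` (with `document.pdf` standing in for the problem file) would then return either the extracted text or `None` after 30 seconds, instead of hanging indefinitely.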
The following script originally hung, but with PyPDF2==2.4.2 we get `PdfReadError: EOF marker not found`.

## MCVE: PDF + Code

[This file](https://www.puc.nh.gov/Regulatory/CASEFILE/2001/01-006%20THROUGH%20MARCH%202010/01-006%202009-04-30%20FRP%20NON%20CONFIDENTIAL%20PAP%20FILING.PDF) is 298 MB with 21 pages.

```python
from PyPDF2...