pdfparser

getText() extracts only partial text from PDF, only the heading from each page

Open AntiHate opened this issue 1 year ago • 1 comments

  • PHP Version: 8.3
  • PDFParser Version: 2.11

Description:

I am trying to get the full text from a PDF. getText() extracts only a few lines from each page. If I use getObjects() instead, I get all the text, but the content order is random.

PDF input

sample.pdf

Expected output & actual output

actual output

CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 2 OF 29 PAGES SPE4A6-25-T-189V SECTION A CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 3 OF 29 PAGES SPE4A6-25-T-189V SECTION A CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 4 OF 29 PAGES SPE4A6-25-T-189V SECTION A CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 5 OF 29 PAGES SPE4A6-25-T-189V SECTION A CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 6 OF 29 PAGES SPE4A6-25-T-189V SECTION A CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 7 OF 29 PAGES SPE4A6-25-T-189V SECTION B CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 8 OF 29 PAGES SPE4A6-25-T-189V SECTION B CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 9 OF 29 PAGES SPE4A6-25-T-189V SECTION B CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 10 OF 29 PAGES SPE4A6-25-T-189V SECTION B CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 11 OF 29 PAGES SPE4A6-25-T-189V SECTION B CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 12 OF 29 PAGES SPE4A6-25-T-189V SECTION B CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 13 OF 29 PAGES SPE4A6-25-T-189V SECTION B CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 14 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 15 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION I - CONTRACT CLAUSES (CONTINUED) CONTINUATION SHEET REFERENCE NO. 
OF DOCUMENT BEING CONTINUED: PAGE 16 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION I - CONTRACT CLAUSES (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 17 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION I - CONTRACT CLAUSES (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 18 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION I - CONTRACT CLAUSES (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 19 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION I - CONTRACT CLAUSES (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 20 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION I - CONTRACT CLAUSES (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 21 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 22 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 23 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 24 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 25 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 26 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED) CONTINUATION SHEET REFERENCE NO. 
OF DOCUMENT BEING CONTINUED: PAGE 27 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 28 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 29 OF 29 PAGES SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED)

Code

Getting only partial text using getText():

echo $pdf->getText();

Looping over getObjects() shows all the text, but in random order:

// Print the text of every object in the document, in the order the parser returns them.
$objects = $pdf->getObjects();
foreach ($objects as $object) {
    echo $object->getText();
}

AntiHate, Dec 10 '24 02:12

Hello, I have the same issue with the following file. I entered "MYSTRING123" in one of the PDF form fields.

sample with mystring123 in one form field.pdf

$pdf->getText() does not return this string, but a loop over getObjects() does.

  • PHP Version: 8.3
  • PDFParser Version: 2.11

orandev, Dec 10 '24 23:12