
getText() returns an empty result

Open bigmoney99 opened this issue 2 years ago • 2 comments

Hello, I want to extract the text from this PDF, but the result is empty: https://www.mediafire.com/file/azb7yddqo2ry55j/123.pdf/file

This is my code:

// Parse the PDF file using the Parser library
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile($file);
$metaData = $pdf->getDetails();
print_r($metaData);
$pages = $pdf->getPages();
foreach ($pages as $page) {
    $text = $page->getText();
    echo "<div>".$text."</div>";
}
echo $file;

The result is just:

Array
(
    [Producer] => cairo 1.17.4 (https://cairographics.org
    [Pages] => 1
)
<div></div>D:\web\D\public\pdf_po/123.pdf

bigmoney99, Nov 14 '23 14:11

The issue seems to appear in both 2.7.0 and 2.8.0rc. For some reason, no text content sections are found and passed to formatContent() for parsing. The text is selectable in a PDF reader, so there is text in the file. More research is needed.

GreyWyvern, Nov 15 '23 16:11

Hello, I have the same problem with this PDF file: https://www.ipgp.fr/wp-content/uploads/2024/05/OVSG20240508_RessTecto_Guadeloupe.pdf

My code:

$parser = new \Smalot\PdfParser\Parser(); // Parse pdf file using Parser library
$pdf = $parser->parseFile($file);
$metaData = $pdf->getDetails();
print_r($metaData);
$text = $pdf->getPages()[0]->getText();
echo "<div>".$text."</div>";

The result:

Array
(
    [Producer] => cairo 1.17.4 (https://cairographics.org
    [Pages] => 1
)

ADS971, May 10 '24 19:05