getting x and y coordinates of pages

Open NoxxieNl opened this issue 7 years ago • 6 comments

Hiya,

It is not exactly an issue, but I am trying to get the x and y coordinates of specific text on the page. Extracting the text works great! However, I need to filter the text for a specific string and get its coordinates. Is this possible yet (maybe with a little hack)?

NoxxieNl • Oct 27 '18 15:10

I am currently looping over the pages and searching specific pages for text. To do that, I changed Page.php:

line 183 to: public function getText(Page $page = null, $searchText = null, $returnAxes = false)

and line 221 to: return $contents->getText($this, $searchText, $returnAxes);

and in PDFObject.php,

line 252 to: public function getText(Page $page = null, $searchText = null, $returnAxes = false)

and added after line 337:

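                            // Only search when a search string was passed; on PHP 8 an
                            // empty or null needle would otherwise match every chunk.
                            if (null !== $searchText && strpos($sub_text, $searchText) !== false) {
                                $this->searchFound = true;

                                if ($returnAxes) {
                                    // Position taken from the current text matrix.
                                    return $current_position_tm;
                                } else {
                                    return true;
                                }
                            }
                        }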

Lastly, I replaced the return statement around line 469 with: if (!$this->searchFound && !is_null($searchText)) { return false; } else { return $text . ' '; }

I also added $searchFound as a class property. So far my tests show positive results 👍

This adds functionality like: $coordinates = $page->getText(null, 'TEXT', false);

which returns true or false depending on whether the text is present on the current page, and

$coordinates = $page->getText(null, 'TEXT', true); which returns the coordinates as an array if the text is present (and false if not).
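The idea behind the patch can be sketched outside the library as a standalone function. This is only an illustration: `findTextPosition` and the chunk shape (`text`, `x`, `y`) are hypothetical and not part of pdfparser.

```php
<?php
declare(strict_types=1);

/**
 * Return the position of the first chunk containing $needle, or false.
 *
 * Each chunk is assumed to look like ['text' => string, 'x' => float, 'y' => float].
 * That shape is hypothetical and only mirrors the idea of the patch.
 */
function findTextPosition(array $chunks, string $needle)
{
    if ($needle === '') {
        return false; // an empty needle would match every chunk
    }

    foreach ($chunks as $chunk) {
        if (strpos($chunk['text'], $needle) !== false) {
            // Stop at the first match, like the early return in the patch.
            return ['x' => $chunk['x'], 'y' => $chunk['y']];
        }
    }

    return false;
}

$chunks = [
    ['text' => 'Invoice', 'x' => 72.0, 'y' => 720.0],
    ['text' => 'TEXT to find', 'x' => 72.0, 'y' => 650.5],
];

var_dump(findTextPosition($chunks, 'TEXT'));    // position of the second chunk
var_dump(findTextPosition($chunks, 'missing')); // false
```

As in the patch, only the first match is reported; collecting every match would mean appending to an array instead of returning early.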

The full test script I am using:

       
        $location = 'xxx';
        $parser = new \Smalot\PdfParser\Parser();
        $pdf = $parser->parseFile($location);

        // Page counter, 1-based for display.
        $i = 1;
        foreach ($pdf->getPages() as $page)
        {
            // With $returnAxes = true the patched getText() returns the coordinates, or false if nothing matched.
            $coordinates = $page->getText(null, 'TEXT', true);

            if (is_array($coordinates)) {
                echo 'Found it (in PostScript points) on x coordinates: ' . $coordinates['x'] . ' and y coordinates ' . $coordinates['y'] . ' on page ' . $i . PHP_EOL;
            }

            $i++;
        }

NoxxieNl • Oct 27 '18 16:10

Can you give me the full code? I get false for x and y instead of coordinates.

rizkynich • Oct 31 '20 19:10
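The patch returns $current_position_tm as it stands at the moment of the match. Assuming that array starts out with placeholder values (false for x and y) and is only filled once a text-matrix operator has been seen, a match before that point would come back as false/false. Whether that is what happens here is a guess. A defensive check on the caller's side could look like the sketch below; `hasUsableCoordinates` is a hypothetical helper, not a pdfparser function.

```php
<?php
declare(strict_types=1);

/**
 * True only when $coordinates is an array whose 'x' and 'y' are both numeric.
 * Rejects false, true, and arrays that still hold placeholder values.
 */
function hasUsableCoordinates($coordinates): bool
{
    return is_array($coordinates)
        && isset($coordinates['x'], $coordinates['y'])
        && is_numeric($coordinates['x'])
        && is_numeric($coordinates['y']);
}

var_dump(hasUsableCoordinates(['x' => '72.5', 'y' => '650'])); // usable
var_dump(hasUsableCoordinates(['x' => false, 'y' => false]));  // placeholders only
var_dump(hasUsableCoordinates(false));                         // text not found
```

This does not recover the missing position; it only lets the calling script tell "found with a position" apart from "found without one".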

This works fine for me, except for the Y coordinate, which is out of range. Can you help me with it?

maerko • Jan 17 '21 01:01

Can you please describe in more detail what you mean?

k00ni • Jan 18 '21 08:01
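The thread does not say what "out of range" means. One common surprise is that the default PDF coordinate system has its origin at the bottom-left corner of the page with y growing upwards, in units of 1/72 inch, while many tools measure from the top. If that is the mismatch, a conversion might look like the sketch below. `flipY` is a hypothetical helper, and the page height should come from the page's own MediaBox; the A4 value is only an example. Pages that apply extra transformations need more than this.

```php
<?php
declare(strict_types=1);

/**
 * Convert a y value measured from the bottom edge of the page
 * into one measured from the top edge.
 */
function flipY(float $y, float $pageHeight): float
{
    return $pageHeight - $y;
}

// Approximate A4 height in points; an assumption for this example only.
$a4HeightInPoints = 841.89;

echo flipY(700.0, $a4HeightInPoints), PHP_EOL;
```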

Could you submit this code as a pull request? The line numbers in the code have changed.

seferdemirci • Apr 21 '21 20:04

Sorry, no. If you want to propose new code or changes, please create a pull request. Or at least point to your fork which contains these changes in a separate branch. It's very hard to extract relevant code from plain text.

k00ni • Apr 23 '21 07:04