Google DOM Change?

Open OmarMonterrey opened this issue 5 years ago • 8 comments

URL: https://www.google.com/search?q=download+youtube+thumbnail
Expected: Correct parsing
What I'm getting: Unable to check javascript status. Google DOM has possibly changed and an update may be required.

The HTML is OK and I have Composer completely up to date. I'm attaching an HTML screenshot and the page content.

invalid_dom.zip

OmarMonterrey avatar Sep 11 '20 13:09 OmarMonterrey

Happened to me as well. The SERPS implementation for Google is not able to parse HTML correctly. Please, fix it ASAP.

alexgarciab avatar Sep 11 '20 15:09 alexgarciab

Yes, it looks like Google DOM has changed.

Since the function below in the package looks at the "class" attribute of the body tag and gets nothing back, all the functions that use javascriptIsEvaluated() break, for example getNaturalResults and getAdwordsResults.

public function javascriptIsEvaluated()
{
    $body = $this->getXpath()->query('//body');

    if ($body->length != 1) {
        throw new Exception('No body found');
    }

    $body = $body->item(0);
    /** @var $body \DOMElement */
    $class = $body->getAttribute('class');

    if ($class=='hsrp') {
        return false;
    } elseif (strstr($class, 'srp')) {
        return true;
    } else {
        throw new InvalidDOMException('Unable to check javascript status.');
    }
}

Do you have a plan for solving this issue?

Thank you

kucugum avatar Sep 12 '20 11:09 kucugum
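
To make the failure mode described above concrete, here is a minimal sketch of the same class check written as a standalone function. The name `classifyBodyClass` is hypothetical and not part of the package, and the `srp tbo` class value is made up. It returns null where the library throws, so the three cases are easy to tell apart.

```php
<?php

/**
 * Mirrors the branch logic of javascriptIsEvaluated() on a plain string.
 * Hypothetical helper for illustration; not part of the serps package.
 *
 * @return bool|null false for "hsrp", true when the class contains "srp",
 *                   null when the class gives no signal (the library throws here).
 */
function classifyBodyClass(string $class): ?bool
{
    if ($class === 'hsrp') {
        return false;
    }
    if (strpos($class, 'srp') !== false) {
        return true;
    }
    return null;
}

var_dump(classifyBodyClass('hsrp'));    // JavaScript not evaluated
var_dump(classifyBodyClass('srp tbo')); // JavaScript evaluated
var_dump(classifyBodyClass(''));        // no signal: the InvalidDOMException case
```

DOMElement::getAttribute() returns an empty string when the attribute is missing, so a body element that lost its class during parsing lands in the third case.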

You were right, the issue was right there, but the body tag does have the proper attributes in the raw HTML. Since I'm only using getNaturalResults, I implemented a little hack:

    $html = preg_replace('/^.*?(<body)/is','$1', $html);

Basically I removed everything before the <body tag. That way the DOM is parsed as expected and the classes are checked, so it's working for me now.

OmarMonterrey avatar Sep 12 '20 16:09 OmarMonterrey
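
The workaround above can be sketched as a small standalone function. `stripBeforeBody` is a hypothetical name and the sample markup is made up. The regex is the one from the comment, with a fallback to the unmodified input because preg_replace() returns null on error.

```php
<?php

/**
 * Drops everything that precedes the first "<body" in an HTML string.
 * Hypothetical helper; the pattern is the one quoted in this thread.
 */
function stripBeforeBody(string $html): string
{
    // /s lets "." span newlines, /i also matches "<BODY", and the lazy
    // ".*?" stops at the first occurrence of "<body".
    $stripped = preg_replace('/^.*?(<body)/is', '$1', $html);

    // preg_replace() returns null on failure (e.g. backtrack limit hit);
    // fall back to the original markup in that case.
    return $stripped === null ? $html : $stripped;
}

// Made-up sample page for illustration.
$sample = "<!doctype html>\n<html><head><title>t</title></head>\n"
        . "<body class=\"srp\"><div id=\"search\"></div></body></html>";

echo stripBeforeBody($sample), PHP_EOL;
```

One caveat with this approach: the pattern stops at the first literal `<body` in the text, so a page whose head contains that string inside an inline script would be cut at the wrong place.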

Thank you, it works as a temporary fix. I hope the package will get an update about this for a permanent fix.

kucugum avatar Sep 14 '20 09:09 kucugum

So I have talked with the developer of this library. He told me that he does not have the time to maintain the library, so there won't be any updates from now on, sadly. 🙃

alexgarciab avatar Sep 14 '20 15:09 alexgarciab

This explains why a lot of pull requests are being "ignored"...

pedropamn avatar Sep 14 '20 22:09 pedropamn

The DOM used to get the number of results has changed too. I applied @OmarMonterrey's hack:

// in vendor/serps/core/src/Core/Http/SearchEngineResponse.php
    public function getPageContent()
    {
        $this->pageContent = preg_replace('/^.*?(<body)/is','$1', $this->pageContent);
        return $this->pageContent;
    }

And changed this to get the number of results:

// in vendor/serps/search-engine-google/src/Page/GoogleSerp.php
    public function getNumberOfResults()
    {
        $item = $this->cssQuery('#result-stats');
        // ... etc
    }

migliori avatar Sep 16 '20 04:09 migliori

I've been running the following for about a year now and it's kept this change at bay:

    // in vendor/serps/search-engine-google/src/Page/GoogleSerp.php
    /**
     * Get the total number of results available for the search terms
     * @return int|null the number of results, or null if neither selector matches
     * @throws InvalidDOMException
     */
    public function getNumberOfResults()
    {
        $item = $this->cssQuery('#resultStats');

        if ($item->length < 1) {

            $item = $this->cssQuery('#result-stats');

            if ($item->length < 1) {
                return null;
            }
        }

        // ... etc
    }

LunarDevelopment avatar Sep 21 '20 13:09 LunarDevelopment
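
The old-id-then-new-id fallback above can be tried outside the library with PHP's DOM extension. This is a minimal sketch under stated assumptions: `firstElementById` is a hypothetical helper, it uses DOMXPath rather than the package's cssQuery(), and the markup is made up.

```php
<?php

/**
 * Returns the first element matched by any of the given ids, or null.
 * Hypothetical helper; ids are assumed to be simple strings without quotes.
 */
function firstElementById(DOMXPath $xpath, array $ids): ?DOMElement
{
    foreach ($ids as $id) {
        $nodes = $xpath->query(sprintf('//*[@id="%s"]', $id));
        if ($nodes !== false && $nodes->length > 0) {
            $node = $nodes->item(0);
            if ($node instanceof DOMElement) {
                return $node;
            }
        }
    }
    return null;
}

// Made-up markup that only carries the newer id.
$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML('<html><body><div id="result-stats">About 10 results</div></body></html>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Try the old id first, then the new one, as in the comment above.
$item = firstElementById($xpath, ['resultStats', 'result-stats']);

echo $item === null ? 'not found' : $item->textContent, PHP_EOL;
```

Keeping the ids in a list means the next rename only needs one more entry rather than another nested if.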