Fetching data from Facebook

Open TWithers opened this issue 7 years ago • 0 comments

Awesome library! I like its simplicity and the ability to extend it with my own parsers.

I noticed that the default parser doesn't fetch data from Facebook because its meta tags use the "name" attribute instead of "property".
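To illustrate the mismatch, here is a minimal sketch. The markup is made up for the example (it is not copied from Facebook), and `$byProperty` / `$byEither` are hypothetical variable names. It shows that a lookup keyed only on `property` misses a tag whose key sits in `name`, while a `name`-then-`property` fallback catches both:

```php
<?php
// Hypothetical markup: one key is stored in `name`, the other in `property`.
$html = '<html><head>'
    . '<meta name="og:title" content="Example title">'
    . '<meta property="og:description" content="Example description">'
    . '</head><body></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);

$byProperty = []; // what a `property`-only lookup would see
$byEither = [];   // what a `name`-or-`property` lookup would see

foreach ($doc->getElementsByTagName('meta') as $meta) {
    $content = $meta->getAttribute('content');

    // Reading only `property` skips tags that use `name`.
    if ($meta->hasAttribute('property')) {
        $byProperty[$meta->getAttribute('property')] = $content;
    }

    // Prefer `name`, otherwise fall back to `property`.
    $attr = $meta->hasAttribute('name') ? 'name' : 'property';
    $byEither[$meta->getAttribute($attr)] = $content;
}
```

With this input, `$byProperty` has no `og:title` entry, while `$byEither` contains both keys.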

A quick fix:

protected function parseHtml($html)
{
    $data = [
        'image' => '',
        'title' => '',
        'description' => ''
    ];

    libxml_use_internal_errors(true);
    $doc = new \DOMDocument();
    $doc->loadHTML($html);

    /** @var \DOMElement $meta */
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Some pages (e.g. Facebook) put the key in "name"; others use "property".
        if ($meta->hasAttribute('name')) {
            $prop = 'name';
        } else {
            $prop = 'property';
        }
        if ($meta->getAttribute($prop) === 'image') {
            $data['image'] = $meta->getAttribute('content');
        } elseif ($meta->getAttribute($prop) === 'og:image') {
            $data['image'] = $meta->getAttribute('content');
        } elseif ($meta->getAttribute($prop) === 'twitter:image') {
            $data['image'] = $meta->getAttribute('value');
        }

        if ($meta->getAttribute($prop) === 'name') {
            $data['title'] = $meta->getAttribute('content');
        } elseif ($meta->getAttribute($prop) === 'og:title') {
            $data['title'] = $meta->getAttribute('content');
        } elseif ($meta->getAttribute($prop) === 'twitter:title') {
            $data['title'] = $meta->getAttribute('value');
        }

        if ($meta->getAttribute($prop) === 'description') {
            $data['description'] = $meta->getAttribute('content');
        } elseif ($meta->getAttribute($prop) === 'og:description') {
            $data['description'] = $meta->getAttribute('content');
        }
    }

    if (empty($data['title'])) {
        /** @var \DOMElement $title */
        foreach ($doc->getElementsByTagName('title') as $title) {
            $data['title'] = $title->nodeValue;
        }
    }

    return $data;
}

I double-checked this against this Go script, which does the same thing: https://github.com/badoux/goscraper/blob/master/goscraper.go

This also fixes a typo that caused the title to be overwritten by the description, and it removes the need to loop through the meta tags a second time to find a description when a tag uses 'name' instead of 'property'.
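For what it's worth, the same single-pass lookup could also be written as a table, which makes it harder to assign a value to the wrong field by accident. This is only a sketch: `extractPreview()` and the `$map` table are hypothetical names and not part of the library. It keeps the `value` attribute for the `twitter:*` keys exactly as in the snippet above, and, like that snippet, a later matching tag overwrites an earlier one.

```php
<?php
// Sketch only: extractPreview() and $map are hypothetical, not library API.
function extractPreview(string $html): array
{
    // Meta key => [field it fills, attribute that carries the value].
    $map = [
        'image'          => ['image', 'content'],
        'og:image'       => ['image', 'content'],
        'twitter:image'  => ['image', 'value'],
        'name'           => ['title', 'content'],
        'og:title'       => ['title', 'content'],
        'twitter:title'  => ['title', 'value'],
        'description'    => ['description', 'content'],
        'og:description' => ['description', 'content'],
    ];
    $data = ['image' => '', 'title' => '', 'description' => ''];

    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Same rule as the fix: prefer `name`, otherwise read `property`.
        $key = $meta->getAttribute($meta->hasAttribute('name') ? 'name' : 'property');
        if (isset($map[$key])) {
            [$field, $attr] = $map[$key];
            $data[$field] = $meta->getAttribute($attr);
        }
    }

    // Fall back to the <title> element when no meta tag supplied a title.
    if ($data['title'] === '') {
        foreach ($doc->getElementsByTagName('title') as $title) {
            $data['title'] = $title->nodeValue;
        }
    }

    libxml_clear_errors();
    return $data;
}

// Usage with made-up markup that mixes `name` and `property`.
$preview = extractPreview(
    '<html><head><title>Fallback</title>'
    . '<meta name="og:title" content="Hello">'
    . '<meta property="og:description" content="World">'
    . '</head><body></body></html>'
);
```

Here `$preview['title']` comes from the `name`-keyed tag and `$preview['description']` from the `property`-keyed tag, so one pass over the meta tags covers both cases.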

TWithers • Jul 18 '18 03:07