Requests from og being tagged as robot requests
Some sites recognize the requests from this module as robot requests and respond with a stripped-down version of the page that is missing the og data.
Example: https://www.sciencedaily.com/releases/2016/03/160310082606.htm
I think setting a user agent on the requests may resolve this problem, but I'm not sure.
Hmm, interesting. I hadn't thought about this. Is it good for the internet to pretend not to be a robot? And is the module a robot or not? I guess it would depend on how it's used.
Maybe an option to set a user agent would help with this. I'll have to think about it.
Do we have a user agent defined for requests going through this?
I am working with a site that does not provide the og tags because the user agent is not one of those listed below:
"googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator|Viber|WhatsApp|Telegram"
Could an option be added to set the user agent to one of these, or perhaps a new one?
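
For context, here is a rough sketch of the kind of gating such a site does. The regex is the list above; the helper name and the Node-style request object (with its lower-cased headers map) are my own hypothetical illustration, not the site's actual code:

// Hypothetical server-side check: render og tags only for known crawlers.
const CRAWLER_RE = /googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator|Viber|WhatsApp|Telegram/i;

function shouldRenderOgTags(req) {
  // Node lower-cases incoming header names, so 'user-agent' is the key.
  const userAgent = req.headers['user-agent'] || '';
  return CRAWLER_RE.test(userAgent);
}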
Until new APIs are decided on to support customizing requests, you can fetch the HTML yourself with a library like request or axios. Once you have the HTML, you can pass it to the parse function:
og.parse(html)
This synchronous function returns the Open Graph metadata object without making any HTTP requests.
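
As a concrete sketch of that workaround with axios: the target URL is the one reported above, the facebookexternalhit user agent is just one pick from the list in this thread, and the require'd package name is an assumption (use whatever name this module is published under):

const axios = require('axios');
const og = require('open-graph'); // package name assumed

axios
  .get('https://www.sciencedaily.com/releases/2016/03/160310082606.htm', {
    headers: {
      // Present a crawler-style user agent so the site serves the full page.
      'User-Agent': 'facebookexternalhit/1.1',
    },
  })
  .then((response) => {
    // og.parse is synchronous and makes no HTTP requests of its own.
    console.log(og.parse(response.data));
  })
  .catch((err) => {
    console.error('Failed to fetch page:', err.message);
  });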