openGraphScraper

Unhandled JSON.parse errors

Open cjroebuck opened this issue 1 year ago • 2 comments

Describe the bug: JSON.parse errors are not handled correctly, so the error result is thrown rather than returned through the Promise.

To Reproduce: Try it on the HTML output of, for example, https://www.headlightsdepot.com. One of the JSON-LD script tags in there contains line breaks, which cause JSON.parse to throw.
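To illustrate the failure mode described above, here is a minimal sketch that does not use open-graph-scraper at all. It assumes the problem is a raw line break inside a JSON string literal, which JSON.parse rejects as a control character. The helper name `tryParseJson` and the sample strings are hypothetical.

```javascript
// In this JS source, '\n' inside single quotes becomes a real line break in
// the string, so the JSON text contains a raw control character.
const withRawNewline = '{"description": "line one\nline two"}';

// Here '\\n' stays as the two characters backslash + n, which is valid JSON.
const withEscapedNewline = '{"description": "line one\\nline two"}';

// Hypothetical helper: report the parse failure as a value instead of throwing.
function tryParseJson(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

const bad = tryParseJson(withRawNewline);
const good = tryParseJson(withEscapedNewline);

console.log('raw newline parses:', bad.ok);
console.log('escaped newline parses:', good.ok);
```

Only the first string fails; the exact wording of the error message depends on the JavaScript engine version.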

Expected behavior: I expected to be able to use the error result object to handle the error, but I had to wrap the call to await ogs in try/catch and handle the error result in the catch branch.

Actual behavior: The call to await ogs({html}) throws the ErrorResult.
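The try/catch workaround mentioned above can be sketched roughly as follows. This is only an illustration: `fakeOgs` is a stand-in stub, not the real library, and it rejects with an object shaped like the error result printed later in this thread. The helper name `settle` is hypothetical.

```javascript
// Stand-in for `ogs({ html })`, NOT the real library. It rejects with an
// object shaped like the error result reported in this issue (assumption).
const fakeOgs = async () => {
  throw {
    success: false,
    requestUrl: undefined,
    error: 'Bad control character in string literal in JSON',
  };
};

// Hypothetical helper: turn a rejection into a resolved value, so the caller
// can branch on `success` instead of writing try/catch at every call site.
async function settle(call) {
  try {
    return await call();
  } catch (errorResult) {
    return errorResult;
  }
}

const pending = settle(fakeOgs).then((result) => {
  if (!result.success) {
    console.log('scrape failed:', result.error);
  }
  return result;
});
```

With the real library, the function passed to `settle` would presumably be something like `() => ogs({ html })`; whether the rejected value has exactly this shape is an assumption based on the output shown in this thread.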

ogs version: 6.8.1

cjroebuck, Sep 05 '24 23:09

I'm getting UND_ERR_HEADERS_OVERFLOW (HeadersOverflowError: Headers Overflow Error) errors from this page. Are these the errors you are seeing? If so, you will need to open an issue on undici.

Example code:

const { fetch } = require('undici');

// Fetch the page directly with undici to reproduce the error outside of ogs.
const getAPI = async () => {
  const response = await fetch('https://www.headlightsdepot.com');
  const text = await response.text();
  console.log('text:', text);
};

getAPI();

jshemas, Sep 05 '24 23:09

No, I'm seeing the error below. I'm running this from within a headless browser where I've already loaded the URL, so I am passing the html option rather than the url option:

{
  success: false,
  requestUrl: undefined,
  error: 'Bad control character in string literal in JSON at position 1004',
  errorDetails: SyntaxError: Bad control character in string literal in JSON at position 1004
      at JSON.parse (<anonymous>)
      at Element.<anonymous> (/Users/cjr/dev/node_modules/open-graph-scraper/dist/esm/lib/extract.js:105:30)
      at LoadedCheerio.each (/Users/cjr/dev/node_modules/cheerio/lib/api/traversing.js:519:26)
      at extractMetaTags (/Users/cjr/dev/node_modules/open-graph-scraper/dist/esm/lib/extract.js:97:21)
      at setOptionsAndReturnOpenGraphResults (/Users/cjr/dev/node_modules/open-graph-scraper/dist/esm/lib/openGraphScraper.js:24:48)
      at run (/Users/cjr/dev/node_modules/open-graph-scraper/dist/esm/index.js:26:56)
}

To be honest, I think my confusion was that the main ogs function is typed as returning Promise<SuccessResult | ErrorResult>, but I can see that when there is an ErrorResult you throw it, so the behavior is not quite what the return type suggests?
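The stack trace above shows JSON.parse being called inside a per-element loop. As a rough sketch of how such a loop could tolerate one malformed block, the example below parses each block in its own try/catch and collects failures. This is not the library's actual code or fix: the function name `parseJsonLdBlocks` and the sample inputs are hypothetical.

```javascript
// Hypothetical input: the text contents of JSON-LD script tags on a page.
const jsonLdBlocks = [
  '{"@type": "Organization", "name": "Example"}',
  // The '\n' below is a raw line break inside a string literal: invalid JSON.
  '{"@type": "Product", "description": "first line\nsecond line"}',
];

// Parse each block independently so one malformed block does not abort the rest.
function parseJsonLdBlocks(blocks) {
  const parsed = [];
  const errors = [];
  blocks.forEach((text, index) => {
    try {
      parsed.push(JSON.parse(text));
    } catch (err) {
      // Record which block failed and why, then keep going.
      errors.push({ index, message: err.message });
    }
  });
  return { parsed, errors };
}

const { parsed, errors } = parseJsonLdBlocks(jsonLdBlocks);
console.log('parsed blocks:', parsed.length);
console.log('failed blocks:', errors.length);
```

Collecting the errors rather than discarding them leaves the caller free to decide whether a partially parsed page counts as a success.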

cjroebuck, Sep 06 '24 08:09

Hello, thanks for the full error. This should be fixed in [email protected].

jshemas, Sep 18 '24 23:09

Closing this for now. Please open a new ticket if you see any other issues.

jshemas, Oct 05 '24 20:10