
Pass options to Cheerio?

Open gmhenderson opened this issue 3 years ago • 4 comments

I recently had the need to set a specific option for Cheerio (scriptingEnabled: false) but there is currently no way to pass any configuration options. Does it make sense to create a config option that can be passed along to Cheerio?

gmhenderson, Feb 22 '22 23:02

Hi @gmhenderson 👋

Sorry for the late response.

Could you please share an example use case where this would be needed? I haven't run into this need before, and I'm not sure about adding a config option for cheerio, because making cheerio more configurable will make it easier to break everything (website-scraper will not work as expected without its current cheerio options).

s0ph1e, Mar 30 '22 08:03

Hi @s0ph1e, thank you for the response.

I am using this tool to create static HTML versions of CMS-powered websites that I have built. These websites load their CSS assets via JavaScript (rather than within a <link> tag), with no-JS fallbacks contained within <noscript> tags. With scriptingEnabled: true the noscript tags are ignored, and thus the fallback resource URLs are not scraped. One might think that the default scriptingEnabled value for Cheerio would be false; however, it is not (see here). Thus I needed to be able to set its value.

As a workaround I was just about to fork website-scraper and hardcode the false config value. Here's where I made the change:

lib/resource-handler/html/index.js line 84:

return cheerio.load(text, { scriptingEnabled: false });

gmhenderson, Mar 31 '22 20:03

Thanks for sharing @gmhenderson 👍

Yep, it makes sense to have the content inside <noscript> parsed. I'll leave the issue open and think about a proper solution. Maybe it makes sense to always use { scriptingEnabled: false } 🤔

s0ph1e, Apr 01 '22 17:04

I think always using { scriptingEnabled: false } makes sense since JavaScript is not being parsed, but it seems like that could possibly break some existing projects.

gmhenderson, Apr 06 '22 15:04
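
For illustration only, here is a minimal sketch of one way the two concerns raised in this thread could be reconciled: defaulting to { scriptingEnabled: false }, letting users override it, and still protecting the options website-scraper depends on. The helper buildCheerioOptions, its parameters, and exampleRequiredFlag are all hypothetical; none of them are part of website-scraper or Cheerio, and the scraper's real required options are not shown in this thread.

```javascript
// Hypothetical helper, NOT part of website-scraper's actual API.
// `requiredOptions` stands in for whatever options the scraper already
// passes to cheerio.load(); their real values are not shown in this thread.
function buildCheerioOptions(requiredOptions, userOptions = {}) {
  // Default proposed in the thread, so that <noscript> content is parsed.
  const defaults = { scriptingEnabled: false };
  // Later spreads win: user options override the defaults,
  // and required options override both.
  return { ...defaults, ...userOptions, ...requiredOptions };
}

// Example with a made-up required option (an assumption for illustration).
const required = { exampleRequiredFlag: true };
const merged = buildCheerioOptions(required, {
  scriptingEnabled: true,     // user opts back in to the old behaviour
  exampleRequiredFlag: false, // attempt to override a required option
});

// scriptingEnabled follows the user; exampleRequiredFlag stays true.
console.log(merged);
```

Under these assumptions, the result would be passed as the second argument at the call site quoted above, e.g. cheerio.load(text, buildCheerioOptions(required, userOptions)). Existing projects that rely on the current behaviour could then opt back in with scriptingEnabled: true.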

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot], Nov 16 '22 22:11