Pass options to Cheerio?
I recently needed to set a specific option for Cheerio (scriptingEnabled: false), but there is currently no way to pass any configuration options to it. Would it make sense to add a config option that is passed along to Cheerio?
Hi @gmhenderson 👋
Sorry for the late response.
Could you please share an example use case where this would be needed? I haven't run into this need before, and I'm not sure about adding a config option for cheerio, because making cheerio more configurable makes it easier to break everything (website-scraper will not work as expected without its current cheerio options).
Hi @s0ph1e, thank you for the response.
I am using this tool to create static HTML versions of CMS-powered websites that I have built. These websites load their CSS assets via JavaScript (rather than within a <link> tag), with no-JS fallbacks contained within <noscript> tags. With scriptingEnabled: true, the contents of the <noscript> tags are ignored, and thus the fallback resource URLs are not scraped. One might think that the default scriptingEnabled value for Cheerio would be false; however, it is not (see here). Thus I needed to be able to set its value.
As a workaround I was just about to fork website-scraper and hardcode the false config value. Here's where I made the change:
lib/resource-handler/html/index.js line 84:
return cheerio.load(text, { scriptingEnabled: false });
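For discussion, a configurable variant of that line could look roughly like the sketch below. This is only an illustration under assumptions: `ALLOWED_CHEERIO_OPTIONS`, `pickCheerioOptions`, and `loadHtml` are hypothetical names that do not exist in website-scraper. The only real call used is `cheerio.load(text, options)`. The allowlist is one possible way to limit which options a user can change.

```javascript
// Hypothetical sketch: none of these names exist in website-scraper.
// They only illustrate how a user-supplied option could reach cheerio.load().

// Only options on this list may be overridden by the user, so other
// cheerio settings stay exactly as the scraper passes them today.
const ALLOWED_CHEERIO_OPTIONS = ['scriptingEnabled'];

// Copy the allowed keys from the user's options and drop everything else.
function pickCheerioOptions(userOptions = {}) {
  const picked = {};
  for (const key of ALLOWED_CHEERIO_OPTIONS) {
    if (Object.prototype.hasOwnProperty.call(userOptions, key)) {
      picked[key] = userOptions[key];
    }
  }
  return picked;
}

// `cheerio` is passed in as an argument so the sketch does not assume
// how the library is imported in the real code base.
function loadHtml(cheerio, text, userOptions) {
  return cheerio.load(text, pickCheerioOptions(userOptions));
}
```

With something like this, `loadHtml(cheerio, text, { scriptingEnabled: false })` would behave like the hardcoded change above, while any option not on the allowlist would be silently dropped.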
Thanks for sharing @gmhenderson 👍
Yep, it makes sense to have the content inside <noscript> parsed. I'll leave the issue open and think about a proper solution.
Maybe it makes sense to always use { scriptingEnabled: false } 🤔
I think always using { scriptingEnabled: false } makes sense since JavaScript is not being parsed, but it seems like that could break some existing projects.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.