node-website-scraper
Download website to local directory (including all css, images, js, etc.)
Quickly threw this together; it should work in theory and close #500. This is my last day before vacation so any additional work to get it merged in will need...
This closes #386 and is an extension of #496. This provides new options to store data in memory, in memory and compressed, or on the filesystem. Unlike #496 this would...
Hi Sophie, I noticed that some non-English (French) content ends up not being converted correctly. As a random example, try scraping this page: https://ensemblesurleterrain.bouyguestelecom.fr and compare the result: https://test-webscrapper.netlify.app You'll see that...
Currently all pages are stored in memory (each resource's content is stored in `Resource.text`), which causes high memory consumption. It would be nice to avoid storing `Resource.text` and save resources...
I recently had the need to set a specific option for Cheerio (`scriptingEnabled: false`) but there is currently no way to pass any configuration options. Does it make sense to...
So I'm scraping a site and generating integrity attributes, but after returning the parsed body the integrity attributes are being stripped. I've found that at [resource-handler/html/index.js:55](https://github.com/website-scraper/node-website-scraper/blob/39f2ccb5f8860a1c8d1da260da0220d88b03b837/lib/resource-handler/html/index.js#L55) there is the following section ```js...
maxFileSize would specify a resource's maximum size, in bytes. Resources bigger than that should be ignored. E.g. ```maxFileSize: 500000``` would ignore files larger than 500 kB. Thanks!
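Until such an option exists, the request above could perhaps be approximated with a plugin. The following is only a sketch: `MaxFileSizePlugin` and `sizeInBytes` are hypothetical names, not part of the library. It assumes the plugin `afterResponse` action receives `{ response }` and that returning `null` from it skips the resource. It also assumes the response exposes `headers` and `body`. Check these assumptions against the version you use.

```js
// Hypothetical plugin: the class name and the `maxFileSize` option are
// illustrative and do not exist in website-scraper itself.
class MaxFileSizePlugin {
  constructor({ maxFileSize }) {
    this.maxFileSize = maxFileSize; // limit in bytes
  }

  apply(registerAction) {
    // Assumption: returning null from `afterResponse` tells the scraper
    // to skip the resource; returning the body keeps it.
    registerAction('afterResponse', ({ response }) => {
      return sizeInBytes(response) > this.maxFileSize ? null : response.body;
    });
  }
}

// Prefer the declared Content-Length; fall back to measuring the body.
function sizeInBytes(response) {
  const header = response.headers ? response.headers['content-length'] : undefined;
  const declared = Number.parseInt(header, 10);
  if (Number.isFinite(declared) && declared >= 0) {
    return declared;
  }
  return Buffer.byteLength(response.body ?? '');
}
```

It would be passed through the `plugins` option, e.g. `plugins: [new MaxFileSizePlugin({ maxFileSize: 500000 })]`. Note that the resource is still downloaded before it is dropped, so this limits what is saved, not what is transferred.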
👋 How do I change the style file name? On website: ```html ``` After parse: ```html ``` this file is not loaded and there are no styles on the site.
Bumps [got](https://github.com/sindresorhus/got) from 13.0.0 to 14.2.0. Release notes Sourced from got's releases. v14.2.0 Add cause property with the original error to RequestError (#2327) 4cbd01d https://github.com/sindresorhus/got/compare/v14.1.0...v14.2.0 v14.1.0 Allow typing the body...
The current logo is quite old, it would be nice to update it on GitHub and in the documentation. Current logo:  New logo requirements: * no color preference, but...