feat: add parsing of response body for encoding
I quickly threw this together. In theory it should work and close #500.
This is my last day before vacation, so any additional work needed to get it merged will have to be picked up by someone else. @Jeremytijal?
Changes:
- Adds parsing of the response body to look for a charset
- Removes `package-lock.json` from `.gitignore`
- Increases `ecmaVersion` in ESLint to allow optional chaining, which is an easy way to reduce cognitive complexity. Optional chaining is supported in Node 14 and above.
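The body-charset parsing from the first bullet can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the code from this PR: the function names `getCharsetFromContentType`, `getCharsetFromBody`, `detectCharset` and `decodeBody` are hypothetical, and the precedence (Content-Type header first, then a charset declared in a `<meta>` tag, then UTF-8) is an assumption. It also uses optional chaining (`?.`), the syntax the ESLint `ecmaVersion` bump enables.

```javascript
// Hypothetical helpers: an illustrative sketch, not this PR's implementation.

// Reads "charset=..." from a Content-Type header, e.g. "text/html; charset=UTF-8".
function getCharsetFromContentType(contentType) {
  const match = /charset\s*=\s*["']?([\w.:-]+)/i.exec(contentType ?? '');
  return match?.[1]?.toLowerCase() ?? null;
}

// Looks for a charset declared in a <meta> tag inside the response body.
// Matches both <meta charset="..."> and
// <meta http-equiv="Content-Type" content="text/html; charset=...">.
function getCharsetFromBody(bodyBuffer) {
  // Only inspect the first 1024 bytes, mirroring the HTML spec's prescan limit.
  // Decoding as latin1 maps each byte to one character, so ASCII markup survives.
  const head = bodyBuffer.subarray(0, 1024).toString('latin1');
  const match = /<meta[^>]+charset\s*=\s*["']?([\w.:-]+)/i.exec(head);
  return match?.[1]?.toLowerCase() ?? null;
}

// Assumed precedence: header, then <meta> tag, then the fallback.
function detectCharset(contentType, bodyBuffer, fallback = 'utf-8') {
  return getCharsetFromContentType(contentType)
    ?? getCharsetFromBody(bodyBuffer)
    ?? fallback;
}

// Decodes the body with the detected charset; falls back to UTF-8 if the
// runtime's TextDecoder does not recognise the label.
function decodeBody(bodyBuffer, charset) {
  try {
    return new TextDecoder(charset).decode(bodyBuffer);
  } catch {
    return new TextDecoder('utf-8').decode(bodyBuffer);
  }
}
```

Usage, assuming the body arrives as a `Buffer`: `decodeBody(body, detectCharset(headers['content-type'], body))`. Supported `TextDecoder` labels depend on the ICU data Node was built with, which is why the sketch falls back instead of throwing.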
I'm currently on holiday, so I can't address changes at the moment, but yes, that makes sense.
Regarding the package lock: how about just updating the nightly to install with `--package-lock=false`? You get install consistency from the lockfile, while the nightly checks still run against freshly resolved dependencies.
https://docs.npmjs.com/cli/v8/using-npm/config#package-lock
Yeah, this looks cool. For example, the HTML of the site I'm scraping looks like this:
And I end up with:
Which is the difference between:
And this:
I'm assuming this PR would fix that?
Of course I could just pull the branch and check, duh...
@phawxby Yes, `--package-lock=false` may work.
@marcfielding1 We expect this PR to fix some encoding issues, especially when the encoding is set inside the HTML file in a tag. But it would be nice if you could check whether this branch fixes the issue for you.
FYI, I scraped https://tonclubtonmaillot.groupama.fr with this PR (I'm hosting the result at [1]), but got "�" instead of "à" on the index page. I'll try to troubleshoot it when I have a chance.
[1] https://test-node-website-scraper.netlify.app/
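One common way to get "�" in place of "à" is decoding bytes with the wrong charset. The snippet below is an illustration only, not a diagnosis of that site: it assumes the page's bytes are ISO-8859-1/Windows-1252, where "à" is the single byte `0xE0`, and shows what happens when that byte is decoded as UTF-8 instead.

```javascript
// Illustrative only. Assumption: the source page is ISO-8859-1/Windows-1252,
// where "à" is encoded as the single byte 0xE0.
const latin1Bytes = Buffer.from('à', 'latin1'); // one byte: 0xE0

// Decoding with the matching charset recovers the character.
const correct = latin1Bytes.toString('latin1'); // 'à'

// Decoding the same byte as UTF-8 fails: 0xE0 starts a three-byte UTF-8
// sequence that never completes, so the decoder emits U+FFFD ("�").
const wrong = latin1Bytes.toString('utf8'); // '\uFFFD'

console.log(correct, wrong);
```

If the scraped output shows U+FFFD, the bytes were most likely decoded as UTF-8 somewhere along the way, so checking which charset was actually detected for that page would be a reasonable first troubleshooting step.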
Sorry @phawxby, that's beyond my skills 😕
I'm closing this PR because similar changes were merged in #504 and will be released in the next version within the next 1-2 days.