node-unfluff
Automatically extract body content (and other cool stuff) from an html document
This is a PR of a commit from @cjanietz. @cjanietz, are you OK with me pulling this in?
This module could pretty easily be made front-end friendly by removing the need for 'fs'. Since the stopwords files are the only things being accessed with 'fs',...
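One way the idea above could look is sketched here, assuming the stopword lists are bundled as plain in-memory data rather than read from disk. The names `STOPWORDS`, `getStopwords`, and `countStopwords` are hypothetical and not part of unfluff's API, and the word lists are tiny placeholders rather than the module's real stopword files.

```javascript
// Hypothetical in-memory stopword table. In a real build these arrays would
// be generated from the stopwords text files, so no 'fs' call is needed at
// runtime. The words below are placeholders for illustration only.
const STOPWORDS = {
  en: ['a', 'an', 'and', 'the', 'of', 'to', 'in', 'is'],
  es: ['el', 'la', 'de', 'que', 'y', 'en'],
};

// Cache one Set per language so lookups stay O(1) after the first call.
const cache = new Map();

function getStopwords(language) {
  if (!cache.has(language)) {
    const known = Object.prototype.hasOwnProperty.call(STOPWORDS, language);
    // Assumption: fall back to English when the language is not bundled.
    cache.set(language, new Set(known ? STOPWORDS[language] : STOPWORDS.en));
  }
  return cache.get(language);
}

// Count how many whitespace-separated tokens in `text` are stopwords.
function countStopwords(text, language = 'en') {
  const stopwords = getStopwords(language);
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  return tokens.filter((token) => stopwords.has(token)).length;
}
```

Because nothing here touches the file system, the same code could run in a browser bundle; the trade-off is a larger bundle, since every language's list ships with the code.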
If you attempt to run unfluff on the body of the following webpage, https://craftsbyamanda.com/vibrant-button-tree-on-canvas-a-giveaway/, you'll see that it takes more than 10 seconds on a fast Mac. The problem has been...
Content with accents is broken.
Google recommends that pages include structured data schema: https://developers.google.com/search/docs/guides/intro-structured-data Specifically, I'm interested in ClaimReview data (https://schema.org/ClaimReview), but this structured data has significant overlap with the other data extracted by Unfluff...
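As a rough illustration of the request above, structured data embedded as JSON-LD could be pulled out of a page along these lines. This is only a sketch: `extractJsonLd` is a hypothetical helper, not an existing unfluff function, and it uses a regular expression for brevity where a real implementation would more likely go through an HTML parser. It also assumes `@type` is a single string and ignores Microdata and RDFa markup.

```javascript
// Hypothetical helper: collect JSON-LD objects of a given schema.org type
// (for example 'ClaimReview') from an HTML string.
function extractJsonLd(html, type) {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const results = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    let parsed;
    try {
      parsed = JSON.parse(match[1]);
    } catch (err) {
      continue; // Skip blocks that are not valid JSON.
    }
    // A block may hold a single object or an array of objects.
    const items = Array.isArray(parsed) ? parsed : [parsed];
    for (const item of items) {
      if (item && item['@type'] === type) {
        results.push(item);
      }
    }
  }
  return results;
}
```

Calling `extractJsonLd(html, 'ClaimReview')` returns an array of matching objects, or an empty array when the page has none.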
I've tested the package using this [URL](https://www.digitalocean.com/community/tutorials/how-to-set-up-a-node-js-application-for-production-on-ubuntu-16-04). And this is the result: You can see that **author** property. ``` { data: { title: "How To Set Up a Node.js...
Does anybody know of any up-to-date alternatives to this library?
The text field is empty when running unfluff on the html from a New York Times story. For example, if I request a story from nytimes.com in the node console...
Hi, I am from Bangladesh and wanted to scrape websites in Bangla. This library didn't have any Bangla stopwords, so I added them and tested on a few Bangla sites. The Vietnamese...