node-unfluff issues

Added ability to use library in web browser by requiring stopwords files

This is a PR of a commit from @cjanietz. @cjanietz, are you OK with me pulling this in?

Convert to front-end friendly, remove 'fs'

This module could fairly easily be made front-end friendly by removing its dependency on 'fs'. Since the stopwords files are the only things accessed with 'fs',...
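
One way to act on this, in the spirit of the PR about requiring the stopwords files, is to keep the stopword lists in a module that a bundler can include, so nothing reads from disk at runtime. This is a hypothetical sketch: `stopwordsByLang`, `getStopwords`, and `countStopwords` are illustrative names, not unfluff's API, and the tiny word lists stand in for the real ones.

```javascript
// Hypothetical sketch: stopword lists held in memory (e.g. bundled via require)
// instead of being read per language from disk with fs at runtime.
// These short lists are stand-ins for the library's real stopword files (assumption).
const stopwordsByLang = {
  en: ['the', 'a', 'of', 'and'],
  es: ['el', 'la', 'de', 'y'],
};

// Returns a Set of stopwords for `lang`, falling back to English (assumed default).
function getStopwords(lang, lists = stopwordsByLang) {
  return new Set(lists[lang] || lists.en);
}

// Counts stopwords in a snippet of text, the kind of signal a content scorer might use.
function countStopwords(text, lang) {
  const stop = getStopwords(lang);
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => stop.has(word)).length;
}

console.log(countStopwords('The history of the city', 'en')); // 3
```

Because the lists are plain data, the same code runs unchanged in Node.js and in a browser bundle.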

knod

Bad regex causing very slow execution

If you run unfluff on the body of the following web page, https://craftsbyamanda.com/vibrant-button-tree-on-canvas-a-giveaway/, you'll see that it takes more than 10 seconds on a fast Mac. The problem has been...

kduffie

Can I use it with UTF-8?

![image](https://user-images.githubusercontent.com/22531754/95143467-837f5080-074c-11eb-80f3-c9ffc4cfc8fe.png) Content with accented characters comes out broken.
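
A common cause of broken accents (an assumption here, since the issue doesn't say how the HTML was fetched) is decoding UTF-8 bytes with a single-byte charset such as latin1 before the HTML ever reaches unfluff. This Node.js sketch shows the effect; the sample string is illustrative.

```javascript
// Sketch: how accented text gets garbled before extraction.
// Assumption: the page bytes are UTF-8 but are decoded as latin1.
const bytes = Buffer.from('Conteúdo com acentuação', 'utf8');

// Wrong: each two-byte UTF-8 character becomes two latin1 characters.
const garbled = bytes.toString('latin1');

// Right: decode with the charset the server actually used (UTF-8 here).
const decoded = bytes.toString('utf8');

console.log(garbled); // ConteÃºdo com acentuaÃ§Ã£o
console.log(decoded); // Conteúdo com acentuação
```

If that is the cause, decode the response body with the page's declared charset first, then pass the resulting string to `extractor(html)` as usual.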

pedrosarkis

Parse Page Schema.org Data

Google recommends that pages include structured data: https://developers.google.com/search/docs/guides/intro-structured-data Specifically, I'm interested in ClaimReview data (https://schema.org/ClaimReview), but this structured data has significant overlap with the other data extracted by Unfluff...
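
A minimal sketch of what pulling out ClaimReview data could involve, assuming the page embeds it as JSON-LD in `<script type="application/ld+json">` tags (pages can also use microdata or RDFa, which this ignores). `extractJsonLd` and `findClaimReviews` are hypothetical helpers, not part of unfluff, and the regex is only for illustration; an HTML parser would be more robust.

```javascript
// Hypothetical sketch (not part of unfluff): collect JSON-LD blocks from raw HTML.
function extractJsonLd(html) {
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const match of html.matchAll(re)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Skip malformed JSON-LD instead of failing the whole page.
    }
  }
  return blocks;
}

// Keeps items whose @type is (or includes) ClaimReview.
function findClaimReviews(html) {
  // A block may be a single object, an array, or an object with an @graph array.
  const items = extractJsonLd(html).flatMap((block) => {
    if (!block || typeof block !== 'object') return [];
    if (Array.isArray(block)) return block;
    return Array.isArray(block['@graph']) ? block['@graph'] : [block];
  });
  return items.filter((item) => {
    if (!item || typeof item !== 'object') return false;
    const type = item['@type'];
    return Array.isArray(type) ? type.includes('ClaimReview') : type === 'ClaimReview';
  });
}

const html = `<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "ClaimReview",
 "claimReviewed": "Example claim",
 "reviewRating": {"@type": "Rating", "alternateName": "False"}}
</script></head><body></body></html>`;

const reviews = findClaimReviews(html);
console.log(reviews.length); // 1
console.log(reviews[0].claimReviewed); // Example claim
```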

ISNIT0

Author is not accurate

I've tested the package using this [URL](https://www.digitalocean.com/community/tutorials/how-to-set-up-a-node-js-application-for-production-on-ubuntu-16-04), and this is the result. Note the **author** property: ``` { data: { title: "How To Set Up a Node.js...

chan-dev

Any up-to-date alternatives?

Does anybody know of any up-to-date alternatives to this library?

Aditya94A

Problem with New York Times stories

The text field is empty when running unfluff on the HTML from a New York Times story. For example, if I request a story from nytimes.com in the Node console...

gautamh

Add Bangla & Vietnamese stopwords

Hi, I am from Bangladesh and wanted to scrape websites in Bangla. This library didn't have any Bangla stopwords, so I added them and tested them on a few Bangla sites. The Vietnamese...

muiton