justext
A Go package that implements the JusText boilerplate removal algorithm
Hi there! 😊 This repo seems to depend on `github.com/levigross/exp-html` which doesn't ship a license file. This was identified in our CI pipeline using `github.com/google/go-licenses`. To me this looks like...
Document the source code and provide a useful set of examples. Update the README. Use GitHub project pages for a code explanation of the algorithm and examples of use.
```
func removeComments(root *html.Node) {
	var toBeRemoved []*html.Node
	var markRemovableNodes = func(node *html.Node) {
		if node.Type == html.CommentNode {
			toBeRemoved = append(toBeRemoved, node)
		}
	}
	nodeIter(root, markRemovableNodes)
	for _, node...
```
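The snippet above collects comment nodes during traversal and only afterwards loops over them, presumably to remove them, which avoids mutating the tree while it is being walked. A minimal, self-contained sketch of that two-phase pattern follows. It uses a hypothetical `Node` type as a stand-in for `html.Node`, and the names `walk` and `removeCommentNodes` are illustrative assumptions, not the package's actual code.

```go
package main

import "fmt"

// NodeType distinguishes the two kinds of node this sketch needs.
type NodeType int

const (
	ElementNode NodeType = iota
	CommentNode
)

// Node is a hypothetical stand-in for html.Node (assumption for illustration).
type Node struct {
	Type     NodeType
	Data     string
	Parent   *Node
	Children []*Node
}

// AppendChild attaches child to n and sets its parent pointer.
func (n *Node) AppendChild(child *Node) {
	child.Parent = n
	n.Children = append(n.Children, child)
}

// RemoveChild detaches child from n if it is one of n's children.
func (n *Node) RemoveChild(child *Node) {
	for i, c := range n.Children {
		if c == child {
			n.Children = append(n.Children[:i], n.Children[i+1:]...)
			child.Parent = nil
			return
		}
	}
}

// walk visits node and all of its descendants in document order.
func walk(node *Node, visit func(*Node)) {
	visit(node)
	for _, c := range node.Children {
		walk(c, visit)
	}
}

// removeCommentNodes first marks every comment node, then removes the marked
// nodes in a second pass. It returns the number of nodes it marked.
func removeCommentNodes(root *Node) int {
	var toBeRemoved []*Node
	walk(root, func(n *Node) {
		if n.Type == CommentNode {
			toBeRemoved = append(toBeRemoved, n)
		}
	})
	for _, n := range toBeRemoved {
		if n.Parent != nil {
			n.Parent.RemoveChild(n)
		}
	}
	return len(toBeRemoved)
}

func main() {
	root := &Node{Type: ElementNode, Data: "div"}
	p := &Node{Type: ElementNode, Data: "p"}
	root.AppendChild(&Node{Type: CommentNode, Data: "top-level comment"})
	root.AppendChild(p)
	p.AppendChild(&Node{Type: CommentNode, Data: "nested comment"})

	removed := removeCommentNodes(root)
	fmt.Println("removed:", removed, "remaining children of root:", len(root.Children))
}
```

Removing nodes inside the visit callback would change the child slice that `walk` is ranging over, so deferring removal to a second loop keeps the traversal simple.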
examples (showing filename: `grep` output)
```
512/3390ce13a50c7593b9ab6fcd539043ab: <style> .s9DpES {display: none; } </style>
512/3390ce13a50c7593b9ab6fcd539043ab: <style> .jsOffDisplayBlock { display: block; } .jsOffDisplayInline { display: inline; } .jsOffVisibility { visibility: visible;...
```
I believe the latest version of go-bindata takes different arguments and generates significantly different code than the code you have in defaultTemplate.go and detailedTemplate.go. You may want to consider updating...
This includes making the project installable via "go install github.com/JalfResi/gojustext"
Should stopword languages be in subpackages? Investigate (would speed up compile time!)
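One way to explore this, sketched here under assumptions, is the registration pattern the standard library uses for `image` decoders and `database/sql` drivers: a core package holds a registry, and each language lives in its own subpackage that registers its stopword list from `init`. Callers would then pull in only the languages they need with a blank import. All names below (`Register`, `Stopwords`, `Languages`, the short `english` list) are hypothetical, and everything is collapsed into one file so the sketch runs on its own. Whether this actually reduces compile time for this project would need measuring.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

var (
	mu       sync.RWMutex
	registry = map[string]map[string]struct{}{}
)

// Register stores the stopword list for lang, replacing any earlier list.
// In a split layout, each language subpackage would call this from init.
func Register(lang string, words []string) {
	set := make(map[string]struct{}, len(words))
	for _, w := range words {
		set[w] = struct{}{}
	}
	mu.Lock()
	defer mu.Unlock()
	registry[lang] = set
}

// Stopwords returns the registered set for lang and whether it exists.
func Stopwords(lang string) (map[string]struct{}, bool) {
	mu.RLock()
	defer mu.RUnlock()
	set, ok := registry[lang]
	return set, ok
}

// Languages returns the names of all registered languages, sorted.
func Languages() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// This init stands in for a hypothetical subpackage (for example one
// dedicated to English) that a caller would enable with a blank import.
func init() {
	Register("english", []string{"the", "a", "and", "of"})
}

func main() {
	if set, ok := Stopwords("english"); ok {
		_, isStop := set["the"]
		fmt.Println("english registered, \"the\" is a stopword:", isStop)
	}
	if _, ok := Stopwords("german"); !ok {
		fmt.Println("german not registered (its subpackage was never imported)")
	}
	fmt.Println("available:", Languages())
}
```

With this layout, a program that only processes English text would compile only the English list; the trade-off is that a missing import surfaces at run time as an unregistered language rather than at compile time.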