essence
Automatically extract the main text content (and more) from an HTML document
I'm not sure whether I have this right, but in DocumentScorer.kt I think the code here uses the wrong condition: ``` class DocumentScorer(private val stopWords: StopWords) : Scorer { override fun...
Bumps [kotlin-stdlib](https://github.com/JetBrains/kotlin) from 1.3.0 to 1.6.0. Release notes Sourced from kotlin-stdlib's releases. Kotlin 1.6.0 Changelog Android KT-48019 Bundle Kotlin Tooling Metadata into apk artifacts KT-47733 JVM / IR: Android Synthetic...
Bumps [jsoup](https://github.com/jhy/jsoup) from 1.11.3 to 1.14.2. Release notes Sourced from jsoup's releases. jsoup 1.14.2 Caught by the fuzz! jsoup 1.14.2 is out now, and includes a set of parser bug...
The tokenization logic should be more generic. It could use something like https://www.atilika.org/ to tokenize Japanese.
Hi, I was testing it with the Yahoo Finance website, but it is unable to get the text. For example, this post data is not parsed and I get the empty...
Bumps [junit](https://github.com/junit-team/junit4) from 4.12 to 4.13.1. Release notes Sourced from junit's releases. JUnit 4.13.1 Please refer to the release notes for details. JUnit 4.13 Please refer to the release notes...
Bumps [jsoup](https://github.com/jhy/jsoup) from 1.11.3 to 1.15.3. Release notes Sourced from jsoup's releases. jsoup 1.15.3 jsoup 1.15.3 is out now, and includes a security fix for potential XSS attacks, along with...
Is essence still maintained? It looks like essence has a problem with the content of the following pages: - https://www.business-standard.com/companies/news/city-gas-distributors-optimistic-about-long-term-growth-prospects-123091301205_1.html (essence result: Are you sure you want to Log out...
The README mentions the link https://essence.mybluemix.net/ under "Try out the demo", but this link no longer works.