
Automatically exported from code.google.com/p/boilerpipe

Results: 46 boilerpipe issues

I am interested to know: if a block has been removed, what is the reason? As I see in the source code, each block is labelled for different conditions. How...

Type-Defect
Priority-Medium
auto-migrated

The result for the same page differs from the one returned by the web API. For example, consider the following link: http://boilerpipe-web.appspot.com/extract?url=http%3A%2F%2F1tajrobeh.blog.ir%2F&extractor=ArticleExtractor&output=html&extractImages= I used ArticleExtractor in version 1.2.0 but the result...

Type-Defect
Priority-Medium
auto-migrated

Hi, I am new to using this extractor. While trying to run a simple extractor using only boilerpipe-1.2.1.jar, I am getting an unsupported Content type error....

Type-Defect
Priority-Medium
auto-migrated
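
The excerpt above does not say which page or header triggered the error, so the following is only a sketch of one possible workaround: fetch the page yourself, pick a charset from the `Content-Type` header, and hand the decoded HTML to the extractor as a string. The class name `ContentTypeCharset`, the method `charsetOf`, and the UTF-8 fallback are assumptions made for illustration and are not part of boilerpipe.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of boilerpipe): picks a charset from a Content-Type header value.
class ContentTypeCharset {
    // Matches "charset=utf-8" or charset="utf-8", case-insensitively.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in the header, or UTF-8 (an assumed default) if none is usable. */
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            Matcher m = CHARSET.matcher(contentType);
            if (m.find()) {
                try {
                    return Charset.forName(m.group(1));
                } catch (IllegalArgumentException e) {
                    // Unknown or malformed charset name: fall through to the default.
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

With the response bytes decoded using the returned charset, the resulting HTML string could be passed to the extractor directly instead of a URL, which avoids relying on the library's own fetching step.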

What steps will reproduce the problem? 1. If boilerpipe is at a higher precedence than the CyberNeko library, it will cause parsing issues on user input with unbalanced tags...

Type-Defect
Priority-Medium
auto-migrated

What steps will reproduce the problem? 1. Call ArticleExtractor.getInstance().getText() on the example data (Stability.html) What is the expected output? What do you see instead? The extraction takes a very...

Type-Defect
Priority-Medium
auto-migrated

What steps will reproduce the problem? 1. Missing de.l3s.boilerpipe.sax.ImageExtractor What is the expected output? What do you see instead? Rebuilding jar from source has the missing de.l3s.boilerpipe.sax.ImageExtractor class file....

Type-Defect
Priority-Medium
auto-migrated

With boilerpipe-1.2.0.jar ArticleExtractor.INSTANCE.getText(new java.net.URL("http://t.co/3RplOLjc")) produces ERROR java.lang.IllegalArgumentException: protocol = http host = null at de.l3s.boilerpipe.sax.HTMLFetcher.fetch (HTMLFetcher.java:33) at de.l3s.boilerpipe.extractors.ExtractorBase.getText (ExtractorBase.java:87) This happens for many other URLs e.g. http://t.co/5vuYimwn http://t.co/Dy5yolLs http://t.co/ShWhtFjP...

Type-Defect
Priority-Medium
auto-migrated
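
The URLs in the report above are all t.co short links, so one workaround worth trying is to resolve the redirect chain yourself and pass the final URL to the extractor. The excerpt does not confirm that redirect handling is the cause, so treat this as a sketch under that assumption. The class `RedirectResolver`, its methods, and the hop limit of 5 are hypothetical names and values chosen for illustration; only standard `java.net` calls are used.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper (not part of boilerpipe): follows HTTP redirects one hop at a time.
class RedirectResolver {
    static final int MAX_HOPS = 5; // assumed limit to avoid redirect loops

    /** Resolves a Location header value (absolute or relative) against the URL that returned it. */
    static URL resolveLocation(URL base, String location) throws MalformedURLException {
        return new URL(base, location.trim());
    }

    /** Returns the first URL in the chain that does not answer with a redirect. */
    static URL resolve(URL start) throws IOException {
        URL current = start;
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            conn.setInstanceFollowRedirects(false); // handle each redirect ourselves
            conn.setRequestMethod("HEAD");
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (status < 300 || status >= 400 || location == null) {
                return current; // not a redirect, or no target given
            }
            current = resolveLocation(current, location);
        }
        throw new IOException("Too many redirects starting from " + start);
    }
}
```

The URL returned by `RedirectResolver.resolve(...)` could then be given to `ArticleExtractor.INSTANCE.getText(...)` in place of the short link. Some servers reject `HEAD` requests, in which case switching the request method to `GET` is a reasonable variation.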

What steps will reproduce the problem? 1. Extract content from the page (in Chinese) with ArticleExtractor http://www.ccgp.gov.cn/cggg/zybx/zbgg/201407/t20140731_3655909.shtml What is the expected output? What do you see instead? Footnote is...

Type-Defect
Priority-Medium
auto-migrated

What steps will reproduce the problem? 1. Give the URL as: http://www.newyorker.com/news/amy-davidson/shattered-school-gaza-2 2. Keep the extractor strategy as article extractor 3. Extract What is the expected output? What do you see...

Type-Defect
Priority-Medium
auto-migrated

What steps will reproduce the problem? 1. String content = CommonExtractors.DEFAULT_EXTRACTOR.getText(new URL("http://www.nytimes.com/2014/06/06/business/gm-ignition-switch-internal-recall-investigation-report.html?hp")); 2. System.out.println(content); 3. It prints nothing When I run with the above URL, it's not extracting anything. I have...

Type-Defect
Priority-Medium
auto-migrated