Google Code Exporter
``` Why no tag in svn corresponding to the most recent release? ``` Original issue reported on code.google.com by `[email protected]` on 25 Jun 2012 at 3:33
``` What steps will reproduce the problem? - ArticleExtractor cannot process a web page having two parts (like the attached page) and results in "java.lang.StackOverflowError". What is the expected output? What...
``` Hello, I have come across your API and it seems really impressive. Is there a way to parse the src URL of the main image in an Article? If...
``` Christian, We have a corpus that is a mixture of news articles and other web pages, some of which contain tables. The ArticleExtractor has trouble with many of these...
``` I'm trying to get Boilerpipe set up on Android. I'm using Eclipse Indigo and can build my project. As a test I am simply trying this: String response=""; try...
``` 1) Go to http://boilerpipe-web.appspot.com/ 2) Type in http://arstechnica.com/ as the URL. 3) Use article extractor and HTML (extract fragment) 4) See a nice list of articles on that page...
``` When using HTMLHighlighter, boilerpipe sometimes keeps artifacts coming from FORM and LABEL tags. This can be easily prevented by adding a new ignorable element to TAG_ACTIONS...
``` I have run across a few news articles that use these characters. The following articles use the « character (\u00AB): http://philadelphia.cbslocal.com/2012/02/06/report-1-in-5-children-exposed-to-secondhand-smoke-in-cars/ http://blog.mediaglobal.org/?p=448 I haven't seen too many of...
``` • What steps will reproduce the problem? Get an html or htmlFragment from any page • What is the expected output? What do you see instead? The output have...
``` Now that HTML5 is becoming more pervasive on the web, it might be worth considering additional parsing support in places, one example being the recently added image extractor. HTML5 includes...