IllegalArgumentException for many web pages

Open GoogleCodeExporter opened this issue 11 years ago • 0 comments

With boilerpipe-1.2.0.jar, the call
ArticleExtractor.INSTANCE.getText(new java.net.URL("http://t.co/3RplOLjc"))
produces:
ERROR java.lang.IllegalArgumentException:
protocol = http host = null
        at de.l3s.boilerpipe.sax.HTMLFetcher.fetch (HTMLFetcher.java:33)
        at de.l3s.boilerpipe.extractors.ExtractorBase.getText (ExtractorBase.java:87)

This happens for many other URLs, e.g. http://t.co/5vuYimwn, http://t.co/Dy5yolLs,
http://t.co/ShWhtFjP, http://nyti.ms/lQrWwp ...
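
A possible workaround, offered as a sketch rather than a confirmed fix: all of the failing URLs are shortener links, so the exception may be raised while the redirect chain is being followed. That cause is an assumption and is not confirmed by the stack trace above. Under that assumption, the redirects can be followed by hand and only the final URL passed to boilerpipe. The class `RedirectResolver` and its methods are hypothetical names made up for this example and are not part of boilerpipe; only standard `java.net` classes are used.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

// Hypothetical helper (not part of boilerpipe): follows HTTP redirects manually.
final class RedirectResolver {
    // Upper bound on hops so a redirect loop cannot run forever.
    private static final int MAX_REDIRECTS = 10;

    // Resolves a Location header value against the URL that returned it,
    // so relative values such as "/path" become absolute.
    static URI resolveLocation(URI base, String location) {
        return base.resolve(location.trim());
    }

    // True for the status codes that carry a redirect target.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Follows redirects one hop at a time and returns the final URL.
    // Assumes http/https URLs; other protocols would fail the cast below.
    static URL resolve(URL start) throws IOException {
        URI current = URI.create(start.toString());
        for (int i = 0; i < MAX_REDIRECTS; i++) {
            HttpURLConnection conn =
                    (HttpURLConnection) current.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // handle redirects ourselves
            conn.setRequestMethod("HEAD");          // headers only, no body
            try {
                int status = conn.getResponseCode();
                String location = conn.getHeaderField("Location");
                if (!isRedirect(status) || location == null) {
                    return current.toURL();
                }
                current = resolveLocation(current, location);
            } finally {
                conn.disconnect();
            }
        }
        throw new IOException("Too many redirects starting from " + start);
    }
}
```

With this in place, the failing call would become ArticleExtractor.INSTANCE.getText(RedirectResolver.resolve(new java.net.URL("http://t.co/3RplOLjc"))). Whether this avoids the exception depends on the assumption above; some servers also reject HEAD requests, in which case GET would be needed instead.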


Original issue reported on code.google.com by [email protected] on 22 Aug 2014 at 3:23

GoogleCodeExporter, Mar 24 '15 10:03