boilerpipe
Timeout in HTMLFetcher
In `HTMLFetcher`, `HTMLDocument fetch(final URL url)` sets no timeout, so a slow or unresponsive server can block the call indefinitely. Ideally, a timeout should be set right after `final URLConnection conn = url.openConnection();`. Please assign this issue to me and I will send a pull request.
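A minimal sketch of the proposed fix, assuming plain `java.net.URLConnection` as in the snippet quoted above. `TimeoutFetch`, `openWithTimeouts` and `fetch` are hypothetical names, not boilerpipe's API, and the timeout values a caller passes in are placeholders. The key point is that both `setConnectTimeout` and `setReadTimeout` default to 0 (wait forever), so both have to be set before any I/O.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of boilerpipe: shows where the timeouts
// would be applied right after url.openConnection().
class TimeoutFetch {

    // Opens the connection and applies both timeouts before any I/O happens.
    // openConnection() itself does not contact the server yet.
    static URLConnection openWithTimeouts(URL url, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(connectTimeoutMs); // limit for establishing the connection
        conn.setReadTimeout(readTimeoutMs);       // limit for each blocking read
        return conn;
    }

    // Reads the whole response body; a stalled server now raises
    // java.net.SocketTimeoutException instead of hanging forever.
    static String fetch(URL url, int connectTimeoutMs, int readTimeoutMs) throws IOException {
        URLConnection conn = openWithTimeouts(url, connectTimeoutMs, readTimeoutMs);
        try (InputStream in = conn.getInputStream()) {
            // Assumes UTF-8 for simplicity; real pages may declare another charset.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

A caller would wrap `fetch` in a `try`/`catch (SocketTimeoutException e)` and decide whether to retry or skip the URL, which is what lets an overnight batch job keep going.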
+1, I was very disappointed when it got stuck overnight.
Try overriding the method and setting the proxy there.
Fetch the HTML separately and feed it in via `BoilerpipeSaxInput`; that way you can set your own timeouts and use a pipeline.
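A sketch of that workaround, assuming Java 11+ and its built-in `java.net.http.HttpClient` as the separate fetcher (a swapped-in choice; any HTTP client with timeouts would do). `SeparateFetch`, `client`, `request` and `fetchHtml` are hypothetical names. The connect timeout lives on the client, and the per-request timeout makes `send` throw `HttpTimeoutException` if no response arrives in time.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: fetches the page outside boilerpipe so that the
// caller, not the library, decides how long to wait.
class SeparateFetch {

    // Client-level limit on establishing the connection.
    static HttpClient client(Duration connectTimeout) {
        return HttpClient.newBuilder()
                .connectTimeout(connectTimeout)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    // Request-level limit: send() throws HttpTimeoutException if the
    // response is not received within this duration.
    static HttpRequest request(URI uri, Duration requestTimeout) {
        return HttpRequest.newBuilder(uri).timeout(requestTimeout).GET().build();
    }

    // Returns the body as a String, treating any non-200 status as an error.
    static String fetchHtml(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + request.uri());
        }
        return response.body();
    }
}
```

The resulting `html` string can then be handed to boilerpipe, for example `new BoilerpipeSaxInput(new InputSource(new StringReader(html))).getTextDocument()` followed by `ArticleExtractor.INSTANCE.process(doc)`, assuming boilerpipe's `BoilerpipeSaxInput(InputSource)` constructor. Because fetching and extraction are now separate stages, they can run as a pipeline, with slow downloads timing out without blocking extraction of pages already fetched.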