Extraction Issue
Hello @kohlschuetter,
First off, I have to say, Boilerpipe is AMAZING! Thank you for your work on this.
In a few cases, I am having an extraction issue. With the code from GitHub, there are some articles where the extracted text starts late. For example, on https://en.wikipedia.org/wiki/New_York_City the output starts at "Further information: Police surveillance in New York City and Crime in New York City". However, when I check that same article on https://boilerpipe-web.appspot.com/, the web API always returns the full text. I've been banging my head against the wall trying to figure out what I was doing wrong, and figured I should just message the inventor. The only two things I could think of are: 1) I am totally missing something, or 2) the web API might be running a slightly different version. Do you know what might be going on here?
Hope you are having a great weekend!
Best, Kevin
I'm facing some issues with the ArticleExtractor producing completely different results for two pages that have really similar HTML:
https://www.posb.com.sg/personal/deposits/savings-accounts/emysavings-account
https://www.dbs.com.sg/personal/deposits/savings-accounts/mysavings-account
When I use the DefaultExtractor, the outputs for the two pages are 96% similar. With ArticleExtractor, however, the outputs are completely different. Any ideas why?
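For anyone who wants to reproduce this kind of comparison, here is a minimal sketch. It assumes the text from each extractor run has already been saved as a plain string; it does not call Boilerpipe itself. The helper names `similarity` and `first_divergence` are hypothetical, and the token-level ratio from Python's `difflib` is only one possible measure, not necessarily the one behind the 96% figure above.

```python
from difflib import SequenceMatcher


def similarity(text_a: str, text_b: str) -> float:
    """Return a 0..1 similarity ratio over whitespace-separated tokens."""
    tokens_a = text_a.split()
    tokens_b = text_b.split()
    # autojunk=False keeps frequent tokens from being ignored on long texts.
    return SequenceMatcher(None, tokens_a, tokens_b, autojunk=False).ratio()


def first_divergence(text_a: str, text_b: str):
    """Return the index of the first differing line, or None if none differ."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    for i, (a, b) in enumerate(zip(lines_a, lines_b)):
        if a.strip() != b.strip():
            return i
    # One text is a prefix of the other: they diverge where the shorter ends.
    if len(lines_a) != len(lines_b):
        return min(len(lines_a), len(lines_b))
    return None


if __name__ == "__main__":
    # Placeholder strings standing in for two saved extractor outputs.
    out_a = "Open an account online\nEarn interest monthly"
    out_b = "Open an account online\nEarn interest yearly"
    print(f"similarity: {similarity(out_a, out_b):.2f}")
    print(f"first differing line: {first_divergence(out_a, out_b)}")
```

Running both helpers on the DefaultExtractor outputs and then on the ArticleExtractor outputs would show where the two pages' extracted text starts to differ, which may help narrow down which block of the page is being classified differently.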