
Polipus: distributed and scalable web-crawler framework

Results 13 polipus issues

I was able to crawl Unicode pages in 0.4.0, but after upgrading to 0.5.0 only some English characters appear in a crawled page. Please let me know if there are any...

I found that URLs containing anchors like "#sku:123" (e.g. a semi-colon) were not cleaned up when passed to the `to_absolute` method. As a consequence, they were escaped and added...

https://github.com/taganaka/polipus Bundler could not find compatible versions for gem "bson": In snapshot (Gemfile.lock): bson (1.9.2) In Gemfile: mongoid (~> 4.0.0) ruby depends on moped (~> 2.0.0) ruby depends on bson (~>...

enhancement

Does it make sense to have support for headless crawling built into the framework? A lot of websites these days are single-page apps, and crawling them using the...

When anchor links are found during the crawl (e.g. http://www.example.com/abc.html#foo), they are encoded: the anchor tag is replaced with the escaped character %23, which causes the page to respond...
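One possible way to avoid the `%23` escaping described in the anchor reports above is to drop the fragment before resolving a link against the page URL. The following is a minimal sketch using only Ruby's standard `uri` library; `absolute_without_fragment` is a hypothetical helper name and is not part of polipus or its `to_absolute` method.

```ruby
require "uri"

# Hypothetical helper (not polipus API): resolve `link` against `base_url`
# after removing everything from the first "#" onward, so the fragment is
# never escaped into the request path.
def absolute_without_fragment(base_url, link)
  cleaned = link.sub(/#.*\z/m, "") # strip the fragment, if any
  URI.join(base_url, cleaned).to_s
end

puts absolute_without_fragment("http://www.example.com/", "abc.html#foo")
puts absolute_without_fragment("http://www.example.com/shop/", "/item#sku:123")
```

Fragments are only meaningful to the client, so removing them before the request should not change which resource the server returns.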

bug

For some reason I'm not able to install the polipus gem on JRuby 1.7.13. Tried both Windows 8.0 and Ubuntu 12.04. Got the same error message. Gem::Installer::ExtensionBuildError: ERROR: Failed to build...

Hi! It seems one of our threads is stuck in an HTTP call. I think the function is: https://github.com/taganaka/polipus/blob/master/lib/polipus/http.rb#L170. It looks like the connection is never closed. Any idea what...
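A common guard against a thread hanging indefinitely in an HTTP call is to set explicit connect and read timeouts on the client. The sketch below uses Ruby's standard `Net::HTTP`; `build_http` and the timeout values are illustrative assumptions, and whether this addresses the hang reported above is not confirmed by the issue text.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not polipus API): build a Net::HTTP client with
# explicit timeouts so a stalled connection raises instead of blocking.
def build_http(url, open_timeout: 5, read_timeout: 10)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = open_timeout # seconds allowed to open the connection
  http.read_timeout = read_timeout # seconds allowed for each read
  http
end

http = build_http("http://www.example.com/")
puts http.open_timeout
puts http.read_timeout
```

No connection is opened until a request is made, so the client can be configured up front and reused by the worker thread.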

bug

As a result of #33, I reconsidered the current structure of `PolipusCrawler`. In particular, `PolipusCrawler#takeover` is a very long method where a lot is going on at the same time. `PolipusCrawler` itself...

When you try to resolve a domain which does not exist, polipus creates an error page with `SocketError`. Actually, the page does not exist anymore. So it's like a 404...

The current S3 implementation is partially broken and doesn't work well under heavy load. Providing a fog adapter looks way better to me: http://fog.io/storage/

enhancement