polipus
Polipus: distributed and scalable web-crawler framework
I was able to crawl Unicode pages in 0.4.0, but after upgrading to 0.5.0 only some English characters appear in a crawled page. Please let me know if there is any...
I found that URLs containing anchors like "#sku:123" (i.e. a fragment containing a colon) were not cleaned up when passed to the `to_absolute` method. As a consequence, they were escaped and added...
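A minimal sketch of the idea behind the fix, assuming plain stdlib `URI` (the method name below is hypothetical, not Polipus's actual `to_absolute`): strip the fragment before resolving, so a fragment like `#sku:123` is never percent-escaped into the URL.

```ruby
require 'uri'

# Hypothetical helper, not Polipus's real code: resolve a link against its
# base page and drop the fragment, so "#sku:123" never reaches the queue.
def to_absolute_without_fragment(link, base)
  uri = URI.join(base, link)
  uri.fragment = nil          # drop "#sku:123", "#foo", etc.
  uri.to_s
end

puts to_absolute_without_fragment("abc.html#sku:123", "http://www.example.com/")
# => http://www.example.com/abc.html
```

Dropping the fragment is safe for a crawler because the server never sees it anyway; two URLs differing only in their fragment fetch the same resource.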
https://github.com/taganaka/polipus
Bundler could not find compatible versions for gem "bson": In snapshot (Gemfile.lock): bson (1.9.2) In Gemfile: mongoid (~> 4.0.0) ruby depends on moped (~> 2.0.0) ruby depends on bson (~>...
Does it make sense to have support for headless crawling built-in to the framework? A lot of the websites these days are Single Page apps and crawling that using the...
When anchor links are found during the crawl (e.g. http://www.example.com/abc.html#foo), they are encoded: the anchor character is replaced with its escape %23, which causes the page to respond...
For some reason I'm not able to install the polipus gem on JRuby 1.7.13. I tried both Windows 8 and Ubuntu 12.04 and got the same error message: Gem::Installer::ExtensionBuildError: ERROR: Failed to build...
Hi! It seems one of our threads is stuck in an HTTP call. I think the function is: https://github.com/taganaka/polipus/blob/master/lib/polipus/http.rb#L170 It looks like the connection is never closed. Any idea what...
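A sketch of one way to guard against a call hanging forever, using plain stdlib `Net::HTTP` rather than Polipus's actual HTTP layer: set explicit connect and read timeouts, and use the block form of `start` so the socket is always closed, even when an exception is raised mid-request.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper, not Polipus's real HTTP code: build a client with
# explicit timeouts so a stalled server cannot pin a crawler thread.
def build_http(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl      = uri.scheme == 'https'
  http.open_timeout = 10   # seconds allowed for the TCP connect
  http.read_timeout = 30   # seconds allowed for each socket read
  http
end

def fetch(url)
  uri = URI(url)
  # The block form guarantees Net::HTTP#finish runs, closing the socket
  # even if the GET raises (timeout, reset, etc.).
  build_http(uri).start { |h| h.get(uri.request_uri) }
end
```

The timeout values here are illustrative; the important part is that neither is left at a value that allows an unbounded wait.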
As a result of #33, I reconsidered the current structure of `PolipusCrawler`. In particular, `PolipusCrawler#takeover` is a very long method in which a lot is going on at the same time. `PolipusCrawler` itself...
When you try to resolve a domain which does not exist, Polipus creates an error page with `SocketError`. In effect, though, the page does not exist anymore, so it's like a 404...
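A hypothetical sketch of the suggestion, mapping low-level fetch errors to HTTP-like status codes (`status_for` is an illustrative name, not a Polipus API):

```ruby
require 'socket'   # defines SocketError
require 'timeout'  # defines Timeout::Error

# Hypothetical helper: translate low-level errors into HTTP-style codes so
# a dead domain is stored as a missing page rather than a crawler error.
def status_for(error)
  case error
  when SocketError    then 404  # DNS resolution failed: treat like "not found"
  when Timeout::Error then 408  # request timed out
  else 500                      # anything else stays a generic error
  end
end

puts status_for(SocketError.new)  # => 404
```

Storing a 404-style status keeps dead domains from being retried or treated as transient failures on later crawls.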
The current S3 implementation is partially broken and doesn't work well under heavy load. Providing a Fog adapter looks like a better way to me: http://fog.io/storage/