
Asynchronous Web Crawler & Scraper

6 rubyretriever issues

# em-http-request

```
[WARNING; em-http-request] TLS hostname validation is disabled (use 'tls: {verify_peer: true}'), see CVE-2020-13482 and https://github.com/igrigorik/em-http-request/issues/339 for details
```

This is the message I'm always getting whenever I try creating a...

Also raised the maximum supported Ruby version from 2.3 to 2.6; the specs still pass under 2.6.
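As a hedged sketch, an upper bound like this is usually declared with `required_ruby_version` in the gemspec. The issue does not show the project's actual constraint, so the `'< 2.7'` bound, the placeholder version, and the `supports_ruby?` helper are assumptions. Writing "max 2.6" as `'<= 2.6'` would reject patch releases such as 2.6.5, because RubyGems orders 2.6.5 after 2.6.

```ruby
require 'rubygems'

# Hypothetical gemspec fragment; the real lower bound and metadata are not in the issue.
SPEC = Gem::Specification.new do |s|
  s.name    = 'rubyretriever'
  s.version = '0.0.0' # placeholder version
  s.summary = 'Asynchronous Web Crawler & Scraper'
  # "Max 2.6" expressed as "< 2.7" so that 2.6.x patch releases still qualify.
  s.required_ruby_version = '< 2.7'
end

# Hypothetical helper: true when the given Ruby version satisfies the spec's requirement.
def supports_ruby?(spec, version)
  spec.required_ruby_version.satisfied_by?(Gem::Version.new(version))
end

supports_ruby?(SPEC, '2.6.5') # => true
supports_ruby?(SPEC, '2.7.0') # => false
```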

For example, on the page `http://example.com/file.html`, the relative path `foo` resolves to `http://example.com/file.html/foo`, but it should resolve to `http://example.com/foo`. I have written a test and a possible fix here: https://github.com/dezull/rubyretriever/commit/45114e313c9d73e43ce13cf61c687f65dedb18df
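A minimal sketch of the expected behavior using Ruby's standard `URI.join`, which resolves a reference against a base URL per RFC 3986: the last segment of the base path (`file.html`) is replaced rather than appended to. The helper name `resolve_link` is hypothetical, and this is not the fix in the linked commit.

```ruby
require 'uri'

# Hypothetical helper: resolve an href found on page_url to an absolute URL.
# URI.join merges the reference with the base per RFC 3986, so a relative
# path replaces the base's final segment instead of being appended to it.
def resolve_link(page_url, href)
  URI.join(page_url, href).to_s
end

resolve_link('http://example.com/file.html', 'foo')     # => "http://example.com/foo"
resolve_link('http://example.com/dir/file.html', 'foo') # => "http://example.com/dir/foo"
resolve_link('http://example.com/file.html', '/bar')    # => "http://example.com/bar"
```

`URI.join` raises `URI::InvalidURIError` for malformed hrefs (for example, ones containing unescaped spaces), so a crawler would likely want to rescue that per link.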

Not critical, but for the sake of least surprise ;)

```
Retriever::PageIterator.new(url, { 'maxpages' => 1 }) do |page| # Works
Retriever::PageIterator.new(url, { maxpages: 1 }) do |page| # Does...
```

bug
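One hedged way to get the least-surprise behavior is to normalize option keys once at the entry point. `normalize_options` is a hypothetical helper, not part of rubyretriever; the assumption is that `PageIterator` would call it on its options argument before reading `'maxpages'`.

```ruby
# Hypothetical helper: copy an options hash with every key converted to a String,
# so { maxpages: 1 } and { 'maxpages' => 1 } are treated identically.
def normalize_options(options)
  options.each_with_object({}) { |(key, value), normalized| normalized[key.to_s] = value }
end

normalize_options({ maxpages: 1 })     # => { 'maxpages' => 1 }
normalize_options({ 'maxpages' => 1 }) # => { 'maxpages' => 1 }
```

Stringifying (rather than symbolizing) keeps the existing `'maxpages'` lookups working unchanged.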

It is sometimes useful to view the HTTP status of the request when crawling a page. Is there a way to do this? It wasn't immediately clear from the...

enhancement
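rubyretriever's own page API isn't shown here, so this is a hedged sketch using only the standard library: `Net::HTTP.get_response` returns a response whose `code` is the status as a String. `PageResult`, `to_page_result`, and `fetch_page` are hypothetical names, and redirects are not followed.

```ruby
require 'net/http'
require 'uri'

# Hypothetical record of one crawled page, including its HTTP status.
PageResult = Struct.new(:url, :status, :body)

# Build a PageResult from anything that responds to #code and #body,
# such as a Net::HTTPResponse. Net::HTTP reports the status as a String.
def to_page_result(url, response)
  PageResult.new(url, response.code.to_i, response.body)
end

# Fetch one URL (no redirect handling) and keep its status alongside the body.
def fetch_page(url)
  to_page_result(url, Net::HTTP.get_response(URI(url)))
end

# page = fetch_page('http://example.com/')
# puts "#{page.status} #{page.url}"
```

Keeping the conversion separate from the network call makes it easy to test with a stand-in response object.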

Issuing the following command results in an error:

```
$ bundle exec rr --sitemap XML --progress http://www.yahoo.com
/Users/lfender_mbp/source/rubyretriever/lib/retriever/cli.rb:10:in `initialize': private method `gen_xml' called for # (NoMethodError)
	from /Users/lfender_mbp/source/rubyretriever/bin/rr:75:in `new'
	from...
```
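The error itself is standard Ruby behavior: a method defined below `private` cannot be called with an explicit receiver. Below is a minimal reproduction with hypothetical class names (not rubyretriever's actual code), plus one possible fix that simply makes the method public. The sitemap markup and the `XML_ESCAPES` table are illustrative assumptions.

```ruby
# Reproduction: gen_xml sits below `private`, so obj.gen_xml raises NoMethodError.
class PrivateSitemap
  def initialize(urls)
    @urls = urls
  end

  private

  def gen_xml
    ''
  end
end

# Characters that must be escaped inside XML text (e.g. & in query strings).
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;' }.freeze

# Possible fix: define gen_xml as a public method (or declare `public :gen_xml`).
class PublicSitemap
  def initialize(urls)
    @urls = urls
  end

  # Build a minimal sitemap document, escaping each URL for XML.
  def gen_xml
    entries = @urls.map { |url| "  <url><loc>#{url.gsub(/[&<>"']/, XML_ESCAPES)}</loc></url>" }
    [
      '<?xml version="1.0" encoding="UTF-8"?>',
      '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
      *entries,
      '</urlset>'
    ].join("\n")
  end
end

begin
  PrivateSitemap.new([]).gen_xml
rescue NoMethodError => e
  puts e.message # names the private method gen_xml; exact wording varies by Ruby version
end
```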