handle / catch "HTTP redirect too deep" error / exception
This feed is updating and showing up on the planet, but I regularly get this error / failed build:
[info] found cache entry for >https://blog.bmannconsulting.com/feed.xml<
[info] adding header If-None-Match (etag) >"f192fa56e685ca02fed139e8fec11c4d-ssl-df"< for conditional GET
[info] found cache entry for >https://blog.bmannconsulting.com/feed.xml<
[info] adding header If-None-Match (etag) >"f192fa56e685ca02fed139e8fec11c4d-ssl-df"< for conditional GET
[info] found cache entry for >https://blog.bmannconsulting.com/feed.xml<
[info] adding header If-None-Match (etag) >"f192fa56e685ca02fed139e8fec11c4d-ssl-df"< for conditional GET
[info] found cache entry for >https://blog.bmannconsulting.com/feed.xml<
[info] adding header If-None-Match (etag) >"f192fa56e685ca02fed139e8fec11c4d-ssl-df"< for conditional GET
[info] found cache entry for >https://blog.bmannconsulting.com/feed.xml<
[info] adding header If-None-Match (etag) >"f192fa56e685ca02fed139e8fec11c4d-ssl-df"< for conditional GET
[info] found cache entry for >https://blog.bmannconsulting.com/feed.xml<
[info] adding header If-None-Match (etag) >"f192fa56e685ca02fed139e8fec11c4d-ssl-df"< for conditional GET
*** error: HTTP redirect too deep
##[error]Process completed with exit code 1.
Thanks for reporting. Weird - this might be an HTTP redirect that redirects to itself (thus, an endless loop). If I get to it I'll try to check the HTTP headers for the HTTP status code (e.g. 3xx for a redirect) and the redirect location.
Ideally it would just give up and move to the next feed after x times through the loop, and leave a note at the end or in an error log, rather than crashing.
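Something along those lines could look like this - a hedged sketch, not pluto's actual code; `fetch_feed` and the feed URLs are made-up stand-ins for the real fetcher and config:

```ruby
require 'logger'

logger = Logger.new($stderr)

# Stand-in fetcher that fails on one URL, to mimic the redirect loop.
def fetch_feed(url)
  raise 'HTTP redirect too deep' if url.include?('loop')
  '<rss/>'
end

feed_urls = ['https://example.com/feed.xml', 'https://example.com/loop.xml']
errors = []

feed_urls.each do |url|
  begin
    fetch_feed(url)
  rescue => e
    # log and remember the failure, then carry on with the next feed
    errors << [url, e.message]
    logger.error "skipping feed #{url}: #{e.message}"
  end
end

puts "#{errors.size} feed(s) failed"
```

The point is just that the rescue sits inside the per-feed loop, so one broken feed can't take down the whole build.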
I mentioned the redirect to the blog owner, but they said they had no other complaints about this, and since the feed items get populated in ~90% of the runs, I don't know how much energy any of us needs to pour into this particular infinite loop.
In any case... I'm studying Ruby pretty hard, as I'm able, so hopefully sooner rather than later I'll have more to offer than bug reports. :)
Thanks for the update. I see that it must be a different error, because the fetcher should give up after 7 tries or so and report an error, not crash. I'll try to check the link tomorrow to see if there's an HTTP redirect happening. Thanks for the patience.
Do you still get the error? If I try to fetch the feed (with pluto's http fetcher library) everything works here. The test script:
require 'fetcher'
url = 'https://blog.bmannconsulting.com/feed.xml'
worker = Fetcher::Worker.new
response = worker.get( url )
puts response.code
puts response.message
puts response.content_type
puts response.body[0..100]
## try http NOT https
url = 'http://blog.bmannconsulting.com/feed.xml'
response = worker.get( url )
puts response.code
puts response.message
puts response.content_type
puts response.body[0..100]
and the result:
[debug] fetch - get(_response) src: https://blog.bmannconsulting.com/feed.xml
[debug] using direct net http access; no proxy configured
[debug] GET /feed.xml uri=https://blog.bmannconsulting.com/feed.xml, redirect_limit=5
[debug] 200 OK
[debug] content_type: application/xml, content_length: 9430
200
OK
application/xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
xmlns:atom="http://www.w3.org/2005/Atom"
[debug] fetch - get(_response) src: http://blog.bmannconsulting.com/feed.xml
[debug] using direct net http access; no proxy configured
[debug] GET /feed.xml uri=http://blog.bmannconsulting.com/feed.xml, redirect_limit=5
[debug] 200 OK
[debug] content_type: application/xml, content_length: 9430
200
OK
application/xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
xmlns:atom="http://www.w3.org/2005/Atom"
I tried again with a cache entry from your log above. It also works. The script:
require 'fetcher'
url = 'https://blog.bmannconsulting.com/feed.xml'
worker = Fetcher::Worker.new
worker.use_cache = true
worker.cache[ url ] = {
'etag' => 'f192fa56e685ca02fed139e8fec11c4d-ssl-df'
}
response = worker.get( url )
puts response.code
puts response.message
puts response.content_type
puts response.body[0..100] if response.body
## try http NOT https
url = 'http://blog.bmannconsulting.com/feed.xml'
worker.cache[ url ] = {
'etag' => 'f192fa56e685ca02fed139e8fec11c4d-ssl-df'
}
response = worker.get( url )
puts response.code
puts response.message
puts response.content_type
puts response.body[0..100] if response.body
resulting in
[debug] fetch - get(_response) src: https://blog.bmannconsulting.com/feed.xml
[debug] using direct net http access; no proxy configured
[debug] GET /feed.xml uri=https://blog.bmannconsulting.com/feed.xml, redirect_limit=5
[info] found cache entry for >https://blog.bmannconsulting.com/feed.xml<
[info] adding header If-None-Match (etag) >f192fa56e685ca02fed139e8fec11c4d-ssl-df< for conditional GET
[debug] 304 Not Modified
304
Not Modified
[debug] fetch - get(_response) src: http://blog.bmannconsulting.com/feed.xml
[debug] using direct net http access; no proxy configured
[debug] GET /feed.xml uri=http://blog.bmannconsulting.com/feed.xml, redirect_limit=5
[info] found cache entry for >http://blog.bmannconsulting.com/feed.xml<
[info] adding header If-None-Match (etag) >f192fa56e685ca02fed139e8fec11c4d-ssl-df< for conditional GET
[debug] 304 Not Modified
304
Not Modified
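For reference, the 304s above are plain HTTP conditional GET at work. A minimal Net::HTTP sketch of the header pluto adds (only the request object is built here; nothing is actually sent):

```ruby
require 'net/http'
require 'uri'

# Replay the cached ETag as an If-None-Match header; a 304 reply from
# the server means the cached copy is still fresh, so no body is sent.
uri = URI('https://blog.bmannconsulting.com/feed.xml')
request = Net::HTTP::Get.new(uri)
request['If-None-Match'] = '"f192fa56e685ca02fed139e8fec11c4d-ssl-df"'
puts request['If-None-Match']
```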
Anyways, please report back if you still get the error and what the exact feed url in your planet.ini is - maybe it's different?
OK, I tried replacing https with http in my ini... however, I can't reproduce the error reliably.
I've got it running hourly (it was running twice an hour). It fails to build only some of the time, maybe 1-3 times in 24 hours. In another day I should be able to tell you whether that made a difference.
Thanks for the update. If you have a traceback / stacktrace or some more error logs, that would help. If you run pluto with --verbose you should get a more detailed error (if that's possible).
Good idea! Just at a brief glance I can see some other things I should be tracking in that output.
From now on my action uses --verbose, so I can provide better info for errors... and I can also debug any feed issues.
Thanks!
Okie, here's the --verbose output:
[debug] fetch - get(_response) src: http://blog.bmannconsulting.com/feed.xml
[debug] using direct net http access; no proxy configured
[debug] GET /feed.xml uri=http://blog.bmannconsulting.com/feed.xml, redirect_limit=5
[info] found cache entry for >http://blog.bmannconsulting.com/feed.xml<
[info] adding header If-None-Match (etag) >"f192fa56e685ca02fed139e8fec11c4d-ssl-df"< for conditional GET
[debug] 301 Moved Permanently location=https://blog.bmannconsulting.com/feed.xml
[debug] GET /feed.xml uri=https://blog.bmannconsulting.com/feed.xml, redirect_limit=4
[debug] 301 Moved Permanently location=https://blog.bmannconsulting.com/feed.xml
[debug] GET /feed.xml uri=https://blog.bmannconsulting.com/feed.xml, redirect_limit=3
[debug] 301 Moved Permanently location=https://blog.bmannconsulting.com/feed.xml
[debug] GET /feed.xml uri=https://blog.bmannconsulting.com/feed.xml, redirect_limit=2
[debug] 301 Moved Permanently location=https://blog.bmannconsulting.com/feed.xml
[debug] GET /feed.xml uri=https://blog.bmannconsulting.com/feed.xml, redirect_limit=1
[debug] 301 Moved Permanently location=https://blog.bmannconsulting.com/feed.xml
[debug] GET /feed.xml uri=https://blog.bmannconsulting.com/feed.xml, redirect_limit=0
[debug] 301 Moved Permanently location=https://blog.bmannconsulting.com/feed.xml
*** error: HTTP redirect too deep
##[error]Process completed with exit code 1.
Hello, wow - thanks for your diligence and help. Good to know that it is an HTTP redirect - somewhat weird that it loops forever. You might change the feed_url to use https (to avoid the redirect from the http to the https location).
On the pluto side I will add the "HTTP redirect too deep" error to the exception handler so that it gets logged but doesn't exit, and I'll try to double-check whether the protocol (http/https) maybe gets lost in the location header. Thanks again. Cheers.
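A possible shape for that location-header check - a hypothetical sketch, not pluto's implementation; `next_redirect_url` is a made-up helper name:

```ruby
require 'uri'

# Given the current request URL and the Location header of a 3xx
# response, compute the next URL. URI.join resolves relative Locations
# against the current URL, so the scheme can't get silently lost, and a
# redirect pointing back at the current URL raises immediately instead
# of burning through the whole redirect limit.
def next_redirect_url(current_url, location)
  next_url = URI.join(current_url, location).to_s
  raise "redirect loop detected at #{next_url}" if next_url == current_url
  next_url
end

puts next_redirect_url('http://blog.bmannconsulting.com/feed.xml',
                       'https://blog.bmannconsulting.com/feed.xml')
```

With this, the http-to-https hop above is fine (the URLs differ), but the second 301 to the identical https URL would fail fast with a clear "redirect loop" message.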