Flexible Auto-Retries for any kind of error responses (4xx, 5xx)

Open otsch opened this issue 2 years ago • 2 comments

As discussed in https://github.com/crwlrsoft/crawler/issues/99#issuecomment-1739671602 it would be nice to be able to use the RetryErrorResponseHandler more flexibly, so that you can configure automatic retries for any kind of error response. I'm not yet sure about the wait times implemented in the RetryErrorResponseHandler; they should probably only be used for the special error responses (429, 503). @ruerdev

otsch avatar Sep 28 '23 17:09 otsch

@otsch Good to know about the RetryErrorResponseHandler, I didn't know that. I think it will be very useful when we have more flexibility in how error responses are handled.

It might be a good idea to let users pick a shorter wait time when they get a 429 error while using proxies, since they will switch to a different IP for their next request.

ruerdev avatar Sep 29 '23 16:09 ruerdev

> It might be a good idea to let users pick a shorter wait time when they get a 429 error

You can already customize the wait times, see https://www.crwlr.software/packages/crawler/v1.1/the-crawler/politeness#wait-and-retry

I'll think about automatically setting lower default wait times for those two error responses when the new HttpLoader::useRotatingProxies() method is called 👍🏻

otsch avatar Sep 29 '23 23:09 otsch
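
For readers skimming this thread, here is a minimal, language-agnostic sketch (written in Python) of the behavior being proposed: retry on any configured error status, but only wait before retrying on the "special" responses (429, 503). It is not the crwlr API. `RetryPolicy`, `request_with_retries`, and all default values are hypothetical names and numbers chosen for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetryPolicy:
    """Hypothetical retry configuration; not part of the crwlr library."""

    retries: int = 2
    # Statuses that trigger a retry at all (here: any 4xx or 5xx).
    retry_statuses: frozenset = frozenset(range(400, 600))
    # Statuses that additionally trigger a wait before the retry.
    wait_statuses: frozenset = frozenset({429, 503})
    # Wait time in seconds per retry attempt; illustrative values only.
    wait_times: tuple = (10, 20)

    def wait_for(self, status: int, attempt: int) -> float:
        """Return how long to wait before retry number `attempt` (0-based)."""
        if status not in self.wait_statuses:
            return 0.0
        # Reuse the last configured wait time once the list is exhausted.
        index = min(attempt, len(self.wait_times) - 1)
        return float(self.wait_times[index])


def request_with_retries(
    send: Callable[[], int],
    policy: RetryPolicy,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` (which returns a status code) and retry per `policy`."""
    status = send()
    attempt = 0
    while status in policy.retry_statuses and attempt < policy.retries:
        wait = policy.wait_for(status, attempt)
        if wait > 0:
            sleep(wait)
        attempt += 1
        status = send()
    return status
```

Under these assumptions, the proxy case from the comments above would simply be a policy with shorter waits, for example `RetryPolicy(wait_times=(1, 2))`, since the next request goes out from a different IP anyway.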