Flexible Auto-Retries for any kind of error response (4xx, 5xx)
As discussed in https://github.com/crwlrsoft/crawler/issues/99#issuecomment-1739671602, it would be nice to be able to use the RetryErrorResponseHandler more flexibly, so that you can configure auto retries for any kind of error response.
Not yet sure about the wait times implemented in the RetryErrorResponseHandler. They should probably only be used for the special error responses (429, 503).
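To make the idea concrete, here is a minimal, language-agnostic sketch in Python. It is not the library's PHP API: `RetryPolicy`, `load_with_retries`, and all parameter names and default values are hypothetical and chosen only for illustration. It assumes retries can be enabled for any 4xx/5xx status, while wait times apply only to the special responses (429, 503).

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class RetryPolicy:
    """Hypothetical policy object, not part of the crawler package."""
    max_retries: int = 2
    # Statuses that trigger a retry at all (here: every 4xx and 5xx).
    retry_statuses: Set[int] = field(default_factory=lambda: set(range(400, 600)))
    # Statuses for which we also wait before retrying.
    wait_statuses: Set[int] = field(default_factory=lambda: {429, 503})
    # Seconds to wait before the 1st, 2nd, ... retry (last value is reused).
    wait_times: List[float] = field(default_factory=lambda: [10.0, 20.0])

    def should_retry(self, status: int, attempt: int) -> bool:
        return status in self.retry_statuses and attempt < self.max_retries

    def wait_time(self, status: int, attempt: int) -> float:
        if status not in self.wait_statuses or not self.wait_times:
            return 0.0
        return self.wait_times[min(attempt, len(self.wait_times) - 1)]


def load_with_retries(
    send: Callable[[], int],
    policy: RetryPolicy,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or retries run out."""
    attempt = 0
    while True:
        status = send()
        if not policy.should_retry(status, attempt):
            return status
        delay = policy.wait_time(status, attempt)
        if delay > 0:
            sleep(delay)
        attempt += 1
```

With this split, a user could narrow `retry_statuses` to, say, `{500, 502}` without touching the wait behaviour, or shorten `wait_times` independently when rotating proxies are in use.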
@ruerdev
@otsch Good to know about the RetryErrorResponseHandler, I didn't know that. I think it will be very useful when we have more flexibility in how error responses are handled.
It might be a good idea to let users pick a shorter wait time when they get a 429 error while using proxies, since they will switch to a different IP for their next request.
> It might be a good idea to let users pick a shorter wait time when they get a 429 error
You can already customize the wait times, see https://www.crwlr.software/packages/crawler/v1.1/the-crawler/politeness#wait-and-retry
I'll think about maybe automatically setting lower default wait times for those two error responses, when calling the new HttpLoader::useRotatingProxies() method 👍🏻