envoy icon indicating copy to clipboard operation
envoy copied to clipboard

safe retry on reset

Open rgs1 opened this issue 5 years ago • 6 comments

We currently support retrying after an upstream reset, but this could happen after the request has been seen -- partially or totally -- by the upstream:

https://github.com/envoyproxy/envoy/blob/master/source/common/router/router.cc#L966

at which point, it might be unsafe to retry. Note only downstream_response_started_ is checked for, we are not checking if the request was sent out already.

Would something like reset-before-request make sense? We currently have this issue, where we have an upstream that closes idle connections just before we start sending the request. But it can't currently be retried, because it would only be safe to retry on connect-failure.

cc: @mattklein123 @derekargueta @fishcakez

rgs1 avatar Feb 11 '20 18:02 rgs1

cc @snowp for guidance/thoughts on retry policies.

mattklein123 avatar Feb 12 '20 04:02 mattklein123

Conceptually this makes sense to me, though we'll need to be specific about the exact behavior: for example, if the first attempt was sent and was retried due to response headers, but on the second attempt the request is reset before the second set of headers are sent, should the second request be retried?

This definitely falls into a similar category as https://github.com/envoyproxy/envoy/issues/9946 wrt having too many retry policies, but I think this one is probably simply enough that we can support it

snowp avatar Feb 13 '20 00:02 snowp

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 15 '20 17:03 stale[bot]

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.

stale[bot] avatar Mar 22 '20 17:03 stale[bot]

Sorry to rekindle an old issue but this would really be useful!

See (long) threads like: https://github.com/envoyproxy/envoy/issues/14981; where people deal with 503's. Sending local reply with details upstream_reset_before_response_started{connection_termination}

These are "fixed" by a retry policy of reset or gateway-error, however that can be dangerous. All the resets I observe certainly fall into reset-before-request, so feels like this would be a big win.

Stono avatar Jun 25 '24 09:06 Stono

I'd be interested in potentially implementing this

keithmattix avatar Jun 25 '24 14:06 keithmattix