
fix: retry Get for 500 and 503 error from GCE metadata server

Open baeminbo opened this issue 3 years ago • 4 comments

Currently, metadata requests are retried only on transport errors, not on retryable HTTP status codes.

The GCE metadata documentation suggests retrying on 503. In addition, the GCE metadata server also returns 500 during intermittent unavailability.

If this happens during token refresh, an intermittent 500 or 503 error is propagated as a RefreshError. RefreshError is not retryable in the python-api-core library, so a single intermittent, retryable error from the GCE metadata server causes the GCP API call to fail.
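For illustration, here is a minimal sketch of the kind of status-code retry being requested. It is not the library's implementation: `fetch`, `MetadataError`, `get_with_status_retry`, and the backoff parameters are all hypothetical names and values chosen for this example, and the only retryable codes assumed are the 500 and 503 mentioned above.

```python
import time

# Assumption: only the two status codes discussed in this issue are retried.
RETRYABLE_STATUS_CODES = frozenset({500, 503})


class MetadataError(Exception):
    """Hypothetical error raised when the metadata request ultimately fails."""


def get_with_status_retry(fetch, retry_count=5, base_delay=0.1, sleep=time.sleep):
    """Call `fetch` until it succeeds or the attempts are exhausted.

    `fetch` is a hypothetical zero-argument callable that returns a
    (status_code, body) tuple for one request to the metadata server.
    """
    last_status = None
    for attempt in range(retry_count):
        status, body = fetch()
        if status == 200:
            return body
        if status not in RETRYABLE_STATUS_CODES:
            # Non-retryable status: fail immediately instead of retrying.
            raise MetadataError(f"non-retryable status {status}")
        last_status = status
        if attempt < retry_count - 1:
            # Exponential backoff between attempts (delay values are illustrative).
            sleep(base_delay * (2 ** attempt))
    raise MetadataError(
        f"gave up after {retry_count} attempts; last status {last_status}"
    )
```

With a retry like this at the auth layer, a single intermittent 500 or 503 would be absorbed before it could surface as a RefreshError.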

To mitigate this, I asked for RefreshError to be retried in [1], but the team suggested adding the retry at the auth layer instead [2].

[1] https://github.com/googleapis/python-api-core/issues/312
[2] https://github.com/googleapis/python-api-core/pull/313#issuecomment-978006491

baeminbo avatar Feb 19 '22 03:02 baeminbo

Linking with issue #980.

arithmetic1728 avatar Mar 01 '22 21:03 arithmetic1728

@arithmetic1728 #980 is about the token endpoint, while this change addresses retries to the Metadata endpoint.

TimurSadykov avatar Mar 04 '22 07:03 TimurSadykov

@arithmetic1728 I think we first need to address #980 and add a Retryable interface; then we can leverage that here to address Metadata retries. Most likely we will opt for passing retryable errors to the client instead of performing the retries in the library.
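To make the distinction concrete, here is a rough sketch of what "retryable errors passed to the client" could look like. Everything in it is hypothetical and for illustration only: `RetryableAuthError`, `error_for_status`, and `call_with_client_retry` are made-up names, not the interface proposed in #980.

```python
class RetryableAuthError(Exception):
    """Hypothetical auth error that tells the caller whether a retry may help."""

    def __init__(self, message, retryable=False):
        super().__init__(message)
        self.retryable = retryable


def error_for_status(status):
    """Build an error for a failed metadata response.

    Assumption: 500 and 503 are the only statuses marked retryable.
    """
    return RetryableAuthError(
        f"metadata server returned {status}",
        retryable=status in (500, 503),
    )


def call_with_client_retry(operation, max_attempts=3):
    """Client-side loop: the library only raises, the client decides to retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RetryableAuthError as exc:
            # Re-raise if the error is not retryable or attempts are exhausted.
            if not exc.retryable or attempt == max_attempts:
                raise
```

In this arrangement the auth library does not sleep or loop itself; it only labels the error, and the retry policy stays with the caller.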

TimurSadykov avatar Mar 04 '22 07:03 TimurSadykov

@baeminbo Hi, could you please provide any stats on the Metadata service errors that you are trying to mitigate?

TimurSadykov avatar Mar 08 '22 00:03 TimurSadykov