Improve Credential Resolution on GCE
Is your feature request related to a problem? Please describe.
We have intermittent failures when trying to run google.auth.default() on GKE. We get the following error, even though the metadata service will eventually come up:
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
Describe the solution you'd like
The Golang client checks for the presence of GCE environment variables (i.e. GCE_METADATA_HOST) before trying to communicate with the service. This would be preferable because we know that we are on GCE: https://code-review.googlesource.com/c/gocloud/+/5200
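For illustration, a rough Python sketch of that kind of check could look like the following. The helper name `env_indicates_gce` is hypothetical and is not part of google-auth. Only the variable name GCE_METADATA_HOST comes from this issue. Treating "variable is set and non-empty" as "we are on GCE" is an assumption about the desired behavior.

```python
import os

# Variable name taken from the issue text. The check itself is only a sketch
# of the requested behavior, not what google-auth currently does.
GCE_METADATA_HOST_ENV = "GCE_METADATA_HOST"


def env_indicates_gce(environ=None):
    """Return True if the environment explicitly names a metadata host.

    `environ` defaults to os.environ; a plain dict can be passed for testing.
    """
    if environ is None:
        environ = os.environ
    # An unset or empty variable is treated as "no signal".
    return bool(environ.get(GCE_METADATA_HOST_ENV))
```

With a check like this, credential resolution could commit to the GCE path when the variable is set, instead of falling through to DefaultCredentialsError when a single ping fails.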
Describe alternatives you've considered
- Increase metadata ping timeout
In the failure cases, how long does it normally take for the metadata server to respond to the ping? The default timeout is 3 seconds, but it can be overridden with the environment variable GCE_METADATA_TIMEOUT in Python. Could you give it a try?
@wangyutongg
We aren't 100% sure how long it takes for the metadata server to respond to the ping, but we're pretty sure that the GKE metadata server is not ready and the connections are being refused. We think this because we've tried bumping GCE_METADATA_TIMEOUT to something like 10 seconds, and we still see the issue.
If we could configure the number of retries to be >3, that could also help, but that's just adding another knob to tune.
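In the meantime, an application-side workaround might be a small retry wrapper along these lines. This is only a sketch: `call_with_retries` and its parameters are hypothetical names, not part of google-auth, and the attempt count and delay are arbitrary.

```python
import time


def call_with_retries(func, retry_on, attempts=5, delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    retry_on is the exception type (or tuple of types) worth retrying.
    sleep is injectable so the helper can be tested without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts, surface the last error
            sleep(delay)  # give the metadata server time to come up
```

Assuming google-auth is installed, it could be used as `credentials, project = call_with_retries(google.auth.default, google.auth.exceptions.DefaultCredentialsError)`.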
@ijrsvt This doesn't look like a timeout issue. When your workload starts to run, the metadata server is not ready yet. Increasing or configuring more retries does not sound like a good option. Let me do some investigation.