aws-sdk-java

Fail to retrieve token for high latency connection

Open craigsmithmsp opened this issue 5 years ago • 7 comments

If I connect over a high-latency satellite connection, the AWS SDK cannot retrieve a token. This problem started with 1.11.678. I have not found a configuration option to increase the timeout for the underlying operation. Can one be added?

In my case, I have a simple Spring Boot application, using Spring Cloud, with AWS SQS. By default, that pulls in version 1.11.415. We had trouble with connections not being closed properly and needed to upgrade the AWS SDK to prevent an open-files leak. The upgrade fixed the leak, but it introduced the token retrieval issue.

Stack trace

2020-06-17 09:08:22.518 level=WARN thread="pool-1-thread-16" c.a.i.InstanceMetadataServiceResourceFetcher - Fail to retrieve token
com.amazonaws.SdkClientException: Failed to connect to service endpoint:
    at com.amazonaws.internal.EC2ResourceFetcher.doReadResource(EC2ResourceFetcher.java:100)
    at com.amazonaws.internal.InstanceMetadataServiceResourceFetcher.getToken(InstanceMetadataServiceResourceFetcher.java:91)
    at com.amazonaws.internal.InstanceMetadataServiceResourceFetcher.readResource(InstanceMetadataServiceResourceFetcher.java:69)
    at com.amazonaws.internal.EC2ResourceFetcher.readResource(EC2ResourceFetcher.java:66)
    at com.amazonaws.auth.InstanceMetadataServiceCredentialsFetcher.getCredentialsEndpoint(InstanceMetadataServiceCredentialsFetcher.java:58)
    at com.amazonaws.auth.InstanceMetadataServiceCredentialsFetcher.getCredentialsResponse(InstanceMetadataServiceCredentialsFetcher.java:46)
    at com.amazonaws.auth.BaseCredentialsFetcher.fetchCredentials(BaseCredentialsFetcher.java:112)
    at com.amazonaws.auth.BaseCredentialsFetcher.getCredentials(BaseCredentialsFetcher.java:68)
    at com.amazonaws.auth.InstanceProfileCredentialsProvider.getCredentials(InstanceProfileCredentialsProvider.java:166)
    at com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper.getCredentials(EC2ContainerCredentialsProviderWrapper.java:75)
    at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1225)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1246)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
    at com.amazonaws.services.sqs.AmazonSQSClient.doInvoke(AmazonSQSClient.java:2207)
    at com.amazonaws.services.sqs.AmazonSQSClient.invoke(AmazonSQSClient.java:2174)
    at com.amazonaws.services.sqs.AmazonSQSClient.invoke(AmazonSQSClient.java:2163)
    at com.amazonaws.services.sqs.AmazonSQSClient.executeReceiveMessage(AmazonSQSClient.java:1607)
    at com.amazonaws.services.sqs.AmazonSQSAsyncClient$14.call(AmazonSQSAsyncClient.java:1055)
    at com.amazonaws.services.sqs.AmazonSQSAsyncClient$14.call(AmazonSQSAsyncClient.java:1049)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Network is unreachable: connect
    at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
    at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1199)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
    at com.amazonaws.internal.ConnectionUtils.connectToEndpoint(ConnectionUtils.java:52)
    at com.amazonaws.internal.EC2ResourceFetcher.doReadResource(EC2ResourceFetcher.java:80)
    ... 30 common frames omitted

Environment

  • AWS Java SDK version used: 1.11.791
  • JDK version used: 1.8
  • Operating System and version: Windows 10.0.18363

craigsmithmsp avatar Jun 17 '20 17:06 craigsmithmsp

Hi @craigsmithmsp, the problem started in 1.11.678 because that is when the new Instance Metadata Service v2 was released. We have seen various reports of increased latency on the service side (like https://github.com/aws/aws-sdk-java/issues/2276 and https://github.com/aws/aws-sdk-java-v2/issues/1667).

Unfortunately, it is not possible to change the underlying connectionTimeout. I can mark this as a feature request if you'd like. You can also try adding custom retry logic, since the SDK won't retry IMDS credentials fetching.
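
As a rough illustration of what such custom retry logic could look like, here is a minimal sketch in plain Java. `RetryingFetcher` and `fetchWithRetry` are hypothetical names, not part of the AWS SDK, and the sketch assumes the wrapped call signals failure by throwing a `RuntimeException` (which `SdkClientException` is).

```java
import java.util.function.Supplier;

/** Hypothetical helper, not part of the AWS SDK: retries a call with a fixed pause. */
final class RetryingFetcher {
    private RetryingFetcher() {}

    /**
     * Calls fetch up to maxAttempts times, sleeping pauseMillis between
     * failed attempts. Rethrows the last failure if every attempt fails.
     */
    static <T> T fetchWithRetry(Supplier<T> fetch, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(pauseMillis);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt flag and give up early.
                        Thread.currentThread().interrupt();
                        throw e;
                    }
                }
            }
        }
        throw last;
    }
}
```

With SDK v1 on the classpath, a call might look like `RetryingFetcher.fetchWithRetry(() -> InstanceProfileCredentialsProvider.getInstance().getCredentials(), 5, 2000)`. The attempt count and pause are arbitrary example values. Retrying does not lengthen the per-attempt timeout, so it may not help on a link whose latency always exceeds it.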

debora-ito avatar Jun 20 '20 01:06 debora-ito

Thank you, @debora-ito. Please mark it as a feature request. Spring Cloud does retry repeatedly, but without success. I have noticed that on our EC2 instances we sometimes get the same warning on startup, but there the retry resolves it quite reliably.

craigsmithmsp avatar Jun 20 '20 01:06 craigsmithmsp

Hi all, I am seeing the same log message after updating aws-sdk to 1.11.807. Just for my understanding: it's not a real problem, right? I have a local Spring Boot service running that fetches data from S3, and it works even though I see this message. Would it be OK to reduce the log level to ERROR?

ahoehma avatar Jul 07 '20 08:07 ahoehma

It appears that ConnectionUtils has hard-coded connect & read timeouts set to 1s: https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-core/src/main/java/com/amazonaws/internal/ConnectionUtils.java#L41
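
To illustrate what hard-coded timeouts of this kind mean in practice, here is a minimal sketch using only the JDK's `HttpURLConnection`. It is not the SDK's actual `ConnectionUtils` code; the class name `FixedTimeoutConnection` and its constant are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical illustration of fixed timeouts; not the SDK's ConnectionUtils. */
final class FixedTimeoutConnection {
    /** 1 second, matching the value described above. */
    static final int TIMEOUT_MILLIS = 1000;

    private FixedTimeoutConnection() {}

    /** Prepares (but does not open) a connection with fixed connect and read timeouts. */
    static HttpURLConnection open(String endpoint) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
            // Callers have no way to override these values.
            connection.setConnectTimeout(TIMEOUT_MILLIS);
            connection.setReadTimeout(TIMEOUT_MILLIS);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On a link where the round trip regularly exceeds the fixed limit, a later `connect()` or read on such a connection would fail with a timeout, and no setting on the caller's side can prevent that.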

The Python SDK appears to honor an environment variable, AWS_METADATA_SERVICE_TIMEOUT (see https://boto3.amazonaws.com/v1/documentation/api/1.9.42/guide/configuration.html#environment-variable-configuration), but the Java SDK doesn't appear to have anything like that.

rehevkor5 avatar Jun 03 '21 16:06 rehevkor5

Hi! I talked about this issue and described our custom solution in this article.

ffeltrinelli avatar Apr 08 '22 18:04 ffeltrinelli

Any update on this issue? I'm facing the same problem.

ghost avatar Nov 27 '22 18:11 ghost

> Unfortunately is not possible to change the underlying connectionTimeout, I can mark this as a feature request if you'd like. You can also try to add a custom retry logic since the SDK won't retry IMDS credentials fetching.

Can this issue be closed? Judging by the latest code, the Java SDK has read AWS_METADATA_SERVICE_TIMEOUT since 1.12.40, so the timeout is now configurable:

https://github.com/aws/aws-sdk-java/blob/8045d3dda6a4390516012fbc05ece5de13eba862/aws-java-sdk-core/src/main/java/com/amazonaws/internal/ConnectionUtils.java#L43-L65
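
For readers who want a feel for how an environment-driven timeout can work, here is a minimal sketch. It is not the code at the link above. The class `MetadataTimeout` is hypothetical, the value is assumed to be given in seconds (as the boto3 documentation describes for the same variable), and falling back to 1 second on missing or invalid input is a choice made for this example.

```java
/** Hypothetical sketch of an environment-driven timeout; not the SDK's implementation. */
final class MetadataTimeout {
    static final String ENV_VAR = "AWS_METADATA_SERVICE_TIMEOUT";
    /** Assumed default: the 1 second that was previously hard-coded. */
    static final int DEFAULT_TIMEOUT_MILLIS = 1000;

    private MetadataTimeout() {}

    /** Converts a raw value in seconds to milliseconds, using the default on bad input. */
    static int timeoutMillis(String rawSeconds) {
        if (rawSeconds == null || rawSeconds.trim().isEmpty()) {
            return DEFAULT_TIMEOUT_MILLIS;
        }
        try {
            double seconds = Double.parseDouble(rawSeconds.trim());
            // Rejects zero, negative values, and NaN.
            if (!(seconds > 0)) {
                return DEFAULT_TIMEOUT_MILLIS;
            }
            return (int) Math.min(Integer.MAX_VALUE, Math.round(seconds * 1000));
        } catch (NumberFormatException e) {
            return DEFAULT_TIMEOUT_MILLIS;
        }
    }

    /** Reads the variable from the process environment. */
    static int timeoutMillisFromEnvironment() {
        return timeoutMillis(System.getenv(ENV_VAR));
    }
}
```

Under these assumptions, starting the application with `AWS_METADATA_SERVICE_TIMEOUT=10` in its environment would yield a 10000 ms timeout, which leaves far more headroom for a satellite link than the 1 s default.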

sparrc avatar Mar 30 '23 16:03 sparrc