Retry AWS credential load errors
A user reported that Vector was failing to start because it couldn't load AWS credentials in the aws_s3 sink. They run a proxy sidecar for these credential requests, and Vector was starting up before the sidecar did. It seems reasonable to retry loading credentials indefinitely.
Sep 21 17:16:19.191 ERROR vector::topology::builder: msg="Healthcheck: Failed Reason." error=Failed creating AWS credentials. Errors: [CredentialsError { message: "Error during dispatch: error trying to connect: tcp connect error: Connection refused (os error 111)" }, CredentialsError { message: "Couldn't find AWS credentials in environment, credentials file, or IAM role." }] component_kind="sink" component_type="aws_s3" component_id=s3_systemd component_name=s3_systemd
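The retry behaviour proposed above could be sketched roughly as follows. This is only an illustration of retrying with exponential backoff, not Vector's implementation: `CredentialsError`, `load_with_retry`, and all parameter names are hypothetical, and `load` stands in for whatever call actually fetches the credentials.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CredentialsError(Exception):
    """Hypothetical stand-in for a credential provider failure."""


def load_with_retry(
    load: Callable[[], T],
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `load` until it succeeds, backing off exponentially.

    With max_attempts=None the loop retries indefinitely, which is the
    behaviour suggested in this issue. `sleep` is injectable for testing.
    """
    delay = initial_delay
    attempt = 0
    while True:
        attempt += 1
        try:
            return load()
        except CredentialsError:
            # Give up only if the caller set an explicit attempt limit.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
            # Double the wait each time, capped at max_delay.
            delay = min(delay * 2, max_delay)
```

Capping the delay keeps the wait bounded once the sidecar comes up, while the unbounded attempt count means startup ordering no longer matters.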
We have the same issue, but this time for SQS:
sink{component_id=sqs component_kind="sink" component_type=aws_sqs component_name=sqs}:request{request_id=74063}: vector::sinks::util::retries: Non-retriable error; dropping the request. error=Failed creating AWS credentials. Errors: [CredentialsError { message: "environment variable not found" }, CredentialsError { message: "Couldn't find AWS credentials in environment, credentials file, or IAM role." }]
Note that access to SQS is granted through AWS IAM.
We have a similar issue with the Elasticsearch sink connected to AWS OpenSearch.
From time to time, typically when log volume spikes, the credential provider fails and Vector drops messages even though end-to-end acknowledgements are enabled, which breaks the guaranteed-delivery contract.
With debug logs enabled, I noticed that credentials were being loaded every single second. Is that normal behaviour? Maybe we could specify a longer credential expiration time when using the aws-sdk?
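To make the expiration question concrete, here is a minimal sketch of a credential cache that only calls the underlying provider when the cached credentials are close to expiry, and that keeps serving still-valid cached credentials if a refresh fails. It is an illustration only and does not describe how Vector or the AWS SDK behaves; `Credentials`, `CachingProvider`, `fetch`, and `refresh_margin` are all hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credentials:
    access_key: str
    expires_at: float  # seconds, on the same clock as `now`


class CachingProvider:
    """Hypothetical wrapper that caches credentials until near expiry."""

    def __init__(
        self,
        fetch: Callable[[], Credentials],
        refresh_margin: float = 300.0,
        now: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._refresh_margin = refresh_margin
        self._now = now
        self._cached: Optional[Credentials] = None

    def get(self) -> Credentials:
        creds = self._cached
        # Serve from cache while we are outside the refresh window.
        if creds is not None and self._now() < creds.expires_at - self._refresh_margin:
            return creds
        try:
            self._cached = self._fetch()
        except Exception:
            # A failed refresh is tolerated while the old credentials
            # have not actually expired yet.
            if creds is not None and self._now() < creds.expires_at:
                return creds
            raise
        return self._cached
```

With a design like this, a transient provider failure during a traffic spike would only surface as an error once the cached credentials had really expired.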
Notes:
- Vector version 0.37.0
- Running on AWS ECS Fargate using the EcsContainer credential provider
- Reading from a Kafka source
- Tested with the imds auth configuration, increasing the timeouts and max_attempts to try to avoid this error, but the behaviour is the same:

    auth:
      strategy: "aws"
      load_timeout_secs: 1200
      imds:
        max_attempts: 60
        connect_timeout_seconds: 10
        read_timeout_seconds: 10

To avoid duplicating the details here, I also opened a support question on Discord: discord thread
Example of logs:
| timestamp | message |
|---------------|---------|
| 1711549083590 | 2024-03-27T14:18:03.590844Z WARN sink{component_kind="sink" component_id=opensearch component_type=elasticsearch}:request{request_id=4418}: aws_config::meta::credentials::chain: provider failed to provide credentials provider=EcsContainer error=unexpected credentials error: dispatch failure: other: connection closed before message completed (Unhandled(Unhandled { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Other(Some(TransientError)), source: hyper::Error(IncompleteMessage), connection: Unknown } }) })) |
| 1711549083591 | 2024-03-27T14:18:03.591030Z ERROR sink{component_kind="sink" component_id=opensearch component_type=elasticsearch}:request{request_id=4418}: vector::internal_events::common: Failed to build request. error=unexpected credentials error error_type="encoder_failed" stage="processing" internal_log_rate_limit=true |
| 1711549083591 | 2024-03-27T14:18:03.591146Z WARN sink{component_kind="sink" component_id=opensearch component_type=elasticsearch}:request{request_id=4418}: vector::sinks::util::adaptive_concurrency::controller: Unhandled error response. error=unexpected credentials error internal_log_rate_limit=true |
| 1711549083591 | 2024-03-27T14:18:03.591218Z ERROR sink{component_kind="sink" component_id=opensearch component_type=elasticsearch}:request{request_id=4418}: vector::sinks::util::retries: Unexpected error type; dropping the request. error=unexpected credentials error internal_log_rate_limit=true |
| 1711549083591 | 2024-03-27T14:18:03.591354Z ERROR sink{component_kind="sink" component_id=opensearch component_type=elasticsearch}:request{request_id=4418}: vector_common::internal_event::service: Service call failed. No retries or retries exhausted. error=Some(Unhandled(Unhandled { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Other(Some(TransientError)), source: hyper::Error(IncompleteMessage), connection: Unknown } }) })) request_id=4418 error_type="request_failed" stage="sending" internal_log_rate_limit=true |
| 1711549083591 | 2024-03-27T14:18:03.591722Z ERROR sink{component_kind="sink" component_id=opensearch component_type=elasticsearch}:request{request_id=4418}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=1007 reason="Service call failed. No retries or retries exhausted." internal_log_rate_limit=true |