cloudwatch-agent/cloudwatch-agent:1.300028.1b210 breaks Fargate integration
The latest image (public.ecr.aws/cloudwatch-agent/cloudwatch-agent:1.300028.1b210) broke my previously working Fargate task. I include it as a sidecar container in my SAM template's task definition:
- Name: cloudwatch-agent
  Image: "public.ecr.aws/cloudwatch-agent/cloudwatch-agent:latest"
  Secrets:
    - Name: CW_CONFIG_CONTENT
      ValueFrom: !Ref CWAgentConfigSecret
  LogConfiguration:
    LogDriver: awslogs
    Options:
      awslogs-group: !Ref StatsLogGroup
      awslogs-region: !Ref AWS::Region
      awslogs-stream-prefix: ecs
  Essential: true
  Environment:
    - Name: AWS_EMF_ENVIRONMENT
      Value: ECS
    - Name: AWS_EMF_SERVICE_TYPE
      Value: ECS
    - Name: AWS_EMF_LOG_GROUP_NAME
      Value: !Ref StatsLogGroup
    - Name: AWS_EMF_NAMESPACE
      Value: "App/Name"
My secret is just the default:
{"agent":{"log_level":"INFO"},"logs":{"metrics_collected":{"emf":{}}}}
It worked fine with the previous version (1.300026.3b189), but the new version can't send the data. Reverting to and pinning the previous version (instead of latest) resolved the issue.
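(For reference, a sketch of that pin in the container definition — same ECR path as above, with the tag of the release that worked:)

```yaml
- Name: cloudwatch-agent
  # Pin a known-good release instead of "latest" so a new agent
  # release cannot silently break the task.
  Image: "public.ecr.aws/cloudwatch-agent/cloudwatch-agent:1.300026.3b189"
```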
I get these logs:
September 27, 2023 at 17:09 (UTC-4:00) 2023/09/27 21:09:22 D! should retry true for imds error : RequestError: send request failed cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00) caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: invalid argument cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00) 2023/09/27 21:09:22 D! could not get instance document without imds v1 fallback enable thus enable fallback cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00) E! [EC2] Fetch hostname from EC2 metadata fail: RequestError: send request failed cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00) caused by: Get "http://169.254.169.254/latest/meta-data/hostname": dial tcp 169.254.169.254:80: connect: invalid argument cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00) 2023/09/27 21:09:22 D! should retry true for imds error : RequestError: send request failed cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00) caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: invalid argument cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00) 2023/09/27 21:09:21 D! should retry true for imds error : RequestError: send request failed cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00) caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: invalid argument cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00) 2023/09/27 21:09:21 D! could not get hostname without imds v1 fallback enable thus enable fallback cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00) D! [EC2] Found active network interface cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00) 2023/09/27 21:09:21 I! imds retry client will retry 1 times cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00) 2023/09/27 21:09:21 D! should retry true for imds error : RequestError: send request failed cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00) caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: invalid argument cloudwatch-agent
How can I get it to fetch a token?
Thank you for bringing this issue to our attention. We are trying to reproduce it.
I'm assuming what you posted is not the full agent log. Is that correct?
On Fargate you will not be able to get an IMDS token, so these error logs are expected: this version added extra error logging around fetching the IMDS token.
Can you please use this config
{
  "agent": {
    "debug": true
  },
  "logs": {
    "metrics_collected": {
      "emf": {}
    }
  }
}
and post the full agent log.
2023-10-09T11:31:59.801+03:00  D! [EC2] Found active network interface
2023-10-09T11:31:59.806+03:00  2023/10/09 08:31:59 I! imds retry client will retry 1 times
2023-10-09T11:31:59.807+03:00  2023/10/09 08:31:59 D! should retry true for imds error : RequestError: send request failed
2023-10-09T11:31:59.807+03:00  caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: invalid argument
2023-10-09T11:31:59.895+03:00  2023/10/09 08:31:59 D! should retry true for imds error : RequestError: send request failed
2023-10-09T11:31:59.895+03:00  caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: invalid argument
2023-10-09T11:31:59.895+03:00  2023/10/09 08:31:59 D! could not get hostname without imds v1 fallback enable thus enable fallback
2023-10-09T11:32:00.598+03:00  E! [EC2] Fetch hostname from EC2 metadata fail: RequestError: send request failed
2023-10-09T11:32:00.598+03:00  caused by: Get "http://169.254.169.254/latest/meta-data/hostname": dial tcp 169.254.169.254:80: connect: invalid argument
2023-10-09T11:32:00.599+03:00  2023/10/09 08:32:00 D! should retry true for imds error : RequestError: send request failed
2023-10-09T11:32:00.599+03:00  caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: invalid argument
2023-10-09T11:32:00.630+03:00  2023/10/09 08:32:00 D! should retry true for imds error : RequestError: send request failed
2023-10-09T11:32:00.630+03:00  caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: invalid argument
2023-10-09T11:32:00.630+03:00  2023/10/09 08:32:00 D! could not get instance document without imds v1 fallback enable thus enable fallback
2023-10-09T11:32:01.406+03:00  E! [EC2] Fetch identity document from EC2 metadata fail: EC2MetadataRequestError: failed to get EC2 instance identity document
2023-10-09T11:32:01.406+03:00  caused by: RequestError: send request failed
2023-10-09T11:32:01.406+03:00  caused by: Get "http://169.254.169.254/latest/dynamic/instance-identity/document": dial tcp 169.254.169.254:80: connect: invalid argument
2023-10-09T11:32:01.407+03:00  2023/10/09 08:32:01 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
2023-10-09T11:32:01.427+03:00  I! Detected the instance is ECS
2023-10-09T11:32:01.429+03:00  2023/10/09 08:32:01 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
2023-10-09T11:32:01.429+03:00  /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
2023-10-09T11:32:01.430+03:00  Cannot access /etc/cwagentconfig: lstat /etc/cwagentconfig: no such file or directory
2023-10-09T11:32:01.430+03:00  2023/10/09 08:32:01 unable to scan config dir /etc/cwagentconfig with error: lstat /etc/cwagentconfig: no such file or directory
2023-10-09T11:32:01.431+03:00  2023/10/09 08:32:01 Reading json config from from environment variable CW_CONFIG_CONTENT.
2023-10-09T11:32:01.598+03:00  2023/10/09 08:32:01 I! Valid Json input schema.
2023-10-09T11:32:01.598+03:00  I! Trying to detect region from ec2
2023-10-09T11:32:01.599+03:00  I! Trying to detect region from ecs
2023-10-09T11:32:01.600+03:00  2023/10/09 08:32:01 D! pipeline hostDeltaMetrics has no receivers
2023-10-09T11:32:01.601+03:00  2023/10/09 08:32:01 Configuration validation first phase succeeded
2023-10-09T11:32:01.604+03:00  2023/10/09 08:32:01 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2023-10-09T11:32:01.604+03:00  2023/10/09 08:32:01 D! config [agent]
2023-10-09T11:32:01.604+03:00    collection_jitter = "0s"
2023-10-09T11:32:01.604+03:00    debug = true
2023-10-09T11:32:01.604+03:00    flush_interval = "1s"
2023-10-09T11:32:01.604+03:00    flush_jitter = "0s"
2023-10-09T11:32:01.604+03:00    hostname = ""
2023-10-09T11:32:01.604+03:00    interval = "60s"
2023-10-09T11:32:01.604+03:00    logfile = ""
2023-10-09T11:32:01.604+03:00    logtarget = "lumberjack"
2023-10-09T11:32:01.604+03:00    metric_batch_size = 1000
2023-10-09T11:32:01.604+03:00    metric_buffer_limit = 10000
2023-10-09T11:32:01.604+03:00    omit_hostname = true
2023-10-09T11:32:01.604+03:00    precision = ""
2023-10-09T11:32:01.604+03:00    quiet = false
2023-10-09T11:32:01.604+03:00    round_interval = false
2023-10-09T11:32:01.604+03:00  [inputs]
2023-10-09T11:32:01.604+03:00    [[inputs.statsd]]
2023-10-09T11:32:01.604+03:00      interval = "10s"
2023-10-09T11:32:01.604+03:00      parse_data_dog_tags = true
2023-10-09T11:32:01.604+03:00      service_address = ":8125"
2023-10-09T11:32:01.604+03:00      [inputs.statsd.tags]
2023-10-09T11:32:01.604+03:00        "aws:AggregationInterval" = "60s"
2023-10-09T11:32:01.604+03:00  [outputs]
2023-10-09T11:32:01.604+03:00    [[outputs.cloudwatch]]
2023-10-09T11:32:01.604+03:00  2023/10/09 08:32:01 I! Config has been translated into YAML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.yaml
2023-10-09T11:32:01.604+03:00  2023/10/09 08:32:01 D! config connectors: {}
{
  "agent": {
    "debug": true
  },
  "metrics": {
    "namespace": "test",
    "metrics_collected": {
      "statsd": {
        "service_address": ":8125",
        "metrics_collection_interval": 10,
        "metrics_aggregation_interval": 60
      }
    }
  }
}
It gets past the part where it reaches out to IMDS, as expected: the agent tried to get information from IMDS, was unable to, then fell back to the ECS task metadata endpoint, got what it needed, and continued. The issue seems to happen after the agent has started (no IMDS calls happen at that point). The issue could be with the statsd receiver or exporter.
This issue was marked stale due to lack of activity.
Closing this because it has stalled. Feel free to reopen if this issue is still relevant, or to ping the collaborator who labeled it stalled if you have any questions.