
cloudwatch-agent/cloudwatch-agent:1.300028.1b210 breaks Fargate integration

Open jeffrigby opened this issue 2 years ago • 4 comments

The latest image (cloudwatch-agent/cloudwatch-agent:1.300028.1b210) broke my previously working Fargate instance. I include it in my SAM template task definition.

        - Name: cloudwatch-agent
          Image: "public.ecr.aws/cloudwatch-agent/cloudwatch-agent:latest"
          Secrets:
            - Name: CW_CONFIG_CONTENT
              ValueFrom: !Ref CWAgentConfigSecret
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref StatsLogGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: ecs
          Essential: true
          Environment:
            - Name: AWS_EMF_ENVIRONMENT
              Value: ECS
            - Name: AWS_EMF_SERVICE_TYPE
              Value: ECS
            - Name: AWS_EMF_LOG_GROUP_NAME
              Value: !Ref StatsLogGroup
            - Name: AWS_EMF_NAMESPACE
              Value: "App/Name"

My secret is just the default:

{"agent":{"log_level":"INFO"},"logs":{"metrics_collected":{"emf":{}}}}
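For context, with this config the agent runs an Embedded Metric Format (EMF) listener that accepts JSON events over UDP/TCP. A minimal sketch of what an application sends it (hypothetical namespace and metric name; assumes the agent's default emf port, 25888):

```python
import json
import socket
import time

def build_emf_event(namespace: str, metric_name: str, value: float) -> dict:
    """Build a minimal Embedded Metric Format event: the "_aws" metadata
    tells CloudWatch which top-level keys to extract as metrics."""
    return {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [[]],
                "Metrics": [{"Name": metric_name, "Unit": "Count"}],
            }],
        },
        metric_name: value,
    }

if __name__ == "__main__":
    payload = json.dumps(build_emf_event("App/Name", "RequestCount", 1))
    # Assumed default listener address; UDP sendto succeeds even if
    # no agent is listening, so this is safe to run standalone.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload.encode("utf-8"), ("127.0.0.1", 25888))
    sock.close()
```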

It worked fine with the previous version (1.300026.3b189), but the new version can't send the data. Reverting to and pinning the previous version (rather than latest) resolved the issue.

I get these logs:

September 27, 2023 at 17:09 (UTC-4:00)	2023/09/27 21:09:22 D! should retry true for imds error : RequestError: send request failed	cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00)	caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: invalid argument	cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00)	2023/09/27 21:09:22 D! could not get instance document without imds v1 fallback enable thus enable fallback	cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00)	E! [EC2] Fetch hostname from EC2 metadata fail: RequestError: send request failed	cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00)	caused by: Get "http://169.254.169.254/latest/meta-data/hostname": dial tcp 169.254.169.254:80: connect: invalid argument	cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00)	2023/09/27 21:09:22 D! should retry true for imds error : RequestError: send request failed	cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00)	caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: invalid argument	cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00)	2023/09/27 21:09:21 D! should retry true for imds error : RequestError: send request failed	cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00)	caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: invalid argument	cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00)	2023/09/27 21:09:21 D! could not get hostname without imds v1 fallback enable thus enable fallback	cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00)	D! [EC2] Found active network interface	cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00)	2023/09/27 21:09:21 I! imds retry client will retry 1 times	cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00)	2023/09/27 21:09:21 D! should retry true for imds error : RequestError: send request failed	cloudwatch-agent
September 27, 2023 at 17:09 (UTC-4:00)	caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: invalid argument	cloudwatch-agent

How can I get it to fetch a token?

jeffrigby avatar Sep 27 '23 22:09 jeffrigby

Thank you for bringing this issue to our attention. We are trying to reproduce.

I'm assuming what you posted is not the full agent log. Is that correct?

In Fargate you will not be able to get the IMDS token, so it makes sense to see these error logs. We added extra error logging around fetching the IMDS token in this version.
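Side note: if the IMDS probing noise itself is unwanted, the AWS SDKs honor the standard `AWS_EC2_METADATA_DISABLED` environment variable, and setting it on the agent container may skip the IMDS lookups entirely. A sketch against the task definition above (untested with this agent version; verify it doesn't interfere with region detection on your setup):

```yaml
          Environment:
            # Standard AWS SDK switch; may suppress the IMDS token/hostname
            # probing that produces the "dial tcp 169.254.169.254" errors.
            - Name: AWS_EC2_METADATA_DISABLED
              Value: "true"
```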

Can you please use this config

{
    "agent": {
        "debug": true
    },
    "logs": {
        "metrics_collected": {
            "emf": {}
        }
    }
}

and post the full agent log.

sethAmazon avatar Oct 04 '23 20:10 sethAmazon

2023-10-09T11:31:59.801+03:00	D! [EC2] Found active network interface
2023-10-09T11:31:59.806+03:00	2023/10/09 08:31:59 I! imds retry client will retry 1 times
2023-10-09T11:31:59.807+03:00	2023/10/09 08:31:59 D! should retry true for imds error : RequestError: send request failed
2023-10-09T11:31:59.807+03:00	caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: invalid argument
2023-10-09T11:31:59.895+03:00	2023/10/09 08:31:59 D! should retry true for imds error : RequestError: send request failed
2023-10-09T11:31:59.895+03:00	caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: invalid argument
2023-10-09T11:31:59.895+03:00	2023/10/09 08:31:59 D! could not get hostname without imds v1 fallback enable thus enable fallback
2023-10-09T11:32:00.598+03:00	E! [EC2] Fetch hostname from EC2 metadata fail: RequestError: send request failed
2023-10-09T11:32:00.598+03:00	caused by: Get "http://169.254.169.254/latest/meta-data/hostname": dial tcp 169.254.169.254:80: connect: invalid argument
2023-10-09T11:32:00.599+03:00	2023/10/09 08:32:00 D! should retry true for imds error : RequestError: send request failed
2023-10-09T11:32:00.599+03:00	caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: invalid argument
2023-10-09T11:32:00.630+03:00	2023/10/09 08:32:00 D! should retry true for imds error : RequestError: send request failed
2023-10-09T11:32:00.630+03:00	caused by: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: invalid argument
2023-10-09T11:32:00.630+03:00	2023/10/09 08:32:00 D! could not get instance document without imds v1 fallback enable thus enable fallback
2023-10-09T11:32:01.406+03:00	E! [EC2] Fetch identity document from EC2 metadata fail: EC2MetadataRequestError: failed to get EC2 instance identity document
2023-10-09T11:32:01.406+03:00	caused by: RequestError: send request failed
2023-10-09T11:32:01.406+03:00	caused by: Get "http://169.254.169.254/latest/dynamic/instance-identity/document": dial tcp 169.254.169.254:80: connect: invalid argument
2023-10-09T11:32:01.407+03:00	2023/10/09 08:32:01 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
2023-10-09T11:32:01.427+03:00	I! Detected the instance is ECS
2023-10-09T11:32:01.429+03:00	2023/10/09 08:32:01 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
2023-10-09T11:32:01.429+03:00	/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
2023-10-09T11:32:01.430+03:00	Cannot access /etc/cwagentconfig: lstat /etc/cwagentconfig: no such file or directory
2023-10-09T11:32:01.430+03:00	2023/10/09 08:32:01 unable to scan config dir /etc/cwagentconfig with error: lstat /etc/cwagentconfig: no such file or directory
2023-10-09T11:32:01.431+03:00	2023/10/09 08:32:01 Reading json config from from environment variable CW_CONFIG_CONTENT.
2023-10-09T11:32:01.598+03:00	2023/10/09 08:32:01 I! Valid Json input schema.
2023-10-09T11:32:01.598+03:00	I! Trying to detect region from ec2
2023-10-09T11:32:01.599+03:00	I! Trying to detect region from ecs
2023-10-09T11:32:01.600+03:00	2023/10/09 08:32:01 D! pipeline hostDeltaMetrics has no receivers
2023-10-09T11:32:01.601+03:00	2023/10/09 08:32:01 Configuration validation first phase succeeded
2023-10-09T11:32:01.604+03:00	2023/10/09 08:32:01 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2023-10-09T11:32:01.604+03:00	2023/10/09 08:32:01 D! config [agent]
2023-10-09T11:32:01.604+03:00	collection_jitter = "0s"
2023-10-09T11:32:01.604+03:00	debug = true
2023-10-09T11:32:01.604+03:00	flush_interval = "1s"
2023-10-09T11:32:01.604+03:00	flush_jitter = "0s"
2023-10-09T11:32:01.604+03:00	hostname = ""
2023-10-09T11:32:01.604+03:00	interval = "60s"
2023-10-09T11:32:01.604+03:00	logfile = ""
2023-10-09T11:32:01.604+03:00	logtarget = "lumberjack"
2023-10-09T11:32:01.604+03:00	metric_batch_size = 1000
2023-10-09T11:32:01.604+03:00	metric_buffer_limit = 10000
2023-10-09T11:32:01.604+03:00	omit_hostname = true
2023-10-09T11:32:01.604+03:00	precision = ""
2023-10-09T11:32:01.604+03:00	quiet = false
2023-10-09T11:32:01.604+03:00	round_interval = false
2023-10-09T11:32:01.604+03:00	[inputs]
2023-10-09T11:32:01.604+03:00	[[inputs.statsd]]
2023-10-09T11:32:01.604+03:00	interval = "10s"
2023-10-09T11:32:01.604+03:00	parse_data_dog_tags = true
2023-10-09T11:32:01.604+03:00	service_address = ":8125"
2023-10-09T11:32:01.604+03:00	[inputs.statsd.tags]
2023-10-09T11:32:01.604+03:00	"aws:AggregationInterval" = "60s"
2023-10-09T11:32:01.604+03:00	[outputs]
2023-10-09T11:32:01.604+03:00	[[outputs.cloudwatch]]
2023-10-09T11:32:01.604+03:00	2023/10/09 08:32:01 I! Config has been translated into YAML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.yaml
2023-10-09T11:32:01.604+03:00	2023/10/09 08:32:01 D! config connectors: {}

{
    "agent": {
        "debug": true
    },
    "metrics":{
        "namespace": "test",
        "metrics_collected":{
            "statsd":{
                "service_address":":8125",
                "metrics_collection_interval":10,
                "metrics_aggregation_interval":60
            }
        }
    }
}
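With this config the agent listens for statsd datagrams on port 8125. A quick way to exercise the receiver is a minimal UDP client (hypothetical metric name; assumes the standard statsd plain-text wire format `<metric>:<value>|<type>`):

```python
import socket

def send_statsd(metric: str, value: int, metric_type: str = "c",
                host: str = "127.0.0.1", port: int = 8125) -> str:
    """Send one statsd counter/gauge/etc. datagram and return the payload."""
    # statsd plain-text wire format: <metric>:<value>|<type>
    payload = f"{metric}:{value}|{metric_type}"
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # UDP is fire-and-forget: sendto succeeds even if nothing listens.
    sock.sendto(payload.encode("utf-8"), (host, port))
    sock.close()
    return payload

if __name__ == "__main__":
    print(send_statsd("test.requests", 1))  # test.requests:1|c
```

If metrics sent this way never reach CloudWatch while the agent logs stay quiet, that points at the exporter side rather than the receiver.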

mihaileu avatar Oct 09 '23 08:10 mihaileu

It passes the part where it reaches out to IMDS, as expected: the agent tried to get information from IMDS, was unable to, then fell back to the ECS task metadata endpoint, which succeeded, so startup continued. The issue seems to happen after the agent has started (no IMDS calls happen at that point). The problem could be with the statsd receiver or exporter.

sethAmazon avatar Oct 09 '23 15:10 sethAmazon

This issue was marked stale due to lack of activity.

github-actions[bot] avatar Feb 21 '24 00:02 github-actions[bot]

Closing this because it has stalled. Feel free to reopen if this issue is still relevant, or to ping the collaborator who labeled it stalled if you have any questions.

github-actions[bot] avatar Apr 05 '24 00:04 github-actions[bot]