
common-config.toml does not seem to work when the CloudWatch agent runs in a container

Open jasond1016 opened this issue 2 years ago • 4 comments

Describe the bug It seems that common-config.toml is not applied when the CloudWatch agent runs in a container with RUN_IN_CONTAINER set to true. I don't know whether this is a bug or whether I'm using it incorrectly.

Also I'm not sure if it's related, but I noticed this piece of code: when runInContainer == True, the --config parameter is not passed to the arguments. https://github.com/aws/amazon-cloudwatch-agent/blob/c0c8a921457365ea82107f16e64e2619991499fc/cmd/start-amazon-cloudwatch-agent/start-amazon-cloudwatch-agent.go#L47-L51

Steps to reproduce I used this Dockerfile to build an image, started a container from it, and verified that the proxy set in common-config.toml did not take effect.

FROM ubuntu:latest as build

# NOTE: This arg will be populated by docker buildx
# https://docs.docker.com/engine/reference/builder/#automatic-platform-args-in-the-global-scope
ARG TARGETARCH

RUN apt-get update &&  \
    apt-get install -y ca-certificates curl && \
    rm -rf /var/lib/apt/lists/*

RUN curl -O https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/${TARGETARCH:-$(dpkg --print-architecture)}/latest/amazon-cloudwatch-agent.deb && \
    dpkg -i -E amazon-cloudwatch-agent.deb && \
    rm -rf /tmp/* && \
    rm -rf /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard && \
    rm -rf /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl && \
    rm -rf /opt/aws/amazon-cloudwatch-agent/bin/config-downloader

FROM openjdk:8-jdk

COPY --from=build /tmp /tmp

COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

COPY --from=build /opt/aws/amazon-cloudwatch-agent /opt/aws/amazon-cloudwatch-agent

COPY conf/common-config.toml /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml
COPY conf/config.json /etc/cwagentconfig/config.json
COPY conf/config conf/credentials /root/.aws/

ENV RUN_IN_CONTAINER="True"
ENTRYPOINT ["/opt/aws/amazon-cloudwatch-agent/bin/start-amazon-cloudwatch-agent"]

What did you expect to see? The proxy settings defined in common-config.toml take effect.

What did you see instead?

  • "PROXY is set to xxx" did not appear in the log.
  • The log shows that sending metrics failed with a timeout, indicating that the agent was not using the proxy server I configured.

What version did you use? Version: v1.247359.1

What config did you use? Config:

config.json

{
        "agent": {
                "run_as_user": "root"
        },
        "metrics": {
                "metrics_collected": {
                        "cpu": {
                                "measurement": [
                                        "cpu_usage_idle"
                                ],
                                "metrics_collection_interval": 300,
                                "totalcpu": true
                        },
                        "mem": {
                                "measurement": [
                                        "mem_used_percent"
                                ],
                                "metrics_collection_interval": 300
                        }
                }
        }
}

common-config.toml

[proxy]
http_proxy = "http://172.19.0.3:3128"
https_proxy = "http://172.19.0.3:3128"

Environment OS: "Debian GNU/Linux 11 (bullseye)"

Additional context

I have configured the Docker container so that it can reach the external network only through a specified proxy server. If the CloudWatch agent does not use the proxy, a timeout error occurs, as in the last few lines of the log below.

$ docker run --name cwagent --network="my-network" -itd cloudwatch-agent:1.0
4a2ca3f5260283567b8b09ed7e1c145ba82f95256e11f32bfa3a171224cec309
$ docker logs -f cwagent 
2023/05/26 06:03:08 I! D! [EC2] Found active network interface
E! [EC2] Cannot get EC2 Metadata from IMDS: EC2 metadata is not available.
2023/05/26 06:03:05 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
2023/05/26 06:03:06 W! retry [0/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023/05/26 06:03:07 W! retry [1/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023/05/26 06:03:08 W! retry [2/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023/05/26 06:03:08 I! access ECS task metadata fail with response unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers), assuming I'm not running in ECS.
I! Detected the instance is OnPremise
2023/05/26 06:03:08 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
2023/05/26 06:03:08 Reading json config file path: /etc/cwagentconfig/config.json ...
2023/05/26 06:03:08 I! Valid Json input schema.
Got Home directory: /root
Got Home directory: /root
I! Set home dir Linux: /root
I! SDKRegionWithCredsMap region:  ap-northeast-1
No csm configuration found.
No log configuration found.
Configuration validation first phase succeeded

2023/05/26 06:03:08 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml 
2023/05/26 06:03:08 D! toml config [agent]
  collection_jitter = "0s"
  debug = false
  flush_interval = "1s"
  flush_jitter = "0s"
  hostname = ""
  interval = "60s"
  logfile = ""
  logtarget = "lumberjack"
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  omit_hostname = true
  precision = ""
  quiet = false
  round_interval = false

[inputs]

  [[inputs.cpu]]
    fieldpass = ["usage_idle"]
    interval = "300s"
    percpu = false
    totalcpu = true
    [inputs.cpu.tags]
      metricPath = "metrics"

  [[inputs.mem]]
    fieldpass = ["used_percent"]
    interval = "300s"
    [inputs.mem.tags]
      metricPath = "metrics"

[outputs]

  [[outputs.cloudwatch]]
    force_flush_interval = "60s"
    namespace = "CWAgent"
    profile = "AmazonCloudWatchAgent"
    region = "ap-northeast-1"
    shared_credential_file = "/root/.aws/credentials"
    tagexclude = ["metricPath"]
    [outputs.cloudwatch.tagpass]
      metricPath = ["metrics"]
2023-05-26T06:03:08Z I! Starting AmazonCloudWatchAgent 1.247359.1
2023-05-26T06:03:08Z I! AWS SDK log level not set
2023-05-26T06:03:08Z I! Loaded inputs: cpu mem
2023-05-26T06:03:08Z I! Loaded aggregators:
2023-05-26T06:03:08Z I! Loaded processors:
2023-05-26T06:03:08Z I! Loaded outputs: cloudwatch
2023-05-26T06:03:08Z I! Tags enabled:
2023-05-26T06:03:08Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"", Flush Interval:1s
2023-05-26T06:03:08Z I! [logagent] starting
2023-05-26T06:03:08Z I! will use file based credentials provider 
2023-05-26T06:03:08Z I! cloudwatch: get unique roll up list []
2023-05-26T06:03:08Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 12.218132673s
2023-05-26T06:04:52Z E! cloudwatch: code: RequestError, message: send request failed, original error: Post "https://monitoring.ap-northeast-1.amazonaws.com/": dial tcp: lookup monitoring.ap-northeast-1.amazonaws.com on 127.0.0.11:53: read udp 127.0.0.1:48837->127.0.0.11:53: i/o timeout
2023-05-26T06:04:52Z W! cloudwatch: 0 retries, going to sleep 122 ms before retrying.
2023-05-26T06:04:52Z E! cloudwatch: WriteToCloudWatch failure, err:  RequestError: send request failed
caused by: Post "https://monitoring.ap-northeast-1.amazonaws.com/": dial tcp: lookup monitoring.ap-northeast-1.amazonaws.com on 127.0.0.11:53: read udp 127.0.0.1:48837->127.0.0.11:53: i/o timeout

But if I set the http_proxy and https_proxy environment variables, everything works fine.

  • "PROXY is set to xxx" appeared in the log.
  • There are no errors in log.
  • I confirmed through the AWS console - CloudWatch - Metrics - CWAgent that the metrics are being collected normally.
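
The first check above can be scripted. This is a minimal sketch, assuming the agent keeps logging the "HTTP_PROXY is set to" and "HTTPS_PROXY is set to" lines seen in the normal log; proxy_logged is a hypothetical helper name and not part of the agent.

```shell
#!/bin/sh
# proxy_logged (hypothetical helper): reads agent log text on stdin and
# succeeds if it contains an "HTTP_PROXY is set to" or
# "HTTPS_PROXY is set to" line.
proxy_logged() {
    grep -Eq 'HTTPS?_PROXY is set to '
}

# Example (container name taken from the docker run command in this issue;
# no -f flag, so the command ends once the existing log has been read):
#   docker logs cwagent 2>&1 | proxy_logged && echo "proxy picked up"
```

A non-zero exit status only means the line was not found in the log read so far; it does not by itself prove that the proxy is unused.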

Normal log

$ docker run --name cwagent --network="my-network" -e "http_proxy=http://172.19.0.3:3128" -e "https_proxy=http://172.19.0.3:3128" -itd cloudwatch-agent:1.0
aba2a77502ec2ffbb1a90679a683b0625e4b0cfd8a4bd7cd73751c0f7ae57153
$ docker logs -f cwagent 
2023/05/26 06:08:45 I! D! [EC2] Found active network interface
E! [EC2] Cannot get EC2 Metadata from IMDS: EC2 metadata is not available.
2023/05/26 06:08:42 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
2023/05/26 06:08:43 W! retry [0/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023/05/26 06:08:44 W! retry [1/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023/05/26 06:08:45 W! retry [2/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023/05/26 06:08:45 I! access ECS task metadata fail with response unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers), assuming I'm not running in ECS.
I! Detected the instance is OnPremise
2023/05/26 06:08:45 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
2023/05/26 06:08:45 Reading json config file path: /etc/cwagentconfig/config.json ...
2023/05/26 06:08:45 I! Valid Json input schema.
Got Home directory: /root
Got Home directory: /root
I! Set home dir Linux: /root
I! SDKRegionWithCredsMap region:  ap-northeast-1
No csm configuration found.
No log configuration found.
Configuration validation first phase succeeded

2023/05/26 06:08:45 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2023/05/26 06:08:45 D! toml config [agent]
  collection_jitter = "0s"
  debug = false
  flush_interval = "1s"
  flush_jitter = "0s"
  hostname = ""
  interval = "60s"
  logfile = ""
  logtarget = "lumberjack"
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  omit_hostname = true
  precision = ""
  quiet = false
  round_interval = false

[inputs]

  [[inputs.cpu]]
    fieldpass = ["usage_idle"]
    interval = "300s"
    percpu = false
    totalcpu = true
    [inputs.cpu.tags]
      metricPath = "metrics"

  [[inputs.mem]]
    fieldpass = ["used_percent"]
    interval = "300s"
    [inputs.mem.tags]
      metricPath = "metrics"

[outputs]

  [[outputs.cloudwatch]]
    force_flush_interval = "60s"
    namespace = "CWAgent"
    profile = "AmazonCloudWatchAgent"
    region = "ap-northeast-1"
    shared_credential_file = "/root/.aws/credentials"
    tagexclude = ["metricPath"]
    [outputs.cloudwatch.tagpass]
      metricPath = ["metrics"]
2023-05-26T06:08:45Z I! HTTPS_PROXY is set to "http://172.19.0.3:3128"
2023-05-26T06:08:45Z I! HTTP_PROXY is set to "http://172.19.0.3:3128"
2023-05-26T06:08:45Z I! Starting AmazonCloudWatchAgent 1.247359.1
2023-05-26T06:08:45Z I! AWS SDK log level not set
2023-05-26T06:08:45Z I! Loaded inputs: cpu mem
2023-05-26T06:08:45Z I! Loaded aggregators:
2023-05-26T06:08:45Z I! Loaded processors:
2023-05-26T06:08:45Z I! Loaded outputs: cloudwatch
2023-05-26T06:08:45Z I! Tags enabled:
2023-05-26T06:08:45Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"", Flush Interval:1s
2023-05-26T06:08:45Z I! will use file based credentials provider
2023-05-26T06:08:45Z I! [logagent] starting
2023-05-26T06:08:45Z I! cloudwatch: get unique roll up list []
2023-05-26T06:08:45Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 14.524260635s
2023-05-26T06:09:15Z I! HTTPS_PROXY is set to "http://172.19.0.3:3128"
2023-05-26T06:09:15Z I! HTTP_PROXY is set to "http://172.19.0.3:3128"

jasond1016 • May 26 '23 07:05

This issue was marked stale due to lack of activity.

github-actions[bot] • Aug 25 '23 00:08

Hi, any update here? Let me know if there's any information that I should add.

jasond1016 • Aug 29 '23 09:08

Thank you for bringing this issue to our attention.

The common-config.toml file is for EC2. For containers, you need to use environment variables.
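
For images that already ship a common-config.toml, one possible workaround is a small wrapper entrypoint that turns the [proxy] table into environment variables before starting the agent. The sketch below is an illustration only: proxy_env_from_toml and load_proxy_env are hypothetical helper names, the parser handles only plain key = "value" lines for http_proxy and https_proxy, and the paths in the usage comment are the ones from the Dockerfile in this issue.

```shell
#!/bin/sh
# proxy_env_from_toml (hypothetical helper): print KEY=VALUE lines for the
# http_proxy/https_proxy keys found in the [proxy] table of the given file.
# Not a full TOML parser; commented-out lines and other tables are ignored.
proxy_env_from_toml() {
    awk '
        /^[ \t]*\[/ { in_proxy = ($0 ~ /^[ \t]*\[proxy\][ \t]*$/); next }
        in_proxy && /^[ \t]*(http_proxy|https_proxy)[ \t]*=/ {
            key = $0; sub(/^[ \t]*/, "", key); sub(/[ \t]*=.*/, "", key)
            val = $0; sub(/^[^=]*=[ \t]*"/, "", val); sub(/"[ \t]*$/, "", val)
            print key "=" val
        }
    ' "$1"
}

# load_proxy_env (hypothetical helper): export those variables into the
# current shell so that a process started afterwards inherits them.
load_proxy_env() {
    while IFS= read -r kv; do
        if [ -n "$kv" ]; then export "$kv"; fi
    done <<EOF
$(proxy_env_from_toml "$1")
EOF
}

# Usage in a wrapper entrypoint:
#   load_proxy_env /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml
#   exec /opt/aws/amazon-cloudwatch-agent/bin/start-amazon-cloudwatch-agent
```

In this sketch, values from the file overwrite variables passed with docker run -e. Passing -e directly, as in the working docker run command earlier in this issue, remains the simpler option.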

sethAmazon • Oct 05 '23 17:10

So that's how it is, I understand now, thank you. It would be great if there was documentation for this, or maybe I just haven't found it.

jasond1016 • Oct 07 '23 09:10

This issue was marked stale due to lack of activity.

github-actions[bot] • Aug 18 '24 00:08

Closing this because it has stalled. Feel free to reopen if this issue is still relevant, or to ping the collaborator who labeled it stalled if you have any questions.

github-actions[bot] • Sep 22 '24 00:09