common-config.toml does not seem to work when the CloudWatch agent runs in a container
Describe the bug
It seems that common-config.toml is not applied when the CloudWatch agent runs in a container with RUN_IN_CONTAINER set to true.
I don't know whether this is a bug or whether I'm using it incorrectly.
I'm also not sure whether it's related, but I noticed this piece of code:
when runInContainer == True, the --config parameter is not passed in the arguments.
https://github.com/aws/amazon-cloudwatch-agent/blob/c0c8a921457365ea82107f16e64e2619991499fc/cmd/start-amazon-cloudwatch-agent/start-amazon-cloudwatch-agent.go#L47-L51
Steps to reproduce
I built an image from this Dockerfile, started a container from it, and verified that the proxy set in common-config.toml did not take effect.
FROM ubuntu:latest as build
# NOTE: This arg will be populated by docker buildx
# https://docs.docker.com/engine/reference/builder/#automatic-platform-args-in-the-global-scope
ARG TARGETARCH
RUN apt-get update && \
apt-get install -y ca-certificates curl && \
rm -rf /var/lib/apt/lists/*
RUN curl -O https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/${TARGETARCH:-$(dpkg --print-architecture)}/latest/amazon-cloudwatch-agent.deb && \
dpkg -i -E amazon-cloudwatch-agent.deb && \
rm -rf /tmp/* && \
rm -rf /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard && \
rm -rf /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl && \
rm -rf /opt/aws/amazon-cloudwatch-agent/bin/config-downloader
FROM openjdk:8-jdk
COPY --from=build /tmp /tmp
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=build /opt/aws/amazon-cloudwatch-agent /opt/aws/amazon-cloudwatch-agent
COPY conf/common-config.toml /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml
COPY conf/config.json /etc/cwagentconfig/config.json
COPY conf/config conf/credentials /root/.aws/
ENV RUN_IN_CONTAINER="True"
ENTRYPOINT ["/opt/aws/amazon-cloudwatch-agent/bin/start-amazon-cloudwatch-agent"]
What did you expect to see?
The proxy settings defined in common-config.toml take effect.
What did you see instead?
- "PROXY is set to xxx" did not appear in the log.
- The log shows that sending metrics failed with a timeout (indicating that the agent was not using the proxy server I set).
What version did you use?
Version: v1.247359.1
What config did you use?
config.json
{
  "agent": {
    "run_as_user": "root"
  },
  "metrics": {
    "metrics_collected": {
      "cpu": {
        "measurement": [
          "cpu_usage_idle"
        ],
        "metrics_collection_interval": 300,
        "totalcpu": true
      },
      "mem": {
        "measurement": [
          "mem_used_percent"
        ],
        "metrics_collection_interval": 300
      }
    }
  }
}
common-config.toml
[proxy]
http_proxy = "http://172.19.0.3:3128"
https_proxy = "http://172.19.0.3:3128"
Environment
OS: "Debian GNU/Linux 11 (bullseye)"
Additional context
I configured the Docker network so that the container can reach external networks only through the specified proxy server. If the CloudWatch agent does not use the proxy, a timeout error occurs, as in the last few lines of the log below.
$ docker run --name cwagent --network="my-network" -itd cloudwatch-agent:1.0
4a2ca3f5260283567b8b09ed7e1c145ba82f95256e11f32bfa3a171224cec309
$ docker logs -f cwagent
2023/05/26 06:03:08 I! D! [EC2] Found active network interface
E! [EC2] Cannot get EC2 Metadata from IMDS: EC2 metadata is not available.
2023/05/26 06:03:05 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
2023/05/26 06:03:06 W! retry [0/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023/05/26 06:03:07 W! retry [1/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023/05/26 06:03:08 W! retry [2/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023/05/26 06:03:08 I! access ECS task metadata fail with response unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers), assuming I'm not running in ECS.
I! Detected the instance is OnPremise
2023/05/26 06:03:08 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
2023/05/26 06:03:08 Reading json config file path: /etc/cwagentconfig/config.json ...
2023/05/26 06:03:08 I! Valid Json input schema.
Got Home directory: /root
Got Home directory: /root
I! Set home dir Linux: /root
I! SDKRegionWithCredsMap region: ap-northeast-1
No csm configuration found.
No log configuration found.
Configuration validation first phase succeeded
2023/05/26 06:03:08 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2023/05/26 06:03:08 D! toml config [agent]
collection_jitter = "0s"
debug = false
flush_interval = "1s"
flush_jitter = "0s"
hostname = ""
interval = "60s"
logfile = ""
logtarget = "lumberjack"
metric_batch_size = 1000
metric_buffer_limit = 10000
omit_hostname = true
precision = ""
quiet = false
round_interval = false
[inputs]
[[inputs.cpu]]
fieldpass = ["usage_idle"]
interval = "300s"
percpu = false
totalcpu = true
[inputs.cpu.tags]
metricPath = "metrics"
[[inputs.mem]]
fieldpass = ["used_percent"]
interval = "300s"
[inputs.mem.tags]
metricPath = "metrics"
[outputs]
[[outputs.cloudwatch]]
force_flush_interval = "60s"
namespace = "CWAgent"
profile = "AmazonCloudWatchAgent"
region = "ap-northeast-1"
shared_credential_file = "/root/.aws/credentials"
tagexclude = ["metricPath"]
[outputs.cloudwatch.tagpass]
metricPath = ["metrics"]
2023-05-26T06:03:08Z I! Starting AmazonCloudWatchAgent 1.247359.1
2023-05-26T06:03:08Z I! AWS SDK log level not set
2023-05-26T06:03:08Z I! Loaded inputs: cpu mem
2023-05-26T06:03:08Z I! Loaded aggregators:
2023-05-26T06:03:08Z I! Loaded processors:
2023-05-26T06:03:08Z I! Loaded outputs: cloudwatch
2023-05-26T06:03:08Z I! Tags enabled:
2023-05-26T06:03:08Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"", Flush Interval:1s
2023-05-26T06:03:08Z I! [logagent] starting
2023-05-26T06:03:08Z I! will use file based credentials provider
2023-05-26T06:03:08Z I! cloudwatch: get unique roll up list []
2023-05-26T06:03:08Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 12.218132673s
2023-05-26T06:04:52Z E! cloudwatch: code: RequestError, message: send request failed, original error: Post "https://monitoring.ap-northeast-1.amazonaws.com/": dial tcp: lookup monitoring.ap-northeast-1.amazonaws.com on 127.0.0.11:53: read udp 127.0.0.1:48837->127.0.0.11:53: i/o timeout
2023-05-26T06:04:52Z W! cloudwatch: 0 retries, going to sleep 122 ms before retrying.
2023-05-26T06:04:52Z E! cloudwatch: WriteToCloudWatch failure, err: RequestError: send request failed
caused by: Post "https://monitoring.ap-northeast-1.amazonaws.com/": dial tcp: lookup monitoring.ap-northeast-1.amazonaws.com on 127.0.0.11:53: read udp 127.0.0.1:48837->127.0.0.11:53: i/o timeout
But if I set the http_proxy and https_proxy environment variables, everything works fine.
- "PROXY is set to xxx" appeared in the log.
- There are no errors in the log.
- I confirmed in the AWS console (CloudWatch > Metrics > CWAgent) that the metrics are being collected normally.
Normal log
$ docker run --name cwagent --network="my-network" -e "http_proxy=http://172.19.0.3:3128" -e "https_proxy=http://172.19.0.3:3128" -itd cloudwatch-agent:1.0
aba2a77502ec2ffbb1a90679a683b0625e4b0cfd8a4bd7cd73751c0f7ae57153
$ docker logs -f cwagent
2023/05/26 06:08:45 I! D! [EC2] Found active network interface
E! [EC2] Cannot get EC2 Metadata from IMDS: EC2 metadata is not available.
2023/05/26 06:08:42 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
2023/05/26 06:08:43 W! retry [0/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023/05/26 06:08:44 W! retry [1/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023/05/26 06:08:45 W! retry [2/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023/05/26 06:08:45 I! access ECS task metadata fail with response unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers), assuming I'm not running in ECS.
I! Detected the instance is OnPremise
2023/05/26 06:08:45 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
2023/05/26 06:08:45 Reading json config file path: /etc/cwagentconfig/config.json ...
2023/05/26 06:08:45 I! Valid Json input schema.
Got Home directory: /root
Got Home directory: /root
I! Set home dir Linux: /root
I! SDKRegionWithCredsMap region: ap-northeast-1
No csm configuration found.
No log configuration found.
Configuration validation first phase succeeded
2023/05/26 06:08:45 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2023/05/26 06:08:45 D! toml config [agent]
collection_jitter = "0s"
debug = false
flush_interval = "1s"
flush_jitter = "0s"
hostname = ""
interval = "60s"
logfile = ""
logtarget = "lumberjack"
metric_batch_size = 1000
metric_buffer_limit = 10000
omit_hostname = true
precision = ""
quiet = false
round_interval = false
[inputs]
[[inputs.cpu]]
fieldpass = ["usage_idle"]
interval = "300s"
percpu = false
totalcpu = true
[inputs.cpu.tags]
metricPath = "metrics"
[[inputs.mem]]
fieldpass = ["used_percent"]
interval = "300s"
[inputs.mem.tags]
metricPath = "metrics"
[outputs]
[[outputs.cloudwatch]]
force_flush_interval = "60s"
namespace = "CWAgent"
profile = "AmazonCloudWatchAgent"
region = "ap-northeast-1"
shared_credential_file = "/root/.aws/credentials"
tagexclude = ["metricPath"]
[outputs.cloudwatch.tagpass]
metricPath = ["metrics"]
2023-05-26T06:08:45Z I! HTTPS_PROXY is set to "http://172.19.0.3:3128"
2023-05-26T06:08:45Z I! HTTP_PROXY is set to "http://172.19.0.3:3128"
2023-05-26T06:08:45Z I! Starting AmazonCloudWatchAgent 1.247359.1
2023-05-26T06:08:45Z I! AWS SDK log level not set
2023-05-26T06:08:45Z I! Loaded inputs: cpu mem
2023-05-26T06:08:45Z I! Loaded aggregators:
2023-05-26T06:08:45Z I! Loaded processors:
2023-05-26T06:08:45Z I! Loaded outputs: cloudwatch
2023-05-26T06:08:45Z I! Tags enabled:
2023-05-26T06:08:45Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"", Flush Interval:1s
2023-05-26T06:08:45Z I! will use file based credentials provider
2023-05-26T06:08:45Z I! [logagent] starting
2023-05-26T06:08:45Z I! cloudwatch: get unique roll up list []
2023-05-26T06:08:45Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 14.524260635s
2023-05-26T06:09:15Z I! HTTPS_PROXY is set to "http://172.19.0.3:3128"
2023-05-26T06:09:15Z I! HTTP_PROXY is set to "http://172.19.0.3:3128"
This issue was marked stale due to lack of activity.
Hi, any update here? Let me know if there's any information that I should add.
Thank you for bringing this issue to our attention.
The common-config.toml file is for EC2. For containers, you need to use environment variables.
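For anyone who still wants to keep the proxy settings in common-config.toml, one possible workaround is a small wrapper entrypoint that copies the [proxy] values into environment variables before starting the agent. The sketch below is only an illustration and is not part of the agent: the function names read_proxy_value and export_proxy_from_toml and the COMMON_CONFIG and AGENT_START variables are hypothetical, and the awk parsing assumes simple key = "value" lines like the ones in the common-config.toml shown in this issue.

```shell
#!/bin/sh
# Hypothetical wrapper entrypoint (an assumption, not shipped with the agent).
# It exports the [proxy] values from common-config.toml as environment
# variables, since the agent in a container is configured through them.

COMMON_CONFIG="${COMMON_CONFIG:-/opt/aws/amazon-cloudwatch-agent/etc/common-config.toml}"
AGENT_START="${AGENT_START:-/opt/aws/amazon-cloudwatch-agent/bin/start-amazon-cloudwatch-agent}"

# Print the value of key $2 from the [proxy] table of file $1 (nothing if absent).
# Handles only plain `key = "value"` lines; this is not a full TOML parser.
read_proxy_value() {
    awk -v key="$2" '
        # A new table header: remember whether we are inside [proxy].
        /^[ \t]*\[/ { in_proxy = ($0 ~ /^[ \t]*\[proxy\][ \t]*$/); next }
        in_proxy && $0 ~ ("^[ \t]*" key "[ \t]*=") {
            sub(/^[^=]*=[ \t]*/, "")   # drop everything up to the "="
            sub(/^"/, "")              # drop the opening quote
            sub(/"[ \t]*$/, "")        # drop the closing quote
            print
            exit
        }
    ' "$1"
}

# Export http_proxy / https_proxy from file $1 unless they are already set,
# so values passed with `docker run -e` still take precedence.
export_proxy_from_toml() {
    [ -r "$1" ] || return 0
    for key in http_proxy https_proxy; do
        value=$(read_proxy_value "$1" "$key")
        eval "current=\${$key:-}"
        if [ -n "$value" ] && [ -z "$current" ]; then
            export "$key=$value"
        fi
    done
}

export_proxy_from_toml "$COMMON_CONFIG"

# Hand over to the agent launcher when it exists (guarded so the script
# can be dry-run on a machine without the agent installed).
if [ -x "$AGENT_START" ]; then
    exec "$AGENT_START" "$@"
fi
```

With something like this copied into the image, the ENTRYPOINT would point at the wrapper instead of calling start-amazon-cloudwatch-agent directly. Passing -e http_proxy=... and -e https_proxy=... to docker run, as in the working example above, remains the simpler option.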
I see, that explains it. Thank you. It would be great if this were documented, or maybe I just haven't found the documentation.
This issue was marked stale due to lack of activity.
Closing this because it has stalled. Feel free to reopen if this issue is still relevant, or to ping the collaborator who labeled it stalled if you have any questions.