
ec2tagger: Unable to retrieve InstanceId.

rdonadono opened this issue 4 years ago

Hi team,

I'm trying to migrate EKS metrics and logs from Prometheus to CloudWatch using this agent, but I'm running into a problem.

I followed this doc and first attached the CloudWatchAgentServerPolicy to my NodeGroups' IAM role.

Then I executed this command, as indicated in the doc.

ClusterName="<...>"
RegionName="<...>"
FluentBitHttpPort='2020'
FluentBitReadFromHead='Off'
[[ ${FluentBitReadFromHead} = 'On' ]] && FluentBitReadFromTail='Off'|| FluentBitReadFromTail='On'
[[ -z ${FluentBitHttpPort} ]] && FluentBitHttpServer='Off' || FluentBitHttpServer='On'
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml | sed 's/{{cluster_name}}/'${ClusterName}'/;s/{{region_name}}/'${RegionName}'/;s/{{http_server_toggle}}/"'${FluentBitHttpServer}'"/;s/{{http_server_port}}/"'${FluentBitHttpPort}'"/;s/{{read_from_head}}/"'${FluentBitReadFromHead}'"/;s/{{read_from_tail}}/"'${FluentBitReadFromTail}'"/' | kubectl apply -f - 
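For illustration, the sed pipeline above only substitutes the {{...}} placeholders in the manifest template before piping it to kubectl. A minimal, self-contained sketch of that substitution, using made-up values instead of the real manifest:

```shell
# Demonstrate the placeholder substitution performed by the sed pipeline above.
# The input string and values here are made up for illustration only.
ClusterName="my-cluster"
RegionName="eu-central-1"
echo 'cluster_name: {{cluster_name}}, region: {{region_name}}' \
  | sed 's/{{cluster_name}}/'"${ClusterName}"'/;s/{{region_name}}/'"${RegionName}"'/'
# prints: cluster_name: my-cluster, region: eu-central-1
```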

After this step I checked on my EKS cluster whether the DaemonSet was working properly, but I see the pods restarting in a loop.

Logs of cloudwatch-agent pods show the same error:

2022/02/19 15:26:45 I! 2022/02/19 15:26:42 E! ec2metadata is not available
2022/02/19 15:26:42 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
2022/02/19 15:26:43 W! retry [0/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2022/02/19 15:26:44 W! retry [1/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2022/02/19 15:26:45 W! retry [2/3], unable to get http response from http://169.254.170.2/v2/metadata, error: unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2022/02/19 15:26:45 I! access ECS task metadata fail with response unable to get response from http://169.254.170.2/v2/metadata, error: Get "http://169.254.170.2/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers), assuming I'm not running in ECS.
I! Detected the instance is OnPrem
2022/02/19 15:26:45 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
2022/02/19 15:26:45 Reading json config file path: /etc/cwagentconfig/..2022_02_19_15_26_37.192607950/cwagentconfig.json ...
2022/02/19 15:26:45 Find symbolic link /etc/cwagentconfig/..data 
2022/02/19 15:26:45 Find symbolic link /etc/cwagentconfig/cwagentconfig.json 
2022/02/19 15:26:45 Reading json config file path: /etc/cwagentconfig/cwagentconfig.json ...
Valid Json input schema.
Got Home directory: /root
No csm configuration found.
No metric configuration found.
Configuration validation first phase succeeded
 
2022/02/19 15:26:45 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml 
2022-02-19T15:26:45Z I! Starting AmazonCloudWatchAgent 1.247348.0
2022-02-19T15:26:45Z I! Loaded inputs: cadvisor k8sapiserver
2022-02-19T15:26:45Z I! Loaded aggregators: 
2022-02-19T15:26:45Z I! Loaded processors: ec2tagger k8sdecorator
2022-02-19T15:26:45Z I! Loaded outputs: cloudwatchlogs
2022-02-19T15:26:45Z I! Tags enabled: 
2022-02-19T15:26:45Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-192-168-78-202.eu-central-1.compute.internal", Flush Interval:1s
2022-02-19T15:26:45Z I! [logagent] starting
2022-02-19T15:26:45Z I! [logagent] found plugin cloudwatchlogs is a log backend
2022-02-19T15:30:46Z E! [processors.ec2tagger] ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance
2022-02-19T15:30:46Z E! [telegraf] Error running agent: could not initialize processor ec2tagger: ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance

I have already checked whether this is a network problem, but it does not seem to be: with this command, run from inside each node, I can retrieve the EC2 instance metadata correctly:

TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"` && curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/instance-id
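The check above is the standard IMDSv2 two-step handshake: fetch a session token with a PUT, then query metadata with that token. The same flow can be sketched as a reusable function (the endpoint parameter is for illustration only; the real service lives at http://169.254.169.254). Note that when the instance's metadata hop limit is 1, this handshake succeeds from the node but the PUT times out when issued from a pod, which is one extra network hop away:

```shell
# Sketch of the IMDSv2 handshake the agent needs to complete.
# The host argument is a parameter purely so the flow is visible; defaults
# to the real IMDS endpoint. Returns non-zero if either step fails.
imds_instance_id() {
  local host="${1:-http://169.254.169.254}"
  local token
  token=$(curl -sf -X PUT "$host/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600" --max-time 2) || return 1
  curl -sf "$host/latest/meta-data/instance-id" \
    -H "X-aws-ec2-metadata-token: $token" --max-time 2
}
# From a pod behind a hop limit of 1, the PUT above times out and the function
# fails, matching the "ec2metadata is not available" line in the agent logs.
```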

This error shows up in the fluent-bit pods too.

AWS for Fluent Bit Container Image Version 2.10.0
Fluent Bit v1.6.8
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/02/19 15:26:40] [ info] [engine] started (pid=1)
[2022/02/19 15:26:40] [ info] [storage] version=1.0.6, initializing...
[2022/02/19 15:26:40] [ info] [storage] root path '/var/fluent-bit/state/flb-storage/'
[2022/02/19 15:26:40] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/02/19 15:26:40] [ info] [storage] backlog input plugin: storage_backlog.8
[2022/02/19 15:26:40] [ info] [input:systemd:systemd.3] seek_cursor=s=82a20e741bc74377ba38eb0d776ad4dd;i=cb7... OK
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] queue memory limit: 4.8M
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284109.541128008.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284111.242931767.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284111.243140450.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284111.922505134.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284112.23870133.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284112.284614847.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284116.263762778.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284116.263971979.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284116.284565147.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284116.646807894.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284116.647037408.flb
[2022/02/19 15:26:40] [ info] [input:storage_backlog:storage_backlog.8] register tail.0/1-1645284116.647200365.flb
[2022/02/19 15:26:40] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc port=443
[2022/02/19 15:26:40] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2022/02/19 15:26:40] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2022/02/19 15:26:45] [ info] [filter:kubernetes:kubernetes.0] API server connectivity OK
[2022/02/19 15:26:45] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2022/02/19 15:26:45] [ info] [sp] stream processor started
[2022/02/19 15:26:45] [ info] [input:tail:tail.0] inotify_fs_add(): inode=108010284 watch_fd=1 name=/var/log/containers/aws-load-balancer-controller-859586cf74-rt9ls_kube-system_aws-load-balancer-controller-562ab1af9253b5ca83a3c8acef612683698b7f7ce6ac89da42c1d1277c181f00.log
[...]
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [ info] [input:tail:tail.4] inotify_fs_add(): inode=52448822 watch_fd=1 name=/var/log/containers/aws-node-mzs4r_kube-system_aws-node-b2ae85e13ca72e02a42ffd3d1832a691a037355de0770945feec31894f27ef3a.log
[...]
[2022/02/19 15:26:46] [error] [filter:aws:aws.3] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284109.541128008.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284111.242931767.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284111.243140450.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284111.922505134.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284112.23870133.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284112.284614847.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284116.263762778.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284116.263971979.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284116.284565147.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284116.646807894.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284116.647037408.flb
[2022/02/19 15:26:46] [ info] [input:storage_backlog:storage_backlog.8] queueing tail.0:1-1645284116.647200365.flb
[2022/02/19 15:26:47] [error] [filter:aws:aws.2] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:47] [error] [filter:aws:aws.3] Could not retrieve ec2 metadata from IMDS
[2022/02/19 15:26:50] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log group /aws/containerinsights/<...>/application
[2022/02/19 15:26:50] [error] [aws_credentials] Could not read shared credentials file /root/.aws/credentials
[2022/02/19 15:26:50] [error] [aws_credentials] Failed to retrieve credentials for AWS Profile default
[2022/02/19 15:26:50] [ warn] [aws_credentials] No cached credentials are available and a credential refresh is already in progress. The current co-routine will retry.
[2022/02/19 15:26:50] [error] [signv4] Provider returned no credentials, service=logs
[2022/02/19 15:26:50] [error] [aws_client] could not sign request

I have an EKS v1.20 cluster created by eksctl, with two NodeGroups, one OnDemand and one Spot, with the same configuration.

What can I do to understand the problem?

Thanks!

rdonadono avatar Feb 19 '22 17:02 rdonadono

I believe I have found the source of the problem in this issue.

rdonadono avatar Feb 19 '22 20:02 rdonadono

This issue was marked stale due to lack of activity.

github-actions[bot] avatar May 21 '22 00:05 github-actions[bot]

This issue was marked stale due to lack of activity.

github-actions[bot] avatar Aug 25 '22 00:08 github-actions[bot]

I have the exact same issue after new Launch Templates recently started disabling IMDSv1 and enabling only IMDSv2. Is there an option we can set to disable, or bypass, the reliance on IMDS?

dtna7 avatar Nov 01 '22 15:11 dtna7
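For later readers: a common cause of this symptom is the instance's IMDS response hop limit being 1, so the extra network hop from a pod to the metadata service is dropped. Rather than bypassing IMDS, a frequently used workaround is to raise the hop limit on the nodes. A hedged sketch (the instance id is a placeholder; assumes working AWS CLI credentials):

```shell
# Workaround sketch: raise the IMDS response hop limit so that pods
# (one extra network hop away from the instance) can still reach IMDSv2.
# Assumes AWS CLI credentials; the instance id passed in is a placeholder.
raise_imds_hop_limit() {
  aws ec2 modify-instance-metadata-options \
    --instance-id "$1" \
    --http-tokens required \
    --http-put-response-hop-limit 2
}
# usage: raise_imds_hop_limit i-0123456789abcdef0
```

For new node groups, the equivalent setting can instead be baked into the launch template's metadata options so replacement nodes come up correctly.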

I experience the same, any progress with this one?

Tomer20 avatar Dec 04 '22 06:12 Tomer20

I got the same issue with the launch template only enabling IMDSv2.

wolviecb avatar Dec 05 '22 12:12 wolviecb

This issue was marked stale due to lack of activity.

github-actions[bot] avatar Mar 06 '23 00:03 github-actions[bot]

Closing this because it has stalled. Feel free to reopen if this issue is still relevant, or to ping the collaborator who labeled it stalled if you have any questions.

github-actions[bot] avatar Apr 17 '23 00:04 github-actions[bot]