EKS Fargate agent:7-jmx constantly logs warnings

Open jotasixto opened this issue 4 years ago • 3 comments

Output of the info page (if this is a bug)

❯ agent status
2022-02-17 10:07:38 UTC | CORE | WARN | (pkg/util/log/log.go:640 in func1) | Deactivating Autoconfig will disable most components. It's recommended to use autoconfig_exclude_features and autoconfig_include_features to activate/deactivate features selectively
Getting the status from the agent.

===============
Agent (v7.33.1)
===============

  Status date: 2022-02-17 10:07:38.692 UTC (1645092458692)
  Agent start: 2022-02-17 06:32:02.888 UTC (1645079522888)
  Pid: 380
  Go Version: go1.16.7
  Python Version: 3.8.11
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 4
  Log Level: warn

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    NTP offset: -1.931ms
    System time: 2022-02-17 10:07:38.692 UTC (1645092458692)

  Host Info
  =========
    bootTime: 2022-02-17 06:27:55 UTC (1645079275000)
    kernelArch: x86_64
    kernelVersion: 4.14.262-200.489.amzn2.x86_64
    os: linux
    platform: ubuntu
    platformFamily: debian
    platformVersion: 21.10
    procs: 11
    uptime: 4m32s

  Hostnames
  =========
    host_aliases: [fargate-ip-XXX-XXX-XXX-XXX.eu-west-1.compute.internal]
    socket-fqdn: xxx-xxxxxx-xxx-server-queue-worker-5d79f6b67d-5cplq
    socket-hostname: xxx-xxxxxx-xxx-server-queue-worker-5d79f6b67d-5cplq
    host tags:
      apikey:00xxx
      aws_account_name:xxxxxxxxxxxxxxx_des
      datacenter:aws
      env:des
      platform:eks-fargate
      product:xxx
      terraform:true
    hostname provider: 
    unused hostname providers:
      configuration/environment: hostname is empty

  Metadata
  ========
    agent_version: 7.33.1
    config_apm_dd_url: 
    config_dd_url: 
    config_logs_dd_url: 
    config_logs_socks5_proxy_address: 
    config_no_proxy: []
    config_process_dd_url: 
    config_proxy_http: 
    config_proxy_https: 
    config_site: 
    feature_apm_enabled: true
    feature_cspm_enabled: false
    feature_cws_enabled: false
    feature_logs_enabled: false
    feature_networks_enabled: false
    feature_process_enabled: false
    flavor: agent
    install_method_installer_version: docker
    install_method_tool: docker
    install_method_tool_version: docker

=========
Collector
=========

  Running Checks
  ==============
    
    container
    ---------
      Instance ID: container [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/container.d/conf.yaml.default
      Total Runs: 862
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-02-17 10:07:28 UTC (1645092448000)
      Last Successful Execution Date : 2022-02-17 10:07:28 UTC (1645092448000)
      
    
    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
      Total Runs: 862
      Metric Samples: Last Run: 9, Total: 7,751
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-02-17 10:07:35 UTC (1645092455000)
      Last Successful Execution Date : 2022-02-17 10:07:35 UTC (1645092455000)
      
    
    disk (4.5.1)
    ------------
      Instance ID: disk:a1cfeb1bef22319f [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
      Total Runs: 861
      Metric Samples: Last Run: 200, Total: 172,200
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 14ms
      Last Execution Date : 2022-02-17 10:07:27 UTC (1645092447000)
      Last Successful Execution Date : 2022-02-17 10:07:27 UTC (1645092447000)
      
    
    eks_fargate (2.1.0)
    -------------------
      Instance ID: eks_fargate:d734b1956a31b015 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/eks_fargate.d/conf.yaml.default
      Total Runs: 862
      Metric Samples: Last Run: 3, Total: 2,586
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 15ms
      Last Execution Date : 2022-02-17 10:07:34 UTC (1645092454000)
      Last Successful Execution Date : 2022-02-17 10:07:34 UTC (1645092454000)
      
    
    file_handle
    -----------
      Instance ID: file_handle [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
      Total Runs: 861
      Metric Samples: Last Run: 5, Total: 4,305
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-02-17 10:07:26 UTC (1645092446000)
      Last Successful Execution Date : 2022-02-17 10:07:26 UTC (1645092446000)
      
    
    io
    --
      Instance ID: io [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
      Total Runs: 862
      Metric Samples: Last Run: 52, Total: 44,788
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-02-17 10:07:33 UTC (1645092453000)
      Last Successful Execution Date : 2022-02-17 10:07:33 UTC (1645092453000)
      
    
    kubelet (7.1.0)
    ---------------
      Instance ID: kubelet:5avc64g118c18a4 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default
      Total Runs: 647
      Metric Samples: Last Run: 691, Total: 437,690
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 4, Total: 2,588
      Average Execution Time : 174ms
      Last Execution Date : 2022-02-17 10:07:33 UTC (1645092453000)
      Last Successful Execution Date : 2022-02-17 10:07:33 UTC (1645092453000)
      
    
    load
    ----
      Instance ID: load [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
      Total Runs: 861
      Metric Samples: Last Run: 6, Total: 5,166
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-02-17 10:07:25 UTC (1645092445000)
      Last Successful Execution Date : 2022-02-17 10:07:25 UTC (1645092445000)
      
    
    memory
    ------
      Instance ID: memory [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
      Total Runs: 862
      Metric Samples: Last Run: 18, Total: 15,516
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-02-17 10:07:32 UTC (1645092452000)
      Last Successful Execution Date : 2022-02-17 10:07:32 UTC (1645092452000)
      
    
    ntp
    ---
      Instance ID: ntp:d752b4386b561429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
      Total Runs: 15
      Metric Samples: Last Run: 1, Total: 15
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 15
      Average Execution Time : 400ms
      Last Execution Date : 2022-02-17 10:02:13 UTC (1645092133000)
      Last Successful Execution Date : 2022-02-17 10:02:13 UTC (1645092133000)
      
    
    uptime
    ------
      Instance ID: uptime [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
      Total Runs: 861
      Metric Samples: Last Run: 1, Total: 861
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-02-17 10:07:24 UTC (1645092444000)
      Last Successful Execution Date : 2022-02-17 10:07:24 UTC (1645092444000)
      
========
JMXFetch
========

  Information
  ==================
  Initialized checks
  ==================
    no checks
    
  Failed checks
  =============
    no checks
    
=========
Forwarder
=========

  Transactions
  ============
    Cluster: 0
    ClusterRole: 0
    ClusterRoleBinding: 0
    CronJob: 0
    DaemonSet: 0
    Deployment: 0
    Dropped: 0
    HighPriorityQueueFull: 0
    Job: 0
    Node: 0
    PersistentVolume: 0
    PersistentVolumeClaim: 0
    Pod: 0
    ReplicaSet: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Role: 0
    RoleBinding: 0
    Service: 0
    ServiceAccount: 0
    StatefulSet: 0

  Transaction Successes
  =====================
    Total number: 1817
    Successes By Endpoint:
      check_run_v1: 862
      intake: 71
      metadata_v1: 22
      series_v1: 862

  On-disk storage
  ===============
    On-disk storage is disabled. Configure `forwarder_storage_max_size_in_bytes` to enable it.

  API Keys status
  ===============
    API key ending with 00xxx: API Key valid

==========
Endpoints
==========
  https://app.datadoghq.eu - API Key ending with:
      - 00xxx

==========
Logs Agent
==========


  Logs Agent is not running

=========
APM Agent
=========
  Status: Running
  Pid: 378
  Uptime: 12935 seconds
  Mem alloc: 8,637,848 bytes
  Hostname: 
  Receiver: 0.0.0.0:8126
  Endpoints:
    https://trace.agent.datadoghq.eu

  Receiver (previous minute)
  ==========================
    No traces received in the previous minute.
    Default priority sampling rate: 100.0%

  Writer (previous minute)
  ========================
    Traces: 0 payloads, 0 traces, 0 events, 0 bytes
    Stats: 0 payloads, 0 stats buckets, 0 bytes

=========
Aggregator
=========
  Checks Metric Sample: 707,708
  Dogstatsd Metric Sample: 104,296
  Number Of Flushes: 862
  Series Flushed: 643,921
  Service Check: 11,019
  Service Checks Flushed: 11,879
=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 104,295
  Metric Parse Errors: 0
  Service Check Packets: 0
  Service Check Parse Errors: 0
  Udp Bytes: 7,928,147
  Udp Packet Reading Errors: 0
  Udp Packets: 74,582
  Uds Bytes: 0
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 0
  Unterminated Metric Errors: 0

=============
Autodiscovery
=============
  Enabled Features
  ================
    eksfargate
    kubernetes

  Configuration Errors
  ====================
    des-xxx/xxx-xxxxxxxx-xxxx-server-queue-worker-5d79f6b67d-5cplq
    -----------------------------------------------------------------------
        annotation ad.datadoghq.com/queue_worker.logs is invalid: queue_worker doesn't match a container identifier [datadog-agent queue-worker]
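
The configuration error above says the annotation identifier `queue_worker` matches neither container name (`datadog-agent`, `queue-worker`). A minimal sketch of that kind of check follows. It assumes annotation keys have the form `ad.datadoghq.com/<identifier>.<suffix>` and that container names contain no dots. The function name `unmatched_identifiers` is hypothetical and is not part of the Agent.

```python
import re

# Captures the identifier part of keys such as
# "ad.datadoghq.com/queue_worker.logs" (assumed key layout).
AD_KEY = re.compile(r"^ad\.datadoghq\.com/(?P<ident>[^/.]+)\.(?P<suffix>.+)$")


def unmatched_identifiers(annotations, container_names):
    """Return annotation keys whose identifier is not a container name."""
    names = set(container_names)
    bad = []
    for key in annotations:
        match = AD_KEY.match(key)
        # Keys that do not follow the assumed layout are skipped, not flagged.
        if match and match.group("ident") not in names:
            bad.append(key)
    return bad


if __name__ == "__main__":
    pod_annotations = {"ad.datadoghq.com/queue_worker.logs": "[]"}
    print(unmatched_identifiers(pod_annotations, ["datadog-agent", "queue-worker"]))
```

With the names from the status output, the key `ad.datadoghq.com/queue_worker.logs` is reported because the container is called `queue-worker` (hyphen, not underscore).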

Describe what happened: The Agent constantly logs the following warnings:

2022-02-17 09:34:25 UTC | CORE | WARN | (pkg/security/log/logger.go:112 in Warnf) | Collector not found for container: &{{container xxxxxxxxxxxx} {xxxxx-xxxxxx-rabbit  map[] map[]} map[AWS_DEFAULT_REGION:eu-west-1 AWS_REGION:eu-west-1 AWS_ROLE_ARN:arn:aws:iam::xxxxxx:role/k8s-xxx-datadog-role AWS_WEB_IDENTITY_TOKEN_FILE:/var/run/secrets/eks.amazonaws.com/serviceaccount/token DD_ENV: DD_LOGS_INJECTION:true DD_SERVICE:xxxxxx-xxxxx-xxx DD_TRACE_CLI_ENABLED:true DD_VERSION: SERVICE_NAME:xxxxx-xxx-queue-worker STAKATER_DATADOG_SSM_SECRET:***********************************000000 STAKATER_XXXX_COMMON_SECRET:***********************************000000 STAKATER_XXXX_XXXXXXXX_ENGINE_SERVER_DATADOG_CONFIGMAP:***********************************000000 STAKATER_XXXX_XXXXXXXX_ENGINE_SERVER_ENGINE_SCM_SECRET:***********************************000000 WORKER_QUEUE_NAME:xxxxx-xxxx-reader WORKER_QUEUE_PARAMETERS:--tries=4 --timeout=240 --sleep=2]  {docker.io/xxxxx/xxxx-xxxxx-server@sha256:xxxxxxxxxxxxx docker.io/xxxxx/xxxxx-xxxx-server:5.11.5-aws docker.io/xxxxx/xxxxx-xxx-server xxxxxx-xxxx-server 5.11.5-aws} map[] 0 [] containerd {true 2022-02-17 06:33:52 +0000 UTC 0001-01-01 00:00:00 +0000 UTC}}, metrics will ne missing

2022-02-17 09:34:25 UTC | CORE | WARN | (pkg/security/log/logger.go:112 in Warnf) | Collector not found for container: &{{container xxxxxxxxxxxxx} {datadog-agent  map[] map[]} map[AWS_DEFAULT_REGION:eu-west-1 AWS_REGION:eu-west-1 AWS_ROLE_ARN:arn:aws:iam::xxxxxxxxx:role/k8s-xxx-datadog-role AWS_WEB_IDENTITY_TOKEN_FILE:/var/run/secrets/eks.amazonaws.com/serviceaccount/token DD_APM_ENABLED:true DD_EKS_FARGATE:true DD_KUBERNETES_KUBELET_NODENAME: DD_LOG_LEVEL:warn DD_SITE:datadoghq.eu DD_TAGS:env:des product:xxx platform:eks-fargate datacenter:aws aws_account_name:xxxxxx_xxx terraform:true apikey:xxxxx]  {docker.io/datadog/agent@sha256:xxxxxxxxxxxxx docker.io/datadog/agent:7-jmx docker.io/datadog/agent agent 7-jmx} map[] 0 [] containerd {true 2022-02-17 06:32:57 +0000 UTC 0001-01-01 00:00:00 +0000 UTC}}, metrics will ne missing

This repeats every 15 seconds.
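
To confirm the repeat interval from a captured log, something like the sketch below could be used. It assumes each Agent log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, as in the lines above. The function name `warning_intervals` is hypothetical.

```python
from datetime import datetime

MARKER = "Collector not found for container"


def warning_intervals(lines):
    """Return the seconds between consecutive distinct timestamps of the warning."""
    stamps = []
    for line in lines:
        if MARKER not in line:
            continue
        # The first 19 characters hold "YYYY-MM-DD HH:MM:SS" (assumed layout).
        ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        # One warning is logged per container, so equal timestamps are collapsed.
        if not stamps or ts != stamps[-1]:
            stamps.append(ts)
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    import sys

    # Usage: pipe the Agent log into the script on stdin.
    print(warning_intervals(sys.stdin))
```

Feeding the Agent log to the script on standard input prints the gaps in seconds between bursts of the warning.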

Describe what you expected:

The notes in this doc (https://docs.datadoghq.com/integrations/eks_fargate/#metrics-collection) state: Container metrics are not available in Fargate because the cgroups volume from the host can’t be mounted into the Agent. The [Live Containers](https://app.datadoghq.com/containers) view reports 0 for CPU and Memory.

If DD_EKS_FARGATE:true is set, why does the container metrics check still run every 15 seconds? Wouldn't it be appropriate to deactivate this component so that it does not generate warning records?

Additional environment details (Operating System, Cloud provider, etc):

  • EKS version: v1.21.5-eks-bc4871b
  • Kubelet version: v1.21.5-eks-9017834
  • Fargate Kubelet version: v1.21.2-eks-06eac09

jotasixto avatar Feb 17 '22 10:02 jotasixto

Same here.

EKS version: v1.21.5-eks-bc4871b
Datadog Agent Version: v7.33.1

This issue occurs in v7.33.0 and later, but not in v7.32.4. The following symptoms appear in v7.33.0 and later:

  • Metrics don't contain tag information
  • Many WARN logs like the following are emitted:
Collector not found for container

skikkh avatar Feb 28 '22 08:02 skikkh

Having absolutely the same issue. No idea what the reason is :crying_cat_face:

ivanilves avatar Mar 16 '22 14:03 ivanilves

I have had this issue too. I am not sure what the root cause is, but I used the "Annotations v1 (for Datadog Agent < v7.36)" format instead of the "Annotations v2 (for Datadog Agent v7.36+)" format, even though my Agent version was 7.38.2. I hope this feedback is a useful clue for Datadog staff and for newcomers as well.

    annotations:
      ad.datadoghq.com/nginx.check_names: '["nginx"]'
      ad.datadoghq.com/nginx.init_configs: '[{}]'
      ad.datadoghq.com/nginx.instances: |
        [
          {
            "nginx_status_url":"http://%%host%%:80/nginx_status/"
          }
        ]

veyselsahin avatar Oct 11 '22 00:10 veyselsahin
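
For comparison, the v1 annotations in the last comment can be mapped to the v2 layout with a small helper. This is a sketch only. It assumes the v2 format is a single `ad.datadoghq.com/<container>.checks` annotation whose JSON maps each check name to `init_config` and `instances`, as described in Datadog's Autodiscovery documentation. The function name `v1_to_v2` is hypothetical.

```python
import json


def v1_to_v2(container, annotations):
    """Build the single v2 `checks` annotation from the three v1 annotations."""
    prefix = f"ad.datadoghq.com/{container}."
    # In v1 the three annotations are parallel JSON lists, matched by index.
    names = json.loads(annotations[prefix + "check_names"])
    init_configs = json.loads(annotations[prefix + "init_configs"])
    instances = json.loads(annotations[prefix + "instances"])

    checks = {}
    for name, init_config, instance in zip(names, init_configs, instances):
        # Instances of the same check are grouped under one entry.
        entry = checks.setdefault(name, {"init_config": init_config, "instances": []})
        entry["instances"].append(instance)
    return {prefix + "checks": json.dumps(checks, indent=2)}


if __name__ == "__main__":
    v1 = {
        "ad.datadoghq.com/nginx.check_names": '["nginx"]',
        "ad.datadoghq.com/nginx.init_configs": "[{}]",
        "ad.datadoghq.com/nginx.instances":
            '[{"nginx_status_url": "http://%%host%%:80/nginx_status/"}]',
    }
    for key, value in v1_to_v2("nginx", v1).items():
        print(key)
        print(value)
```

Whether switching between the two formats affects the "Collector not found for container" warning is not established in this thread. The helper only shows how the two layouts correspond.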