Agent 5.9.1 on Alpine: CPU usage at 100% constantly
I'm running the containerized Alpine version of the agent, and for some reason the CPU usage jumps to 100% after a while and stays there.
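To narrow down which process inside the container is actually burning the CPU, something along these lines should work (docker stats for the container as a whole, then BusyBox top inside the Alpine image; the container ID is the one from the output below):

$ docker stats --no-stream a3e77c5ccce1
$ docker exec a3e77c5ccce1 top -b -n 1

Below is the output of docker exec a3e77c5ccce1 /opt/datadog-agent/bin/agent info, captured twice a few seconds apart: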
===================
Collector (v 5.9.1)
===================
Status date: 2016-10-21 11:14:39 (8s ago)
Pid: 26103
Platform: Linux-4.4.19-29.55.amzn1.x86_64-x86_64-with
Python Version: 2.7.12, 64bit
Logs: <stderr>, /opt/datadog-agent/logs/collector.log
Clocks
======
NTP offset: -0.0058 s
System UTC time: 2016-10-21 11:14:47.875449
Paths
=====
conf.d: /opt/datadog-agent/agent/conf.d
checks.d: /opt/datadog-agent/agent/checks.d
Hostnames
=========
ec2-hostname: ip-10-0-0-95.eu-west-1.compute.internal
local-ipv4: 10.0.0.95
local-hostname: ip-10-0-0-95.eu-west-1.compute.internal
socket-hostname: a3e77c5ccce1
public-hostname: ec2-52-211-182-23.eu-west-1.compute.amazonaws.com
hostname: ip-10-0-0-95.eu-west-1.compute.internal
instance-id: i-44e4f952
public-ipv4: 52.211.182.23
socket-fqdn: a3e77c5ccce1
Checks
======
nginx
-----
- instance #0 [OK]
- Collected 7 metrics, 0 events & 2 service checks
ntp
---
- Collected 0 metrics, 0 events & 1 service check
disk
----
- instance #0 [OK]
- Collected 32 metrics, 0 events & 1 service check
docker_daemon
-------------
- instance #0 [OK]
- Collected 108 metrics, 0 events & 2 service checks
http_check
----------
- instance #0 [OK]
- instance #1 [OK]
- instance #2 [OK]
- instance #3 [OK]
- Collected 4 metrics, 0 events & 9 service checks
Emitters
========
- http_emitter [OK]
===================
Dogstatsd (v 5.9.1)
===================
Status date: 2016-10-21 11:14:38 (9s ago)
Pid: 16
Platform: Linux-4.4.19-29.55.amzn1.x86_64-x86_64-with
Python Version: 2.7.12, 64bit
Logs: <stderr>, /opt/datadog-agent/logs/dogstatsd.log
Flush count: 12871
Packet Count: 0
Packets per second: 0.0
Metric count: 1
Event count: 0
Service check count: 0
===================
Forwarder (v 5.9.1)
===================
Status date: 2016-10-21 11:14:43 (4s ago)
Pid: 17
Platform: Linux-4.4.19-29.55.amzn1.x86_64-x86_64-with
Python Version: 2.7.12, 64bit
Logs: <stderr>, /opt/datadog-agent/logs/forwarder.log
Queue Size: 0 bytes
Queue Length: 0
Flush Count: 43388
Transactions received: 19253
Transactions flushed: 19253
Transactions rejected: 0
[ec2-user@ip-10-0-0-95 villev]$ docker exec a3e77c5ccce1 /opt/datadog-agent/bin/agent info
===================
Collector (v 5.9.1)
===================
Status date: 2016-10-21 11:14:58 (11s ago)
Pid: 26103
Platform: Linux-4.4.19-29.55.amzn1.x86_64-x86_64-with
Python Version: 2.7.12, 64bit
Logs: <stderr>, /opt/datadog-agent/logs/collector.log
Clocks
======
NTP offset: 0.0012 s
System UTC time: 2016-10-21 11:15:10.362467
Paths
=====
conf.d: /opt/datadog-agent/agent/conf.d
checks.d: /opt/datadog-agent/agent/checks.d
Checks
======
nginx
-----
- instance #0 [OK]
- Collected 7 metrics, 0 events & 2 service checks
ntp
---
- Collected 0 metrics, 0 events & 1 service check
disk
----
- instance #0 [OK]
- Collected 32 metrics, 0 events & 1 service check
docker_daemon
-------------
- instance #0 [OK]
- Collected 108 metrics, 1 event & 2 service checks
http_check
----------
- instance #0 [OK]
- instance #1 [OK]
- instance #2 [OK]
- instance #3 [OK]
- Collected 4 metrics, 0 events & 9 service checks
Emitters
========
- http_emitter [OK]
===================
Dogstatsd (v 5.9.1)
===================
Status date: 2016-10-21 11:15:08 (1s ago)
Pid: 16
Platform: Linux-4.4.19-29.55.amzn1.x86_64-x86_64-with
Python Version: 2.7.12, 64bit
Logs: <stderr>, /opt/datadog-agent/logs/dogstatsd.log
Flush count: 12874
Packet Count: 0
Packets per second: 0.0
Metric count: 1
Event count: 0
Service check count: 0
===================
Forwarder (v 5.9.1)
===================
Status date: 2016-10-21 11:15:08 (2s ago)
Pid: 17
Platform: Linux-4.4.19-29.55.amzn1.x86_64-x86_64-with
Python Version: 2.7.12, 64bit
Logs: <stderr>, /opt/datadog-agent/logs/forwarder.log
Queue Size: 447 bytes
Queue Length: 1
Flush Count: 43396
Transactions received: 19257
Transactions flushed: 19256
Transactions rejected: 0
Hi @dennari, thanks for notifying us of this. To help us investigate, could you please send us a flare from this agent while it's pegging the CPU? Instructions can be found here.
@hkaj, unfortunately the flare command is not completing cleanly. It freezes after printing "/tmp/datadog-agent-2016-10-26-12-39-10.tar.bz2 is going to be uploaded to Datadog." and nothing happens afterwards.
$ docker exec 4d61f6506e73 /opt/datadog-agent/bin/agent flare
2016-10-26 12:39:10,574 | INFO | dd.collector | utils.flare(flare.py:132) | Collecting logs and configuration files:
2016-10-26 12:39:10,576 | INFO | dd.collector | utils.flare(flare.py:372) | * /opt/datadog-agent/logs/collector.log
2016-10-26 12:39:10,576 | INFO | dd.collector | utils.flare(flare.py:372) | * /opt/datadog-agent/logs/forwarder.log
2016-10-26 12:39:10,577 | INFO | dd.collector | utils.flare(flare.py:372) | * /opt/datadog-agent/logs/dogstatsd.log
2016-10-26 12:39:10,577 | INFO | dd.collector | utils.flare(flare.py:372) | * /opt/datadog-agent/logs/jmxfetch.log
2016-10-26 12:39:10,578 | INFO | dd.collector | utils.flare(flare.py:372) | * /opt/datadog-agent/logs/supervisord.log
2016-10-26 12:39:10,579 | INFO | dd.collector | utils.flare(flare.py:383) | * /opt/datadog-agent/agent/datadog.conf
2016-10-26 12:39:10,579 | INFO | dd.collector | utils.flare(flare.py:383) | * /opt/datadog-agent/agent/supervisor.conf
2016-10-26 12:39:10,580 | INFO | dd.collector | utils.flare(flare.py:383) | * /opt/datadog-agent/agent/conf.d/nginx.yaml
2016-10-26 12:39:10,581 | INFO | dd.collector | utils.flare(flare.py:383) | * /opt/datadog-agent/agent/conf.d/docker_daemon.yaml
2016-10-26 12:39:10,581 | INFO | dd.collector | utils.flare(flare.py:383) | * /opt/datadog-agent/agent/conf.d/http_check.yaml
2016-10-26 12:39:10,582 | INFO | dd.collector | utils.flare(flare.py:383) | * /opt/datadog-agent/agent/conf.d/agent_metrics.yaml.default
2016-10-26 12:39:10,583 | INFO | dd.collector | utils.flare(flare.py:383) | * /opt/datadog-agent/agent/conf.d/disk.yaml.default
2016-10-26 12:39:10,583 | INFO | dd.collector | utils.flare(flare.py:383) | * /opt/datadog-agent/agent/conf.d/ntp.yaml.default
2016-10-26 12:39:10,583 | INFO | dd.collector | utils.flare(flare.py:141) | * datadog-agent configcheck output
2016-10-26 12:39:10,594 | INFO | dd.collector | utils.flare(flare.py:143) | * service discovery configcheck output
2016-10-26 12:39:10,595 | INFO | dd.collector | utils.flare(flare.py:145) | * datadog-agent status output
2016-10-26 12:39:10,878 | INFO | dd.collector | utils.flare(flare.py:147) | * datadog-agent info output
2016-10-26 12:39:10,897 | INFO | dd.collector | utils.flare(flare.py:150) | * pip freeze
2016-10-26 12:39:11,152 | INFO | dd.collector | utils.flare(flare.py:154) | * log permissions on collected files
2016-10-26 12:39:11,153 | INFO | dd.collector | utils.flare(flare.py:135) | Saving all files to /tmp/datadog-agent-2016-10-26-12-39-10.tar.bz2
/tmp/datadog-agent-2016-10-26-12-39-10.tar.bz2 is going to be uploaded to Datadog.
@dennari, the command is interactive. Try docker exec -it 4d61f6506e73 /opt/datadog-agent/bin/agent flare instead?
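For context, docker exec -i keeps STDIN open and -t allocates a pseudo-TTY; the flare command needs both to display its confirmation prompt and read the answer, so the working invocation looks like:

$ docker exec -it 4d61f6506e73 /opt/datadog-agent/bin/agent flare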
Ah ok, thanks. Now I got it submitted. The case is #69876.
Thanks @dennari, I'll have a look ASAP.
Hi @dennari, were you able to get around this by using the non-Alpine version? I'm constantly seeing the same issue with datadog/docker-dd-agent:11.0.5141-alpine.
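For reference, switching back to the default Debian-based image should just be a matter of changing the tag; a minimal sketch, assuming the standard run options from the docker-dd-agent README (adjust the tag, mounts and API key to your own setup):

$ docker run -d --name dd-agent \
    -v /var/run/docker.sock:/var/run/docker.sock:ro \
    -v /proc/:/host/proc/:ro \
    -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
    -e API_KEY=<your_api_key> \
    datadog/docker-dd-agent:latest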
Hi, I'm having the same issue on a plain AWS Linux EC2 instance running some Node services under pm2; the supervisord process sits at a constant 100% CPU usage.
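One way to double-check whether it is really supervisord itself, or one of the agent processes it manages (collector, forwarder, dogstatsd), is a plain ps sorted by CPU, e.g.:

$ ps -eo pid,ppid,pcpu,etime,args --sort=-pcpu | head -n 15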
Hi guys,
We have the same issue. Any news on fixes?