Different perfdata at same time
Describe the bug
I noticed that in the output of PerfdataWriter I sometimes have two different perfdata-values for the same host at the same second:
DATATYPE::HOSTPERFDATA TIMET::1655526958 HOSTNAME::DC HOSTPERFDATA::rta=0.160ms;1000.000;2000.000;0; pl=0%;25;45;; rtmax=0.287ms;;;; rtmin=0.122ms;;;; HOSTCHECKCOMMAND::icmp HOSTSTATE::UP HOSTSTATETYPE::HARD
DATATYPE::HOSTPERFDATA TIMET::1655526958 HOSTNAME::DC HOSTPERFDATA::rta=0.153ms;1000.000;2000.000;0; pl=0%;25;45;; rtmax=0.211ms;;;; rtmin=0.113ms;;;; HOSTCHECKCOMMAND::icmp HOSTSTATE::UP HOSTSTATETYPE::HARD
How can this happen? These checks are supposed to run once a minute...
To Reproduce
I dont know how to trigger it. It happens sometimes, but apparently only with host-checks (so far).
Expected behavior
Output perfdata only as often as the check runs.
Your Environment
- Version used (
icinga2 --version): 2.13.3-1 - Operating System and version: Ubuntu 22.04
- Enabled features (
icinga2 feature list): api checker ido-mysql mainlog notification perfdata
Also happens with Icinga 2.13.4-1
Hello @Virsacer!
These checks are supposed to run once a minute...
Supposed is good, guaranteed is better. Please check your check history.
Also: Have there been Icinga reloads around that time?
Best, A/K
Hi @Al2Klimov
well I meant I have set check_interval = 1m
It did not happen around a reload or restart.
DATATYPE::HOSTPERFDATA TIMET::1658320027 HOSTNAME::MYHOST HOSTPERFDATA::rta=0.006ms;1000.000;2000.000;0; pl=0%;25;45;; rtmax=0.017ms;;;; rtmin=0.003ms;;;; HOSTCHECKCOMMAND::icmp HOSTSTATE::UP HOSTSTATETYPE::HARD
DATATYPE::HOSTPERFDATA TIMET::1658320027 HOSTNAME::MYHOST HOSTPERFDATA::rta=0.009ms;1000.000;2000.000;0; pl=0%;25;45;; rtmax=0.033ms;;;; rtmin=0.003ms;;;; HOSTCHECKCOMMAND::icmp HOSTSTATE::UP HOSTSTATETYPE::HARD
I did grep "2022-07-20 14:27:07" debug.log | grep -i "/check" and removed unrelated checks:
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Scheduling info for checkable 'MYHOST' (2022-07-20 14:27:07 +0200): Object 'MYHOST', Next Check: 2022-07-20 14:27:07 +0200(1.65832e+09).
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Executing check for 'MYHOST'
[2022-07-20 14:27:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:26:09 +0200 (1.65832e+09) to next check time at 2022-07-20 14:28:07 +0200 (1.65832e+09).
[2022-07-20 14:27:07 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST': PID 119491
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:27:07 +0200] notice/Process: PID 119491 ('/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST') terminated with exit code 0
[2022-07-20 14:27:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:27:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:28:07 +0200 (1.65832e+09).
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Scheduling info for checkable 'MYHOST' (2022-07-20 14:27:07 +0200): Object 'MYHOST', Next Check: 2022-07-20 14:27:07 +0200(1.65832e+09).
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Executing check for 'MYHOST'
[2022-07-20 14:27:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:27:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:28:07 +0200 (1.65832e+09).
[2022-07-20 14:27:07 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST': PID 119492
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:27:07 +0200] notice/Process: PID 119492 ('/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST') terminated with exit code 0
[2022-07-20 14:27:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:27:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:28:07 +0200 (1.65832e+09).
Please inspect the whole minute, just to be sure. Oh, also keep an eye on reloads of the 2nd master.
The whole minute unfiltered is ~8000 lines What should I be looking for?
Also this is a single master.
Also this is a single master.
OK, then reloads don’t matter.
What should I be looking for?
In case of no passive checks and the host in question being checked on the node writing perfdata: just active checks.
Ok, thats basically what I did with my grep... I did have a look at the minute before and after and noticed that the check did also run twice (but not within the same second):
[2022-07-20 14:26:07 +0200] debug/CheckerComponent: Scheduling info for checkable 'MYHOST' (2022-07-20 14:26:07 +0200): Object 'MYHOST', Next Check: 2022-07-20 14:26:07 +0200(1.65832e+09).
[2022-07-20 14:26:07 +0200] debug/CheckerComponent: Executing check for 'MYHOST'
[2022-07-20 14:26:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:25:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:27:07 +0200 (1.65832e+09).
[2022-07-20 14:26:07 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST': PID 119170
[2022-07-20 14:26:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:26:07 +0200] notice/Process: PID 119170 ('/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST') terminated with exit code 0
[2022-07-20 14:26:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:26:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:27:07 +0200 (1.65832e+09).
[2022-07-20 14:26:09 +0200] debug/CheckerComponent: Scheduling info for checkable 'MYHOST' (2022-07-20 14:26:09 +0200): Object 'MYHOST', Next Check: 2022-07-20 14:26:09 +0200(1.65832e+09).
[2022-07-20 14:26:09 +0200] debug/CheckerComponent: Executing check for 'MYHOST'
[2022-07-20 14:26:09 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:26:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:27:07 +0200 (1.65832e+09).
[2022-07-20 14:26:09 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST': PID 119183
[2022-07-20 14:26:09 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:26:09 +0200] notice/Process: PID 119183 ('/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST') terminated with exit code 0
[2022-07-20 14:26:09 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:26:09 +0200 (1.65832e+09) to next check time at 2022-07-20 14:27:07 +0200 (1.65832e+09).
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Scheduling info for checkable 'MYHOST' (2022-07-20 14:27:07 +0200): Object 'MYHOST', Next Check: 2022-07-20 14:27:07 +0200(1.65832e+09).
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Executing check for 'MYHOST'
[2022-07-20 14:27:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:26:09 +0200 (1.65832e+09) to next check time at 2022-07-20 14:28:07 +0200 (1.65832e+09).
[2022-07-20 14:27:07 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST': PID 119491
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:27:07 +0200] notice/Process: PID 119491 ('/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST') terminated with exit code 0
[2022-07-20 14:27:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:27:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:28:07 +0200 (1.65832e+09).
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Scheduling info for checkable 'MYHOST' (2022-07-20 14:27:07 +0200): Object 'MYHOST', Next Check: 2022-07-20 14:27:07 +0200(1.65832e+09).
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Executing check for 'MYHOST'
[2022-07-20 14:27:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:27:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:28:07 +0200 (1.65832e+09).
[2022-07-20 14:27:07 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST': PID 119492
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:27:07 +0200] notice/Process: PID 119492 ('/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST') terminated with exit code 0
[2022-07-20 14:27:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:27:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:28:07 +0200 (1.65832e+09).
[2022-07-20 14:28:07 +0200] debug/CheckerComponent: Scheduling info for checkable 'MYHOST' (2022-07-20 14:28:07 +0200): Object 'MYHOST', Next Check: 2022-07-20 14:28:07 +0200(1.65832e+09).
[2022-07-20 14:28:07 +0200] debug/CheckerComponent: Executing check for 'MYHOST'
[2022-07-20 14:28:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:27:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:29:07 +0200 (1.65832e+09).
[2022-07-20 14:28:07 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST': PID 119796
[2022-07-20 14:28:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:28:07 +0200] notice/Process: PID 119796 ('/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST') terminated with exit code 0
[2022-07-20 14:28:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:28:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:29:07 +0200 (1.65832e+09).
[2022-07-20 14:28:14 +0200] debug/CheckerComponent: Scheduling info for checkable 'MYHOST' (2022-07-20 14:28:14 +0200): Object 'MYHOST', Next Check: 2022-07-20 14:28:14 +0200(1.65832e+09).
[2022-07-20 14:28:14 +0200] debug/CheckerComponent: Executing check for 'MYHOST'
[2022-07-20 14:28:14 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:28:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:29:12 +0200 (1.65832e+09).
[2022-07-20 14:28:14 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST': PID 119832
[2022-07-20 14:28:14 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:28:14 +0200] notice/Process: PID 119832 ('/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST') terminated with exit code 0
[2022-07-20 14:28:14 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:28:14 +0200 (1.65832e+09) to next check time at 2022-07-20 14:29:12 +0200 (1.65832e+09).
Any ideas?
but not within the same second
Within the same second (which is btw. 2022-07-20 14:27:07 +0200):
2022-07-20 14:27:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
Yes, in the other minutes it was not the same second...
When grepping just for the finish messages, I noticed that there are only a couple of minutes where it ran twice:
[2022-07-20 12:55:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 12:55:12 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 12:56:10 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 12:56:10 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 12:57:08 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 12:57:14 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 12:58:12 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 12:58:14 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:26:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:26:09 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:28:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:28:14 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:29:12 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:29:14 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 17:51:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 17:51:36 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 17:52:34 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 17:52:50 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
The debuglog I kept is from 12:44:07 to 18:53:07