icinga2 icon indicating copy to clipboard operation
icinga2 copied to clipboard

Different perfdata at same time

Open Virsacer opened this issue 3 years ago • 9 comments

Describe the bug

I noticed that in the output of PerfdataWriter I sometimes have two different perfdata-values for the same host at the same second:

DATATYPE::HOSTPERFDATA	TIMET::1655526958	HOSTNAME::DC	HOSTPERFDATA::rta=0.160ms;1000.000;2000.000;0; pl=0%;25;45;; rtmax=0.287ms;;;; rtmin=0.122ms;;;;	HOSTCHECKCOMMAND::icmp	HOSTSTATE::UP	HOSTSTATETYPE::HARD
DATATYPE::HOSTPERFDATA	TIMET::1655526958	HOSTNAME::DC	HOSTPERFDATA::rta=0.153ms;1000.000;2000.000;0; pl=0%;25;45;; rtmax=0.211ms;;;; rtmin=0.113ms;;;;	HOSTCHECKCOMMAND::icmp	HOSTSTATE::UP	HOSTSTATETYPE::HARD

How can this happen? These checks are supposed to run once a minute...

To Reproduce

I dont know how to trigger it. It happens sometimes, but apparently only with host-checks (so far).

Expected behavior

Output perfdata only as often as the check runs.

Your Environment

  • Version used (icinga2 --version): 2.13.3-1
  • Operating System and version: Ubuntu 22.04
  • Enabled features (icinga2 feature list): api checker ido-mysql mainlog notification perfdata

Virsacer avatar Jun 18 '22 08:06 Virsacer

Also happens with Icinga 2.13.4-1

Virsacer avatar Jul 04 '22 10:07 Virsacer

Hello @Virsacer!

These checks are supposed to run once a minute...

Supposed is good, guaranteed is better. Please check your check history.

Also: Have there been Icinga reloads around that time?

Best, A/K

Al2Klimov avatar Jul 19 '22 17:07 Al2Klimov

Hi @Al2Klimov

well I meant I have set check_interval = 1m It did not happen around a reload or restart.

DATATYPE::HOSTPERFDATA	TIMET::1658320027	HOSTNAME::MYHOST	HOSTPERFDATA::rta=0.006ms;1000.000;2000.000;0; pl=0%;25;45;; rtmax=0.017ms;;;; rtmin=0.003ms;;;;	HOSTCHECKCOMMAND::icmp	HOSTSTATE::UP	HOSTSTATETYPE::HARD
DATATYPE::HOSTPERFDATA	TIMET::1658320027	HOSTNAME::MYHOST	HOSTPERFDATA::rta=0.009ms;1000.000;2000.000;0; pl=0%;25;45;; rtmax=0.033ms;;;; rtmin=0.003ms;;;;	HOSTCHECKCOMMAND::icmp	HOSTSTATE::UP	HOSTSTATETYPE::HARD

I did grep "2022-07-20 14:27:07" debug.log | grep -i "/check" and removed unrelated checks:

[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Scheduling info for checkable 'MYHOST' (2022-07-20 14:27:07 +0200): Object 'MYHOST', Next Check: 2022-07-20 14:27:07 +0200(1.65832e+09).
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Executing check for 'MYHOST'
[2022-07-20 14:27:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:26:09 +0200 (1.65832e+09) to next check time at 2022-07-20 14:28:07 +0200 (1.65832e+09).
[2022-07-20 14:27:07 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST': PID 119491
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:27:07 +0200] notice/Process: PID 119491 ('/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST') terminated with exit code 0
[2022-07-20 14:27:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:27:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:28:07 +0200 (1.65832e+09).
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Scheduling info for checkable 'MYHOST' (2022-07-20 14:27:07 +0200): Object 'MYHOST', Next Check: 2022-07-20 14:27:07 +0200(1.65832e+09).
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Executing check for 'MYHOST'
[2022-07-20 14:27:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:27:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:28:07 +0200 (1.65832e+09).
[2022-07-20 14:27:07 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST': PID 119492
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:27:07 +0200] notice/Process: PID 119492 ('/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST') terminated with exit code 0
[2022-07-20 14:27:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:27:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:28:07 +0200 (1.65832e+09).

Virsacer avatar Jul 20 '22 17:07 Virsacer

Please inspect the whole minute, just to be sure. Oh, also keep an eye on reloads of the 2nd master.

Al2Klimov avatar Jul 21 '22 08:07 Al2Klimov

The whole minute unfiltered is ~8000 lines What should I be looking for?

Also this is a single master.

Virsacer avatar Jul 21 '22 10:07 Virsacer

Also this is a single master.

OK, then reloads don’t matter.

What should I be looking for?

In case of no passive checks and the host in question being checked on the node writing perfdata: just active checks.

Al2Klimov avatar Jul 21 '22 11:07 Al2Klimov

Ok, thats basically what I did with my grep... I did have a look at the minute before and after and noticed that the check did also run twice (but not within the same second):

[2022-07-20 14:26:07 +0200] debug/CheckerComponent: Scheduling info for checkable 'MYHOST' (2022-07-20 14:26:07 +0200): Object 'MYHOST', Next Check: 2022-07-20 14:26:07 +0200(1.65832e+09).
[2022-07-20 14:26:07 +0200] debug/CheckerComponent: Executing check for 'MYHOST'
[2022-07-20 14:26:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:25:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:27:07 +0200 (1.65832e+09).
[2022-07-20 14:26:07 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST': PID 119170
[2022-07-20 14:26:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:26:07 +0200] notice/Process: PID 119170 ('/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST') terminated with exit code 0
[2022-07-20 14:26:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:26:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:27:07 +0200 (1.65832e+09).

[2022-07-20 14:26:09 +0200] debug/CheckerComponent: Scheduling info for checkable 'MYHOST' (2022-07-20 14:26:09 +0200): Object 'MYHOST', Next Check: 2022-07-20 14:26:09 +0200(1.65832e+09).
[2022-07-20 14:26:09 +0200] debug/CheckerComponent: Executing check for 'MYHOST'
[2022-07-20 14:26:09 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:26:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:27:07 +0200 (1.65832e+09).
[2022-07-20 14:26:09 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST': PID 119183
[2022-07-20 14:26:09 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:26:09 +0200] notice/Process: PID 119183 ('/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST') terminated with exit code 0
[2022-07-20 14:26:09 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:26:09 +0200 (1.65832e+09) to next check time at 2022-07-20 14:27:07 +0200 (1.65832e+09).


[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Scheduling info for checkable 'MYHOST' (2022-07-20 14:27:07 +0200): Object 'MYHOST', Next Check: 2022-07-20 14:27:07 +0200(1.65832e+09).
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Executing check for 'MYHOST'
[2022-07-20 14:27:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:26:09 +0200 (1.65832e+09) to next check time at 2022-07-20 14:28:07 +0200 (1.65832e+09).
[2022-07-20 14:27:07 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST': PID 119491
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:27:07 +0200] notice/Process: PID 119491 ('/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST') terminated with exit code 0
[2022-07-20 14:27:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:27:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:28:07 +0200 (1.65832e+09).
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Scheduling info for checkable 'MYHOST' (2022-07-20 14:27:07 +0200): Object 'MYHOST', Next Check: 2022-07-20 14:27:07 +0200(1.65832e+09).
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Executing check for 'MYHOST'
[2022-07-20 14:27:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:27:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:28:07 +0200 (1.65832e+09).
[2022-07-20 14:27:07 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST': PID 119492
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:27:07 +0200] notice/Process: PID 119492 ('/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST') terminated with exit code 0
[2022-07-20 14:27:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:27:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:28:07 +0200 (1.65832e+09).


[2022-07-20 14:28:07 +0200] debug/CheckerComponent: Scheduling info for checkable 'MYHOST' (2022-07-20 14:28:07 +0200): Object 'MYHOST', Next Check: 2022-07-20 14:28:07 +0200(1.65832e+09).
[2022-07-20 14:28:07 +0200] debug/CheckerComponent: Executing check for 'MYHOST'
[2022-07-20 14:28:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:27:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:29:07 +0200 (1.65832e+09).
[2022-07-20 14:28:07 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST': PID 119796
[2022-07-20 14:28:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:28:07 +0200] notice/Process: PID 119796 ('/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST') terminated with exit code 0
[2022-07-20 14:28:07 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:28:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:29:07 +0200 (1.65832e+09).

[2022-07-20 14:28:14 +0200] debug/CheckerComponent: Scheduling info for checkable 'MYHOST' (2022-07-20 14:28:14 +0200): Object 'MYHOST', Next Check: 2022-07-20 14:28:14 +0200(1.65832e+09).
[2022-07-20 14:28:14 +0200] debug/CheckerComponent: Executing check for 'MYHOST'
[2022-07-20 14:28:14 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:28:07 +0200 (1.65832e+09) to next check time at 2022-07-20 14:29:12 +0200 (1.65832e+09).
[2022-07-20 14:28:14 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST': PID 119832
[2022-07-20 14:28:14 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:28:14 +0200] notice/Process: PID 119832 ('/usr/lib/nagios/plugins/check_icmp' '-c' '2s,45%%' '-w' '1s,25%%' '-H' 'MYHOST') terminated with exit code 0
[2022-07-20 14:28:14 +0200] debug/Checkable: Update checkable 'MYHOST' with check interval '60' from last check time at 2022-07-20 14:28:14 +0200 (1.65832e+09) to next check time at 2022-07-20 14:29:12 +0200 (1.65832e+09).

Any ideas?

Virsacer avatar Jul 21 '22 18:07 Virsacer

but not within the same second

Within the same second (which is btw. 2022-07-20 14:27:07 +0200):

2022-07-20 14:27:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'

Al2Klimov avatar Jul 25 '22 15:07 Al2Klimov

Yes, in the other minutes it was not the same second...

When grepping just for the finish messages, I noticed that there are only a couple of minutes where it ran twice:

[2022-07-20 12:55:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 12:55:12 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 12:56:10 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 12:56:10 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 12:57:08 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 12:57:14 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 12:58:12 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 12:58:14 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'

[2022-07-20 14:26:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:26:09 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:27:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:28:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:28:14 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:29:12 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 14:29:14 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'

[2022-07-20 17:51:07 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 17:51:36 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 17:52:34 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'
[2022-07-20 17:52:50 +0200] debug/CheckerComponent: Check finished for object 'MYHOST'

The debuglog I kept is from 12:44:07 to 18:53:07

Virsacer avatar Jul 25 '22 16:07 Virsacer