Repeated notifications for DOWN -> UNREACHABLE -> DOWN hard state transition
Problem
Naemon is sending me repeated host notifications for a host which is currently turned off. This host is far away from me, and is situated behind a router which has a flaky Internet connection. The setup looks like this:
Naemon -> Internet (WAN) -> Router -> Device
Every time there is a short dropout (packet loss) on the link to the Router, I get another email notification that Device is DOWN. This device has been persistently DOWN for many months (it is unplugged at the moment, awaiting replacement).
I believe this happens because Naemon sees the device transition across the states:
DOWN -> UNREACHABLE -> DOWN -> ...
I believe the host stays in a HARD state throughout. It can only become DOWN or UNREACHABLE. No other states are possible, since the machine is unplugged.
I would like to prevent this repeated notification.
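To make the suspected mechanism concrete, here is a toy model of it. It is only a sketch of my hypothesis, not Naemon's actual notification logic: it assumes that every hard state change into a state enabled by notification_options sends a notification. The names `NOTIFY_ON` and `notifications` are made up for this illustration.

```python
# Toy model of the suspected behaviour, NOT Naemon's real notification code.
# Assumption: every hard state change into an enabled state sends a notification.

# Mirrors "notification_options d,r": notify on DOWN and on recovery (UP).
NOTIFY_ON = {"DOWN", "UP"}


def notifications(hard_states):
    """Return the states that would notify, given a sequence of hard states."""
    sent = []
    previous = hard_states[0]
    for state in hard_states[1:]:
        # A notification is assumed whenever the state changes into an
        # enabled state, even if the host was already DOWN before the
        # intermediate UNREACHABLE.
        if state != previous and state in NOTIFY_ON:
            sent.append(state)
        previous = state
    return sent


sequence = ["DOWN", "UNREACHABLE", "DOWN", "UNREACHABLE", "DOWN"]
print(notifications(sequence))  # ['DOWN', 'DOWN']
```

Under this assumption, each return from UNREACHABLE to DOWN counts as a new problem, which matches the repeated emails I am seeing.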
Configuration
define host {
  host_name device.example.com
  use check_mk_host
  address 10.9.0.11
  _TAGS g_elp ntpclock p_elp_router snmp
  _ADDRESS_FAMILY 4
  _ADDRESS_6
  _ADDRESS_4 10.9.0.11
  check_command check-mk-host-ping!-w 200.00,80.00% -c 500.00,100.00%
  _FILENAME /elp.mk
  hostgroups ELP
  contact_groups isnyder
  parents router.example.com
  max_check_attempts 10
  notification_options d,r
}
define host {
  host_name router.example.com
  use check_mk_host
  address 10.9.0.253
  _TAGS g_elp p_elp_border snmp switch
  _ADDRESS_FAMILY 4
  _ADDRESS_6
  _ADDRESS_4 10.9.0.253
  check_command check-mk-host-ping!-w 200.00,80.00% -c 500.00,100.00%
  _FILENAME /elp.mk
  hostgroups ELP
  parents border.example.com
  max_check_attempts 10
  notification_options d,r
}
Log Messages
[1529429449] INITIAL HOST STATE: device.example.com;DOWN;HARD;10;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529429826] HOST ALERT: device.example.com;UNREACHABLE;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529429826] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications disabled for current object state.
[1529429878] HOST ALERT: device.example.com;DOWN;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529429878] HOST NOTIFICATION: isnyder;device.example.com;DOWN;host-notify-by-email-html;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529429949] HOST NOTIFICATION SUPPRESSED: device.example.com;Re-notification blocked for this problem.
[1529430186] HOST ALERT: device.example.com;UNREACHABLE;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430186] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications disabled for current object state.
[1529430207] HOST ALERT: device.example.com;DOWN;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430207] HOST NOTIFICATION: isnyder;device.example.com;DOWN;host-notify-by-email-html;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430231] HOST ALERT: device.example.com;UNREACHABLE;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430231] HOST FLAPPING ALERT: device.example.com;STARTED; Host appears to have started flapping (22.4% change > 20.0% threshold)
[1529430231] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications about FLAPPING events blocked for this object.
[1529430254] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications disabled for current object state.
[1529430276] HOST ALERT: device.example.com;DOWN;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430276] HOST NOTIFICATION SUPPRESSED: device.example.com;Notification blocked because the object is currently flapping.
[1529430298] HOST ALERT: device.example.com;UNREACHABLE;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430298] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications disabled for current object state.
[1529430355] HOST ALERT: device.example.com;DOWN;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430355] HOST NOTIFICATION SUPPRESSED: device.example.com;Notification blocked because the object is currently flapping.
[1529430483] HOST ALERT: device.example.com;UNREACHABLE;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430483] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications disabled for current object state.
[1529430502] HOST ALERT: device.example.com;DOWN;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430502] HOST NOTIFICATION SUPPRESSED: device.example.com;Notification blocked because the object is currently flapping.
[1529430669] HOST ALERT: device.example.com;UNREACHABLE;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430669] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications disabled for current object state.
[1529430713] HOST ALERT: device.example.com;DOWN;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430713] HOST NOTIFICATION SUPPRESSED: device.example.com;Notification blocked because the object is currently flapping.
[1529431243] INITIAL HOST STATE: device.example.com;DOWN;HARD;10;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529431434] HOST FLAPPING ALERT: device.example.com;STOPPED; Host appears to have stopped flapping (3.9% change < 5.0% threshold)
[1529431434] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications about FLAPPING events blocked for this object.
[1529431493] HOST NOTIFICATION: isnyder;device.example.com;DOWN;host-notify-by-email-html;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529431554] HOST NOTIFICATION SUPPRESSED: device.example.com;Re-notification blocked for this problem.
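For anyone who wants to check a longer log against this pattern, the following is a rough sketch of a parser. It assumes the line layout seen in the excerpt above (`[epoch] TYPE: field;field;...`); the function name `down_notifications` and the `SAMPLE` list are hypothetical and only for illustration. For each DOWN notification it reports the state the host was in before the HOST ALERT logged at the same timestamp, or `None` if no alert was logged at that timestamp.

```python
import re

# Matches "[<epoch>] <TYPE>: <rest>" as seen in the log excerpt.
LINE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")


def down_notifications(lines, host):
    """Return (timestamp, previous_state) for each DOWN notification of host.

    previous_state is the state before the HOST ALERT that shares the
    notification's timestamp, or None if there is no such alert.
    """
    current_state = None
    last_alert = None  # (timestamp, previous_state)
    hits = []
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        ts, kind, rest = int(match.group(1)), match.group(2), match.group(3)
        fields = rest.split(";")
        if kind == "HOST ALERT" and fields[0] == host:
            # Alert fields: host;state;state_type;attempt;output
            last_alert = (ts, current_state)
            current_state = fields[1]
        elif kind == "HOST NOTIFICATION" and len(fields) > 2:
            # Notification fields: contact;host;state;command;output
            if fields[1] == host and fields[2] == "DOWN":
                if last_alert is not None and last_alert[0] == ts:
                    hits.append((ts, last_alert[1]))
                else:
                    hits.append((ts, None))
    return hits


# Four lines copied from the log excerpt above.
SAMPLE = [
    "[1529429826] HOST ALERT: device.example.com;UNREACHABLE;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%",
    "[1529429826] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications disabled for current object state.",
    "[1529429878] HOST ALERT: device.example.com;DOWN;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%",
    "[1529429878] HOST NOTIFICATION: isnyder;device.example.com;DOWN;host-notify-by-email-html;CRITICAL - 10.9.0.11: rta nan, lost 100%",
]

print(down_notifications(SAMPLE, "device.example.com"))
# [(1529429878, 'UNREACHABLE')]
```

On these lines the DOWN notification at 1529429878 directly follows a hard change from UNREACHABLE back to DOWN.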
Naemon Version Information:
$ naemon --version
Naemon Core 1.0.6.2-omd
Copyright (c) 2013-present Naemon Core Development Team and Community Contributors
...
OMD Version Information:
$ omd version
OMD - Open Monitoring Distribution Version 2.70-labs-edition