naemon-core

Repeated notifications for DOWN -> UNREACHABLE -> DOWN hard state transition

Open irasnyd opened this issue 7 years ago • 0 comments

Problem

Naemon is sending me repeated host notifications for a host which is currently turned off. This host is far away from me, and is situated behind a router which has a flaky Internet connection. The setup looks like this:

Naemon -> Internet (WAN) -> Router -> Device

Every time there is a short dropout (packet loss) on the link to the Router, I get another email notification that Device is DOWN. This device has been persistently DOWN for many months (it is unplugged at the moment, awaiting replacement).

I believe this happens because Naemon sees the device transition across the states:

DOWN -> UNREACHABLE -> DOWN -> ...

I believe the host stays in a HARD state throughout. It can only become DOWN or UNREACHABLE. No other states are possible, since the machine is unplugged.

I would like to prevent this repeated notification.
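
To make the suspected mechanism concrete, here is a toy Python model of the hypothesis above. It is not Naemon's actual notification logic. It only assumes that a notification goes out whenever the hard state changes into a state enabled in notification_options (DOWN here), while changes into UNREACHABLE are suppressed. The names `simulate` and `notify_on` are made up for illustration.

```python
def simulate(states, notify_on=("DOWN",)):
    """Return the indices at which the toy model would send a notification.

    Assumption (not taken from Naemon's source): a notification is sent each
    time the state changes into one of the states listed in `notify_on`.
    """
    sent = []
    previous = None
    for i, state in enumerate(states):
        changed = state != previous
        if changed and state in notify_on:
            sent.append(i)
        previous = state
    return sent


# The host never comes UP, yet every return from UNREACHABLE to DOWN
# counts as a fresh change into DOWN under this model.
sequence = ["DOWN", "UNREACHABLE", "DOWN", "DOWN", "UNREACHABLE", "DOWN"]
notifications = simulate(sequence)
print(notifications)  # [0, 2, 5]
```

Under this assumption the unchanged DOWN at index 3 stays quiet, which is at least consistent with the "Re-notification blocked for this problem" entries in the log below.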

Configuration

define host {
  host_name                     device.example.com
  use                           check_mk_host
  address                       10.9.0.11
  _TAGS                         g_elp ntpclock p_elp_router snmp
  _ADDRESS_FAMILY               4
  _ADDRESS_6
  _ADDRESS_4                    10.9.0.11
  check_command                 check-mk-host-ping!-w 200.00,80.00% -c 500.00,100.00%
  _FILENAME                     /elp.mk
  hostgroups                    ELP
  contact_groups                isnyder
  parents                       router.example.com
  max_check_attempts            10
  notification_options          d,r
}

define host {
  host_name                     router.example.com
  use                           check_mk_host
  address                       10.9.0.253
  _TAGS                         g_elp p_elp_border snmp switch
  _ADDRESS_FAMILY               4
  _ADDRESS_6
  _ADDRESS_4                    10.9.0.253
  check_command                 check-mk-host-ping!-w 200.00,80.00% -c 500.00,100.00%
  _FILENAME                     /elp.mk
  hostgroups                    ELP
  parents                       border.example.com
  max_check_attempts            10
  notification_options          d,r
}

Log Messages

[1529429449] INITIAL HOST STATE: device.example.com;DOWN;HARD;10;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529429826] HOST ALERT: device.example.com;UNREACHABLE;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529429826] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications disabled for current object state.
[1529429878] HOST ALERT: device.example.com;DOWN;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529429878] HOST NOTIFICATION: isnyder;device.example.com;DOWN;host-notify-by-email-html;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529429949] HOST NOTIFICATION SUPPRESSED: device.example.com;Re-notification blocked for this problem.
[1529430186] HOST ALERT: device.example.com;UNREACHABLE;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430186] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications disabled for current object state.
[1529430207] HOST ALERT: device.example.com;DOWN;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430207] HOST NOTIFICATION: isnyder;device.example.com;DOWN;host-notify-by-email-html;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430231] HOST ALERT: device.example.com;UNREACHABLE;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430231] HOST FLAPPING ALERT: device.example.com;STARTED; Host appears to have started flapping (22.4% change > 20.0% threshold)
[1529430231] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications about FLAPPING events blocked for this object.
[1529430254] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications disabled for current object state.
[1529430276] HOST ALERT: device.example.com;DOWN;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430276] HOST NOTIFICATION SUPPRESSED: device.example.com;Notification blocked because the object is currently flapping.
[1529430298] HOST ALERT: device.example.com;UNREACHABLE;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430298] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications disabled for current object state.
[1529430355] HOST ALERT: device.example.com;DOWN;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430355] HOST NOTIFICATION SUPPRESSED: device.example.com;Notification blocked because the object is currently flapping.
[1529430483] HOST ALERT: device.example.com;UNREACHABLE;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430483] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications disabled for current object state.
[1529430502] HOST ALERT: device.example.com;DOWN;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430502] HOST NOTIFICATION SUPPRESSED: device.example.com;Notification blocked because the object is currently flapping.
[1529430669] HOST ALERT: device.example.com;UNREACHABLE;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430669] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications disabled for current object state.
[1529430713] HOST ALERT: device.example.com;DOWN;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430713] HOST NOTIFICATION SUPPRESSED: device.example.com;Notification blocked because the object is currently flapping.
[1529431243] INITIAL HOST STATE: device.example.com;DOWN;HARD;10;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529431434] HOST FLAPPING ALERT: device.example.com;STOPPED; Host appears to have stopped flapping (3.9% change < 5.0% threshold)
[1529431434] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications about FLAPPING events blocked for this object.
[1529431493] HOST NOTIFICATION: isnyder;device.example.com;DOWN;host-notify-by-email-html;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529431554] HOST NOTIFICATION SUPPRESSED: device.example.com;Re-notification blocked for this problem.
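
The pattern in the log above can be picked out mechanically. The following is a rough Python sketch, not part of Naemon, that scans log lines in the format shown and lists each DOWN notification that repeats an earlier one with no UP state logged in between. The function name `find_repeated_down_notifications` is hypothetical, and the sample lines are copied from the excerpt above.

```python
import re

# Matches "[epoch] TYPE: payload" as seen in the log excerpt above.
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")


def find_repeated_down_notifications(lines, host):
    """Return timestamps of DOWN notifications for `host` that repeat an
    earlier DOWN notification with no UP state logged in between."""
    repeats = []
    notified_down = False
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts, kind, payload = int(m.group(1)), m.group(2), m.group(3)
        fields = payload.split(";")
        if kind in ("HOST ALERT", "INITIAL HOST STATE"):
            # fields: host;state;state_type;attempt;output
            if len(fields) >= 2 and fields[0] == host and fields[1] == "UP":
                notified_down = False
        elif kind == "HOST NOTIFICATION":
            # fields: contact;host;state;command;output
            if len(fields) >= 3 and fields[1] == host and fields[2] == "DOWN":
                if notified_down:
                    repeats.append(ts)
                notified_down = True
    return repeats


sample = """\
[1529429878] HOST ALERT: device.example.com;DOWN;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529429878] HOST NOTIFICATION: isnyder;device.example.com;DOWN;host-notify-by-email-html;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430186] HOST ALERT: device.example.com;UNREACHABLE;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430186] HOST NOTIFICATION SUPPRESSED: device.example.com;Notifications disabled for current object state.
[1529430207] HOST ALERT: device.example.com;DOWN;HARD;1;CRITICAL - 10.9.0.11: rta nan, lost 100%
[1529430207] HOST NOTIFICATION: isnyder;device.example.com;DOWN;host-notify-by-email-html;CRITICAL - 10.9.0.11: rta nan, lost 100%
""".splitlines()

repeats = find_repeated_down_notifications(sample, "device.example.com")
print(repeats)  # [1529430207]
```

Suppressed notifications and flapping alerts are ignored on purpose, so only emails that were actually sent are counted.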

Naemon Version Information:

$ naemon --version

Naemon Core 1.0.6.2-omd
Copyright (c) 2013-present Naemon Core Development Team and Community Contributors
...

OMD Version Information:

$ omd version
OMD - Open Monitoring Distribution Version 2.70-labs-edition

Screenshots

[Screenshots: availability-report, duplicate-notifications]

irasnyd, Jun 19 '18 19:06