trapdirector Handler/Rule selection for the same trap

Is your feature request related to a problem? Please describe. Let's say a vendor uses a universal trap to send state info. The trap contains variables for state (OK/NOK), severity (minor/major/...), problem category (CPU/memory/disk/power supply/fan/...) and a specific error message. The first thing you need are rules to differentiate between OK and NOK for this trap. Then you need additional separation by severity and problem category. Finally, the device sends out some annoying traps across several categories that you want to ignore. How can rules with such exceptions be created without exponential complexity?

Describe the solution you'd like There are several ways to solve this problem; this list is not intended to be exhaustive:

use a longest-match method (i.e. the more specific rule for the same trap wins)
use priorities for rules/handlers
use an adjustable processing order of rules/handlers
use a dynamic rule set within one handler by adding a new rule with individual filter-action pairs

Sep 02 '20 20:09 manfredw

I've run into that problem already, and the solution I found best is to have multiple ordered evaluations in one rule. Something like:

OIDa contains "CPU"

OIDb > 90 then warning
OIDb < 5 then ignore

OIDa contains "not useful" then ignore
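A minimal sketch of that ordered, nested evaluation, written in Python rather than trapdirector's own code. The names `RULE` and `evaluate` and the dict-based trap are assumptions for illustration only. Falling through to the next evaluation when no sub-condition matches is also an assumption, since the example above does not say what happens for a CPU value between 5 and 90.

```python
# Hypothetical sketch: RULE and evaluate() are illustrative names,
# not part of trapdirector.

# Each entry is (condition, outcome). The outcome is either an action
# string or a nested list of further (condition, outcome) entries.
RULE = [
    (lambda t: "CPU" in t["OIDa"], [
        (lambda t: t["OIDb"] > 90, "warning"),
        (lambda t: t["OIDb"] < 5, "ignore"),
    ]),
    (lambda t: "not useful" in t["OIDa"], "ignore"),
]


def evaluate(rule, trap):
    """Walk the evaluations in order and return the first action found."""
    for condition, outcome in rule:
        if not condition(trap):
            continue
        if isinstance(outcome, list):
            # Nested evaluations: descend, then fall through if none match.
            result = evaluate(outcome, trap)
            if result is not None:
                return result
        else:
            return outcome
    return None  # nothing matched


print(evaluate(RULE, {"OIDa": "CPU load", "OIDb": 95}))
print(evaluate(RULE, {"OIDa": "not useful info", "OIDb": 0}))
```
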

The main problem is not the implementation but how to set up the GUI for this. I was planning to use something like the 'assign where' filter in the service apply rules of Director.

Sep 03 '20 09:09 patrickpr

Worked on it a little; I ended up with two new DB tables, and the logic is:

Rule (same as the current rule) selects the trap based on source IP / OID

Evaluation of the trap content is done by a new type of rule which has:

the rule itself based on trap content
Index of action in other table for match and no match

Action can be :

Return with code (OK/warn/crit/nothing/ignore), display, and optional host/hostgroup and service reassignment
Forward to another rule (the new type).

Is there something I didn't think of?

Sep 03 '20 17:09 patrickpr

I'm not sure if you need forwarding to another rule.

The first selection criteria on trap source and trap OID is mandatory (handler). IMHO you need only one handler per host(group)/trap combination.

Within this handler, the trap content should be evaluated against an ordered list of selection rules and corresponding actions. If a rule matches, its action is returned and no further rules from the list are checked. There could be an explicit default action at the end of the list, without a rule.

Rules with mutually exclusive selection criteria do not need a specific order, but criteria like (a & b & c)->ignore and (a & b)->ok need exactly this order to work. Of course you could instead rewrite the second rule as (a & b & !c)->ok, but with multiple "c" or additional "d" criteria that will not scale. Ordered evaluation should also produce less complex rules, which are more readable and run faster.
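To make the ordering argument concrete, here is a small Python sketch of first-match evaluation with a default action. The function `first_match` and the boolean criteria `a`, `b`, `c` are hypothetical stand-ins for real trap conditions, not trapdirector code.

```python
# Hypothetical sketch of first-match rule processing; a, b and c stand in
# for arbitrary conditions on the trap content.

def first_match(rules, trap, default="nothing"):
    """Return the action of the first rule whose predicate matches."""
    for predicate, action in rules:
        if predicate(trap):
            return action
    return default  # explicit default action at the end of the list


ORDERED = [
    (lambda t: t["a"] and t["b"] and t["c"], "ignore"),  # more specific first
    (lambda t: t["a"] and t["b"], "ok"),
]

# Same rules in the opposite order: the broad rule shadows the specific one.
REVERSED = list(reversed(ORDERED))

trap = {"a": True, "b": True, "c": True}
print(first_match(ORDERED, trap))   # the specific rule is reached
print(first_match(REVERSED, trap))  # the specific rule is never reached
```

With the specific rule first, the exception is expressed once and the broader rule stays free of negations such as !c.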

You need at least one additional DB table. In my example it should contain rule-id, handler-id, order, rule, description and action.

Sep 03 '20 18:09 manfredw

You are right, it's much simpler like this (and doesn't need recursion). So adding two tables: <prefix>_rules_details with

Handler ref and order num
Rule
Ref to action for match / no match

<prefix>_rules_action with

display, status and optional reassignment to other service/host
keeping forwarding to rule for now

Sep 04 '20 08:09 patrickpr

Why do you want to use two tables for rule processing?

There is a 1:1 relation between rule and action and no need for splitting up, just between handler and rule(s) is 1:n.

I've looked into the current rules table and found some columns which seem not to be used (or are reserved for future use). These are ip4 and ip6 (IMHO only hostname is unique in icinga2), action_nomatch, display_nok and num_match_nok.

I would suggest the following tables (audit or statistics fields not included): _handler with

handler ID
trap OID
hostname
hostgroupname
description
default action
default revert time
default servicename

_rules with

rule ID
handler ID
order number of rule
rule
action
revert time
servicename
display

This design should give maximum flexibility: you can still set different services and states for the same trap/host combination. When the trap comes in, the first step is to find the corresponding handler. After that, read all rules for this specific handler and process them in the defined order (given by the order number). If a rule matches the trap content, there is no need to process subsequent rules; just stop and return the action/servicename/display information. If no rule matches, the default action is returned.

There is no need to add a second handler for this trap/host combination, just add a new rule.
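The two-table proposal and its processing steps could be sketched as follows, using Python with an in-memory SQLite database. Everything here is an assumption for illustration: the table and column names (`handler`, `rules`, `rule_order`), the tiny `name=value&name=value` rule syntax, the sample OID and the demo rows are not trapdirector's actual schema or rule language. The column is called `rule_order` because `ORDER` is a reserved word in SQL.

```python
import sqlite3

# Hypothetical schema following the field lists above.
SCHEMA = """
CREATE TABLE handler (
    handler_id INTEGER PRIMARY KEY, trap_oid TEXT NOT NULL,
    hostname TEXT, hostgroupname TEXT, description TEXT,
    default_action TEXT NOT NULL, default_revert_time INTEGER,
    default_servicename TEXT
);
CREATE TABLE rules (
    rule_id INTEGER PRIMARY KEY,
    handler_id INTEGER NOT NULL REFERENCES handler(handler_id),
    rule_order INTEGER NOT NULL, rule TEXT NOT NULL, action TEXT NOT NULL,
    revert_time INTEGER, servicename TEXT, display TEXT
);
"""

OID = ".1.3.6.1.4.1.99999.0.1"  # made-up trap OID for the demo


def demo_db():
    """Build an in-memory database with one handler and three ordered rules."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO handler VALUES (?,?,?,?,?,?,?,?)",
               (1, OID, "sw1", None, "vendor state trap", "nothing", None, "hw-state"))
    db.executemany("INSERT INTO rules VALUES (?,?,?,?,?,?,?,?)", [
        (1, 1, 10, "state=NOK&category=fan&severity=minor", "ignore", None, None, "noisy fan trap"),
        (2, 1, 20, "state=NOK&category=fan", "critical", None, "fan", "fan failure"),
        (3, 1, 30, "state=OK", "ok", None, "hw-state", "all good"),
    ])
    return db


def matches(rule, variables):
    """Toy rule syntax: every 'name=value' term must equal the trap variable."""
    terms = (term.split("=", 1) for term in rule.split("&"))
    return all(variables.get(name) == value for name, value in terms)


def process(db, hostname, trap_oid, variables):
    """Find the handler, then return (action, servicename, display) of the first matching rule."""
    handler = db.execute(
        "SELECT handler_id, default_action, default_servicename "
        "FROM handler WHERE hostname = ? AND trap_oid = ?",
        (hostname, trap_oid)).fetchone()
    if handler is None:
        return None  # no handler for this trap/host combination
    handler_id, default_action, default_service = handler
    rows = db.execute(
        "SELECT rule, action, servicename, display FROM rules "
        "WHERE handler_id = ? ORDER BY rule_order", (handler_id,))
    for rule, action, servicename, display in rows:
        if matches(rule, variables):
            return action, servicename, display  # stop at the first match
    return default_action, default_service, None


db = demo_db()
print(process(db, "sw1", OID, {"state": "NOK", "category": "fan", "severity": "major"}))
```

Covering a new exception for the same trap/host combination then means inserting one more row into `rules` with a suitable `rule_order`, without touching the handler.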

Sep 04 '20 20:09 manfredw

ip4 & ip6 make it easy to select the correct handler when receiving traps, without additional queries (IDO or API) to Icinga. When a trap is received, the only information about the host is the source IP.

Rules and actions actually have a 1:2 relation, as there is a match and a no-match action for one rule. But with the new system, maybe the no-match action is not needed anymore, as you can do it with two rules.

Revert time is not really used and will soon be obsolete, as it is easy to set it in the passive service configuration on Icinga. There is also a big problem: there is no trapdirector service, so I can't be sure the revert action will be sent to Icinga on time.

If I implement this, I will also need to reassign the host based on rules. The use case is when vSphere is sending traps for an ESX VM: all come from vSphere, but alarms must be on the virtual machine host.

There must be a default display with the default action.

Anyway, your design is simpler, so I will probably go for it.

Sep 04 '20 21:09 patrickpr

Hi @patrickpr, do you have any updates regarding this feature-bug? We also ran into this issue because one of our providers sends all traps with the same OID.

I first tried to accomplish this by modifying the rule, e.g. like: ( $3$ = 2 ) & ($3.oid$ = ".1.3.6.1.2.1.1.3.0" ) to determine that the specific OID is the one I need.

Aug 08 '23 12:08 Tqnsls