Handler/Rule selection for the same trap
Is your feature request related to a problem? Please describe.
Let's say a vendor uses a universal trap to send state info. The trap contains variables for state (OK/NOK), severity (minor/major/...), problem category (CPU/memory/disk/power supply/fan/...) and a specific error message. The first thing you need are rules to differentiate between OK and NOK for this trap. Then you need additional separation for severity and problem categories. Finally, the device sends out some annoying traps across several categories that you want to ignore. How can rules with such exceptions be created without exponential complexity?
Describe the solution you'd like
There are several ways to solve this problem; this list is not intended to be exhaustive:
- use a longest-match method (i.e. the more specific rule within the same trap wins)
- use priorities for rules/handlers
- use an adjustable processing order of rules/handlers
- use a dynamic rule set within one handler by adding a new rule with individual filter-action pairs
I've run into that problem already, and the solution I found best is to have multiple ordered evaluations in one rule. Something like:
- OIDa contains "CPU"
- OIDb > 90 then warning
- OIDb < 5 then ignore
- OIDa contains "not useful" then ignore
The main problem is not the implementation but how to set up the GUI for this. I was planning to use something like the 'assign where' filter in the service apply rules of Director.
I worked on it a little and ended up with two new DB tables. The logic is:
The rule (same as the current rule) selects the trap based on source IP / OID.
Evaluation of the trap content is done by a new type of rule, which has:
- the rule itself based on trap content
- Index of action in other table for match and no match
Action can be :
- Return with code (OK/Warn/crit/nothing/ignore), display, and optional host/hostgroup and service reassignment
- Forward to another rule (the new type).
Is there something I didn't think of?
I'm not sure if you need forwarding to another rule.
The first selection criteria on trap source and trap OID is mandatory (handler). IMHO you need only one handler per host(group)/trap combination.
Within this handler, the trap content should be evaluated against an ordered list of selection rules and corresponding actions. If a rule matches, its action is returned and no further rules from the list are checked. There could be an explicit default action at the end of the list, without a rule.
Rules with unique selection criteria do not need a specific order, but criteria like (a & b & c)->ignore and (a & b)->ok need exactly this order to work: if (a & b)->ok came first, it would also match every trap that satisfies c, and the ignore rule would never fire. Of course you could instead rewrite the second rule as (a & b & !c)->ok, but with multiple "c" or additional "d" criteria this will not scale. An ordered list should also produce less complex rules, which are more readable and run faster.
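To illustrate the ordered, first-match evaluation described above, here is a minimal Python sketch. It is not trapdirector code: the `Rule` class, the `evaluate` function, the trap field names (`category`, `severity`, `message`) and the predicates `a`, `b`, `c` are all hypothetical stand-ins for the criteria in the example.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

Trap = Mapping[str, str]  # trap variables as name -> value (assumed shape)


@dataclass(frozen=True)
class Rule:
    order: int                      # position in the handler's rule list
    matches: Callable[[Trap], bool] # selection criteria on trap content
    action: str                     # action returned when the rule matches


def evaluate(rules: Sequence[Rule], trap: Trap, default_action: str) -> str:
    """Return the action of the first matching rule, else the default."""
    for rule in sorted(rules, key=lambda r: r.order):
        if rule.matches(trap):
            return rule.action      # stop here, later rules are not checked
    return default_action


# Hypothetical criteria a, b and c from the example above.
def a(trap: Trap) -> bool:
    return trap.get("category") == "fan"


def b(trap: Trap) -> bool:
    return trap.get("severity") == "minor"


def c(trap: Trap) -> bool:
    return "not useful" in trap.get("message", "")


rules = [
    Rule(1, lambda t: a(t) and b(t) and c(t), "ignore"),  # exception first
    Rule(2, lambda t: a(t) and b(t), "ok"),               # general rule second
]

noisy = {"category": "fan", "severity": "minor", "message": "not useful info"}
plain = {"category": "fan", "severity": "minor", "message": "fan speed low"}
other = {"category": "cpu", "severity": "major", "message": "load high"}

print(evaluate(rules, noisy, "nothing"))
print(evaluate(rules, plain, "nothing"))
print(evaluate(rules, other, "nothing"))
```

Swapping the two order numbers makes the general rule shadow the exception, which is why the order has to be explicit and adjustable.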
At least you need one additional DB table. In my example it should contain rule-id, handler-id, order, rule, description and action.
You are right, it's much simpler like this (and doesn't need recursion). So I would add two tables: <prefix>_rules_details with
- Handler ref and order num
- Rule
- Ref to action for match / no match
<prefix>_rules_action with
- display, status and optional reassignment to another service/host
- keeping forwarding to rule for now
Why do you want to use two tables for rule processing?
There is a 1:1 relation between rule and action and no need for splitting up, just between handler and rule(s) is 1:n.
I've looked into the current rules table and found some columns which seem not to be used (or are reserved for future use). These are ip4 and ip6 (IMHO only the hostname is unique in icinga2), action_nomatch, display_nok and num_match_nok.
I would suggest the following tables (audit or statistics fields not included):
Handler table:
- handler ID
- trap OID
- hostname
- hostgroupname
- description
- default action
- default revert time
- default servicename
Rule table:
- rule ID
- handler ID
- order number of rule
- rule
- action
- revert time
- servicename
- display
This design should give maximum flexibility: you can still set different services and states for the same trap/host combination. When the trap comes in, the first step is to find the corresponding handler. After that, read all rules for this specific handler and process them in the defined order (given by the order number). If a rule matches the trap content, there is no need to process subsequent rules; just stop and return the action/servicename/display information. If no rule matches, the default action is returned.
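As a rough sketch of how the two suggested tables and the lookup could fit together, the following Python example uses an in-memory SQLite database. Everything here is an assumption for illustration: the table and column names, the `process_trap` and `rule_matches` functions, the OID, and the hostname are hypothetical, and the `key=value` rule format is only a placeholder for the real rule syntax.

```python
import sqlite3

SCHEMA = """
CREATE TABLE handler (
    handler_id          INTEGER PRIMARY KEY,
    trap_oid            TEXT NOT NULL,
    hostname            TEXT,
    hostgroupname       TEXT,
    description         TEXT,
    default_action      TEXT NOT NULL,
    default_revert_time INTEGER,
    default_servicename TEXT
);
CREATE TABLE rule (
    rule_id     INTEGER PRIMARY KEY,
    handler_id  INTEGER NOT NULL REFERENCES handler(handler_id),
    order_num   INTEGER NOT NULL,
    rule        TEXT NOT NULL,
    action      TEXT NOT NULL,
    revert_time INTEGER,
    servicename TEXT,
    display     TEXT,
    UNIQUE (handler_id, order_num)  -- one rule per position in a handler
);
"""


def rule_matches(rule_text, trap_vars):
    """Placeholder matcher: a rule is a single 'key=value' comparison."""
    key, _, expected = rule_text.partition("=")
    return trap_vars.get(key.strip()) == expected.strip()


def process_trap(conn, trap_oid, hostname, trap_vars):
    """Find the handler, then return (action, servicename, display)."""
    handler = conn.execute(
        "SELECT handler_id, default_action, default_servicename "
        "FROM handler WHERE trap_oid = ? AND hostname = ?",
        (trap_oid, hostname),
    ).fetchone()
    if handler is None:
        return None  # no handler for this trap/host combination
    handler_id, default_action, default_service = handler
    rows = conn.execute(
        "SELECT rule, action, servicename, display "
        "FROM rule WHERE handler_id = ? ORDER BY order_num",
        (handler_id,),
    ).fetchall()
    for rule_text, action, servicename, display in rows:
        if rule_matches(rule_text, trap_vars):
            # First match wins; fall back to the handler's default service.
            return action, servicename or default_service, display
    return default_action, default_service, None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO handler VALUES (1, '.1.3.6.1.4.1.99999.0.1', 'switch01', "
    "NULL, 'universal vendor trap', 'critical', NULL, 'trap-generic')"
)
conn.executemany(
    "INSERT INTO rule VALUES (?, 1, ?, ?, ?, NULL, ?, ?)",
    [
        (1, 10, "state=OK", "ok", None, "state back to OK"),
        (2, 20, "category=cpu", "warning", "trap-cpu", "CPU problem"),
    ],
)

print(process_trap(conn, ".1.3.6.1.4.1.99999.0.1", "switch01",
                   {"state": "NOK", "category": "cpu"}))
```

Adding an exception for this trap/host combination then only means inserting one more row into the rule table with a lower order number, not creating a second handler.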
There is no need to add a second handler for this trap/host combination, just add a new rule.
ip4 & ip6 make it easy to select the correct handler when receiving traps, without additional queries (IDO or API) to Icinga. When a trap is received, the only information about the host is the source IP.
Rules and actions actually have a 1:2 relation, as there is a match and a no-match action for one rule. But with the new system, maybe the no-match action is not needed anymore, as you can do it with two rules.
Revert time is not really used and will soon be obsolete, as it is easy to set it in the passive service configuration on Icinga. There is also a big problem: there is no trapdirector service, so I can't be sure the revert action will be sent to Icinga on time.
If I implement this, I will also need to reassign the host based on rules. The use case is when vSphere is sending traps for ESX VMs: all of them come from vSphere, but the alarms must be on the virtual machine host.
There must be a default display with the default action.
Anyway, your design is simpler, so I will probably go for it.
Hi @patrickpr, do you have any updates regarding this feature-bug? We also ran into this issue because one of our providers sends all traps with the same OID.
I first tried to accomplish this by modifying the rule, e.g. like:
( $3$ = 2 ) & ($3.oid$ = ".1.3.6.1.2.1.1.3.0" ) to determine that the specific OID is the one I need.