
Add support for zones

Open Copis opened this issue 5 years ago • 10 comments

Is your feature request related to a problem? Please describe. We have a master zone and some satellite zones behind a VPN or firewall. In that case the master cannot receive traps.

Describe the solution you'd like It would be great to be able to receive these SNMP traps on a satellite endpoint and send the status to the master.

Describe alternatives you've considered Forward SNMP traps from the satellite to the master.

Copis avatar May 12 '20 23:05 Copis

Hi,

It's a good feature; I will start working on it for the next version.

Are you able to test this? (My lab environment does not include a master/satellite setup.)

patrickpr avatar May 13 '20 15:05 patrickpr

I am about to do a multi-zone build, and will be able to test this in the coming days/weeks.

robdevops avatar May 26 '20 11:05 robdevops

I can test this scenario in my development environment with one master and one satellite, but I think it would be better to test in an HA environment with two masters and two satellites, if possible.

Copis avatar Jun 09 '20 07:06 Copis

Update: I'm currently building the test environment for this.

patrickpr avatar Jun 15 '20 15:06 patrickpr

@Copis: the satellite architecture is a work in progress.

Test environment: two masters in HA and two satellites in HA.

Traps can be received by:

  • master (if there is an HA master, using a VRRP (keepalived) IP)
  • satellite (if there is an HA satellite, using VRRP too).

Satellites receive and process traps using the configuration provided by the masters, and:

  • update the database using a simple API provided by the trapdirector module on the masters.
  • send passive service check results to satellites (or to the master; this isn't decided yet).

For now, there is no zone for trap rules: they are global.

I assume:

  1. Satellites have access to the master (and master HA) on:
     • Icinga API port (5665 by default)
     • Icingaweb2 HTTPS port (443) (satellites will use a specific Icingaweb2 user)
  2. Master and master HA both have access to the trapdirector database.
  3. Latency between master(s) and satellite(s) is low (<500ms)
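The assumptions above could be verified from a satellite with a small preflight check. This is only a sketch, not part of trapdirector: the function names `tcp_connect_ms` and `check_master` are hypothetical, and the TCP connect time is used as a rough stand-in for latency (it approximates one network round trip, not application-level response time).

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None if unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Refused, timed out, unresolvable host, ...
        return None


def check_master(host, ports=(5665, 443), max_ms=500.0):
    """Map each port to True if it accepts a connection in under max_ms."""
    results = {}
    for port in ports:
        elapsed = tcp_connect_ms(host, port)
        results[port] = elapsed is not None and elapsed < max_ms
    return results
```

For example, `check_master("master1.example.org")` (a placeholder host name) returns a dict keyed by port, so a satellite could refuse to start processing traps if either port maps to `False`.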

I'm open to comments and suggestions!

patrickpr avatar Jul 06 '20 20:07 patrickpr

One problem I see is that some scenarios cannot use VRRP, for example Active-Passive or Active-Active data centers (CPD) with no extended VLANs. In that case this design cannot be implemented.

Copis avatar Sep 01 '20 13:09 Copis

Opened a topic here to talk about it: https://community.icinga.com/t/trapdirector-ha-feature/5439

patrickpr avatar Sep 03 '20 08:09 patrickpr

So here are some thoughts about it:

  1. As long as all instances of trapdirector talk to the same DB, it shouldn't matter how many there are.
  2. Traps can be forwarded from any node where they can be received to any snmptrapd on a trapdirector node. This enables chaining them through firewalls to the nodes where they can be processed properly.
  3. When trapdirector processes a trap, it sends the result to the API of a satellite or master. Why not both, in a configurable order? So if you send the result to a satellite and you don't like the return value or it's unreachable, you resend it to the master or another satellite.
  4. In this scenario you'd have to worry about deduplication of traps if you choose to do HA by sending traps to all existing trapdirector instances, which don't know about each other but share the DB. Maybe there's even some cheap way to discard duplicates that is better than a DB lookup over the last 5 seconds' worth of traps to see if one was already processed by fellow trapdirectors.
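To make point 4 concrete, here is a minimal sketch of time-window deduplication. It is not how trapdirector does it: the class `TrapDeduplicator` and its fields are hypothetical, and the state lives in one process, so instances that only share the DB would need a shared store instead. It also assumes forwarded copies of a trap carry identical varbinds.

```python
import hashlib
import time


class TrapDeduplicator:
    """Remember trap fingerprints for a short window and flag repeats."""

    def __init__(self, window_seconds=5.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self._seen = {}  # fingerprint -> time the trap was first seen

    @staticmethod
    def fingerprint(source_ip, trap_oid, varbinds):
        """Hash source, trap OID and (oid, value) pairs, order-independent."""
        digest = hashlib.sha256()
        parts = [source_ip, trap_oid]
        parts += [f"{oid}={value}" for oid, value in sorted(varbinds)]
        for part in parts:
            digest.update(part.encode("utf-8"))
            digest.update(b"\0")  # separator so fields cannot run together
        return digest.hexdigest()

    def is_duplicate(self, source_ip, trap_oid, varbinds):
        now = self.clock()
        # Forget fingerprints older than the window.
        self._seen = {k: t for k, t in self._seen.items()
                      if now - t < self.window}
        key = self.fingerprint(source_ip, trap_oid, varbinds)
        if key in self._seen:
            return True
        self._seen[key] = now
        return False
```

The window is measured from the first copy seen, so a trap that genuinely repeats after the window expires is processed again.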

p4k8 avatar Sep 03 '20 12:09 p4k8

> 1. As long as all instances of trapdirector talk to the same DB, it shouldn't matter how many there are.

Correct, but a DB connection may be impossible from remote sites.

> 2. Traps can be forwarded from any node where they _can_ be received to any snmptrapd on a trapdirector node. This enables chaining them through firewalls to the nodes where they can be processed properly.

Some kind of trap routing? Not very easy to implement!

> 3. When trapdirector processes a trap, it sends the result to the API of a satellite or master. Why not both, in a configurable order? So if you send the result to a satellite and you don't like the return value or it's unreachable, you resend it to the master or another satellite.

Yes: satellite then master, or master only (maybe set this per zone?)
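The "satellite then master" order could look roughly like the sketch below. It assumes results are submitted through the Icinga 2 API action `process-check-result`; the helper names (`build_check_result`, `http_send`, `send_with_failover`) and the endpoint URLs are hypothetical, and API user authentication and TLS trust setup are deliberately left out.

```python
import json
import urllib.request


def build_check_result(host, service, exit_status, output):
    """Build the JSON body for a passive service check result."""
    return {
        "type": "Service",
        "filter": f'host.name=="{host}" && service.name=="{service}"',
        "exit_status": exit_status,
        "plugin_output": output,
    }


def http_send(endpoint, payload, timeout=5.0):
    """POST the payload to one Icinga 2 API endpoint; True on HTTP 200.

    Assumption: a real deployment must add API user credentials and
    trust the Icinga CA; both are omitted in this sketch.
    """
    request = urllib.request.Request(
        f"{endpoint}/v1/actions/process-check-result",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": "application/json",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 200


def send_with_failover(endpoints, payload, send=http_send):
    """Try endpoints in the configured order; return the one that accepted."""
    for endpoint in endpoints:
        try:
            if send(endpoint, payload):
                return endpoint
        except OSError:
            # Unreachable or rejected (urllib errors derive from OSError):
            # fall through to the next endpoint in the list.
            pass
    return None
```

With `endpoints = ["https://sat1:5665", "https://master1:5665"]` the satellite is tried first and the master only if the satellite fails; a "master only" policy is just a one-element list.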

> 4. In this scenario you'd have to worry about deduplication of traps if you choose to do HA by sending traps to all existing trapdirector instances, which don't know about each other but share the DB. Maybe there's even some cheap way to discard duplicates that is better than a DB lookup over the last 5 seconds' worth of traps to see if one was already processed by fellow trapdirectors.

There is a special 'waiting' status in the DB that was implemented for this kind of thing.

patrickpr avatar Sep 03 '20 16:09 patrickpr

> DB connection may be impossible on remote sites

So that's why it might be a sound idea not to run any trapdirector instances on remote sites. Like:

DB <--> trapdirector <-- snmptrapd on trapdirector host <-- firewalls/networks/whatever <-- snmptrapd with forward directive on remote site

"HA" in this part is achieved by forwarding traps from the remote host to several trapdirector destinations simultaneously; each trapdirector would then have a list of API endpoints to send the check result to. That means each trap arrives at least once, and at most as many times as there are snmptrapd forward destinations. That's solved by deduplication, I guess.

> Some kind of trap routing

More like just adding `forward default <address>` to snmptrapd.conf, pointing at the snmptrapd on the proper trapdirector node.
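As an illustration of that idea, the snippet below renders one `forward default` line per trapdirector node, which is how a remote snmptrapd would send every trap to several destinations. The helper name `render_forward_lines` and the addresses are hypothetical; UDP port 162 is the standard trap port and is assumed here.

```python
def render_forward_lines(destinations, port=162):
    """Render snmptrapd.conf lines that forward all traps to each host."""
    lines = [f"forward default {host}:{port}" for host in destinations]
    return "\n".join(lines) + "\n"


# Example: two trapdirector nodes receive a copy of every trap.
print(render_forward_lines(["10.0.0.10", "10.0.0.11"]), end="")
```

The generated lines would be appended to snmptrapd.conf on the remote site. Depending on the Net-SNMP version, the incoming community probably also needs the `net` access type in its `authCommunity` line for forwarding to be allowed.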

> maybe set this by zones

Not sure if it actually has to be zone-aware to work properly as long as the endpoint addresses are listed in the correct order.

p4k8 avatar Sep 04 '20 05:09 p4k8