
MongoDB Health System Notification

Open kimoswalt opened this issue 1 year ago • 1 comments

What?

  • Can we get system notifications or errors in the UI when MongoDB (single or multi-node) is down or in the RECOVERING state?
  • Similar to the OpenSearch cluster health notifications.
  • Or possibly something on the Node page.
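
For illustration only, here is a minimal Python sketch of the kind of check such a notification could be based on. It is not how Graylog does or would implement this. It assumes a replica set and reads the `members` array returned by MongoDB's `replSetGetStatus` admin command; the function names, the `HEALTHY_STATES` set, and the connection URI are hypothetical. A standalone (non-replica-set) MongoDB does not support `replSetGetStatus`, so a single-node setup would need a different check, such as a simple ping.

```python
from __future__ import annotations

from typing import Any

# Assumption: these are the member states we treat as healthy.
# Anything else (RECOVERING, STARTUP2, DOWN, ...) gets reported.
HEALTHY_STATES = {"PRIMARY", "SECONDARY", "ARBITER"}


def unhealthy_members(status: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (name, stateStr) for every member that looks unhealthy.

    `status` is a document shaped like the output of replSetGetStatus.
    """
    problems = []
    for member in status.get("members", []):
        state = member.get("stateStr", "UNKNOWN")
        # health is 1 when the member is reachable and 0 when it is not;
        # if the field is missing we fall back to the state check alone.
        if member.get("health", 1) != 1 or state not in HEALTHY_STATES:
            problems.append((member.get("name", "?"), state))
    return problems


def fetch_status(uri: str) -> dict[str, Any]:
    """Fetch replSetGetStatus from a live deployment (requires pymongo)."""
    from pymongo import MongoClient  # imported lazily so the check above has no dependencies

    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    try:
        return client.admin.command("replSetGetStatus")
    finally:
        client.close()
```

Against a live replica set this could be called as `unhealthy_members(fetch_status("mongodb://localhost:27017"))`, where the URI is a placeholder. A non-empty result is the condition that would be worth surfacing as a notification.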

Why?

  • We recently had two cases where a customer's archiving was failing because one of their MongoDB nodes was in RECOVERING mode.

  • They had no idea that any of their MongoDB nodes were having issues, and their archives had been failing for several months.

  • If they had errors or notifications in the Graylog UI telling them their MongoDB nodes were unhealthy, or down, we may have been able to avoid the archiving issues.

Your Environment

  • Graylog Version: 6.0.5
  • MongoDB Version: 5.0.21
  • Operating System: Ubuntu

The environment I have the most detail on has 1 load balancer, 3 Graylog nodes, 3 OpenSearch nodes. The three Graylog nodes are also running MongoDB, and replication is configured.

kimoswalt avatar Sep 10 '24 17:09 kimoswalt

Just as an FYI for anyone reading: to catch archival issues, we (not one of the customers mentioned) have an event definition in place for message:"ARCHIVING_SUMMARY: Indices could not be archived yet" on the All system events stream.

coffee-squirrel avatar Sep 10 '24 18:09 coffee-squirrel