Collect node status information outside of the Raft sub-system
Cover letter
This PR introduces a new node status mechanism that operates outside of Raft. The node_status_provider subsystem runs on shard 0 of every node. Briefly, it operates like this (a rough sketch follows the list):
- Maintain a list of peers via a callback from the members table.
- Periodically send a node_status RPC to all known peers and keep track of when the last reply was received from each node. Note that this sub-system creates and uses its own TCP connections and does not reuse the ones created by the RPC server. This is done in order to avoid contention.
- Expose the metadata in the node_status request along with a timestamp of when the peer was last seen.
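The sketch below is a hypothetical, heavily simplified rendering of that bookkeeping (type and member names are illustrative, not the actual code in this PR): a per-peer record plus the time of the last successful node_status reply, refreshed whenever a reply arrives.

```cpp
#include <chrono>
#include <cstdint>
#include <optional>
#include <unordered_map>

using node_id = int32_t;
using status_clock = std::chrono::steady_clock;

// Per-peer record kept by the status subsystem.
struct node_status {
    node_id id;
    status_clock::time_point last_seen; // when the last node_status reply arrived
};

// Shard-local view of peer status, refreshed by the periodic RPC round.
class node_status_table {
public:
    // Called when a node_status reply from `peer` arrives.
    void on_reply(node_id peer) {
        _status[peer] = node_status{peer, status_clock::now()};
    }

    // Lookup used by consumers (e.g. the RPC and consensus layers).
    std::optional<node_status> get(node_id peer) const {
        auto it = _status.find(peer);
        return it == _status.end() ? std::nullopt
                                   : std::optional<node_status>(it->second);
    }

private:
    std::unordered_map<node_id, node_status> _status;
};
```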
The users of this sub-system should be other low-level abstractions: the RPC and consensus layers. Higher level sub-systems should continue to use the health monitor.
This PR does not include integrations of the new subsystem. Those will come in follow-up PRs. Currently the following integrations are planned:
- Loosening of Raft timeouts if the leader has recently responded to a node_status RPC. This should result in less election activity when the system runs into heartbeat timeouts due to resource contention caused by competing Raft groups (a rough sketch follows this list).
- Expose disk queue metadata for the coordinated recovery subsystem.
- Integrate with the health monitor in order to make the following liveness definition available to higher-level subsystems: "A node is alive if it responds to node_status requests in a timely manner." This will co-exist with the existing Raft definition of liveness.
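To make the first planned integration more concrete, here is a rough, hypothetical sketch of the check a Raft follower could perform before starting an election on a missed heartbeat; the function name and the grace period are illustrative and not part of this PR.

```cpp
#include <chrono>

// If the current leader has answered a node_status RPC recently, missed Raft
// heartbeats are more likely caused by resource contention than by a dead
// leader, so the follower can loosen its election timeout instead of voting.
bool leader_recently_seen(
  std::chrono::steady_clock::time_point leader_last_seen,
  std::chrono::milliseconds grace = std::chrono::milliseconds(1500)) {
    auto since_reply = std::chrono::steady_clock::now() - leader_last_seen;
    return since_reply < grace;
}
```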
To keep this PR relatively small, I've omitted a few things which I plan to add in separate PRs. They're listed below; let me know if you think any of them have to be part of this PR.
- Currently, no RPCs are sent until enough of the controller log has been replayed for the feature manager to become aware that the feature is active. We can address this by persisting the feature manager state to the kv-store and initialising from it.
- The list of peers is not initialised with the latest Raft configuration from the kv-store.
- Metrics for the connections created by node_status_provider are currently disabled. The labels used there are the target IP and port, and they currently clash because the node_status_provider connections are made to the same port as the normal RPC connections.
Backport Required
- [X] not a bug fix
- [ ] issue does not exist in previous branches
- [ ] papercut/not impactful enough to backport
- [ ] v22.2.x
- [ ] v22.1.x
- [ ] v21.11.x
Changes in force-push:
- Guard node-status provider behind a new feature
- Address a few comments about implementation details
Changes in last two force-pushes (last and one before last):
- Reorganised commits so that each one builds and they are in a sensible order
- Addressed outstanding comments
Changes in force-push:
- Made node_status_provider a sharded service running on a single shard (see this comment for context)
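For context, a minimal Seastar-style sketch of the single-shard pattern, assuming `sharded<>::start_single()` is used to construct the instance on shard 0 only (service name and bodies are placeholders, not the code in this PR):

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/sharded.hh>

struct node_status_backend {
    seastar::future<> start() { return seastar::make_ready_future<>(); }
    seastar::future<> stop() { return seastar::make_ready_future<>(); }
};

seastar::sharded<node_status_backend> backend;

seastar::future<> start_node_status() {
    // start_single() constructs the service on shard 0 only, while keeping the
    // usual sharded<> start/stop/invoke_on machinery for callers.
    return backend.start_single().then([] {
        return backend.invoke_on(0, [](node_status_backend& b) {
            return b.start();
        });
    });
}
```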
Changes in force-push:
- Added a new label to client connections that ties them to a connection cache. This does away with double registration problems and allows us to enable metrics for the node_status_provider connections.
While working on the Raft integration I realised that it would be useful to have the node status data be shard-local. The metadata is updated on all shards once all responses are collected. I've also split node_status_provider into node_status_backend and node_status_table in the process. See the force-push for details.
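A rough sketch of what that fan-out could look like, assuming the backend gathers a map of results on its shard and then copies it into every shard's node_status_table via invoke_on_all (names are illustrative, not the real implementation):

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/sharded.hh>

#include <chrono>
#include <cstdint>
#include <unordered_map>

using node_id = int32_t;
using last_seen_map
  = std::unordered_map<node_id, std::chrono::steady_clock::time_point>;

// Shard-local copy of the status data, so readers never leave their shard.
struct node_status_table {
    last_seen_map last_seen;
    seastar::future<> stop() { return seastar::make_ready_future<>(); }
};

// Called once per round, after all node_status replies have been collected.
seastar::future<> publish_results(
  seastar::sharded<node_status_table>& table, last_seen_map results) {
    return table.invoke_on_all([results](node_status_table& local) {
        for (const auto& [node, ts] : results) {
            local.last_seen[node] = ts;
        }
    });
}
```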
Changes in force-push:
- make node_status_backend::start re-entrant
- fix incorrect template type
- pass sharded arg via sharded_parameter
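On the last point, a small illustration of the sharded_parameter idiom with placeholder service names (the real types in the PR differ): the wrapped lambda is evaluated on each shard as its instance is constructed, so every copy receives that shard's dependency rather than the one from the shard that called start().

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/sharded.hh>

#include <functional>

struct dependency {
    seastar::future<> stop() { return seastar::make_ready_future<>(); }
};

struct consumer {
    explicit consumer(dependency& d) : _dep(d) {}
    seastar::future<> stop() { return seastar::make_ready_future<>(); }
    dependency& _dep;
};

seastar::future<> start_consumer(
  seastar::sharded<consumer>& svc, seastar::sharded<dependency>& dep) {
    // sharded_parameter defers evaluation to the target shard, so each
    // consumer instance is handed a reference to its local dependency.
    return svc.start(
      seastar::sharded_parameter([&dep] { return std::ref(dep.local()); }));
}
```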
Changes in force-push:
- distribute connections created by node_status_backend across multiple shards
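A sketch of the idea, with illustrative names only: hash each peer to a shard and issue the node_status RPC from that shard, so a single shard does not own every TCP connection.

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/smp.hh>

#include <cstdint>

using node_id = int32_t;

// Deterministically assign each peer's connection to a shard.
seastar::shard_id connection_shard(node_id peer) {
    return static_cast<seastar::shard_id>(peer) % seastar::smp::count;
}

seastar::future<> send_node_status(node_id peer) {
    return seastar::smp::submit_to(connection_shard(peer), [peer] {
        // On the owning shard: look up (or lazily create) the cached client
        // for `peer` and issue the node_status RPC from here.
        (void)peer;
        return seastar::make_ready_future<>();
    });
}
```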
Changes in force-push:
- rebased on tip of dev
Changes in force-push:
- Removed commit introducing with_node_client_locally
/ci-repeat 1 dt-repeat=10 tests/rptest/tests/node_status_test.py