juno

Health Check for Node

Open stdevMac opened this issue 3 years ago • 6 comments

We should provide an RPC endpoint and an API endpoint that report whether the node is running properly, the database connections are OK, and the feeder gateway is responding.

stdevMac, Jun 19 '22

Is anyone tackling this issue? I don't mind giving it a try, since I need a health check for the k8s deployment.

v4lproik, Jul 04 '22

After looking into this, several questions came to mind including:

  • The MDBX database doesn't have a 'PING'-style request. This means we would have to base the check either on the open function or on a function that queries the database (e.g. get(key)). Any thoughts?
  • After reading the alpha mainnet documentation, there doesn't seem to be an endpoint exposing the feeder gateway status. As with the database, should we base the check on a get(transactionId)-style request?

I'd like to expose an endpoint that returns the results of the different health checks, for example:

{
  "status": "Service Unavailable",
  "errors": {
    "feeder_gateway": "dial tcp <GATEWAY_ADDRESS>: getsockopt: connection refused",
    "database": "error reading database"
  }
}

It could return more or less information depending on whether you want an internal health check or a public one.

v4lproik, Jul 05 '22

We are reworking how the services are set up so that each of them will contain a health check, so it would be better to wait until that is done. Thanks for pointing this out! We assume the database is OK because if any corruption appears at any time, the app will close. The database is the core of the node: if there is a problem with it, the node cannot retrieve or store the information it needs, so it shuts down.

stdevMac, Jul 05 '22

Yes, it definitely sounds like a good idea to add a health check for each service. It's more resilient and makes it easier to pinpoint exactly which service is unavailable when downtime occurs. Oh I see, thanks for the explanation about the database!

v4lproik, Jul 05 '22

Thanks for your contributions! If any other questions come up, let us know! Looking forward to more contributions from you.

stdevMac, Jul 05 '22

Possible duplicate of #324.

tshakalekholoane, Aug 11 '22