
Healthcheck support for accessing gremlin in Kubernetes

Status: Open. mukesh-panigrahi-searce opened this issue 4 years ago • 4 comments

Describe the feature: It would be better to have a proper health-check path (one that returns a 200 response) that can be passed as the readiness probe to Kubernetes deployments. Managed Kubernetes providers like GKE then create their backend health-check probes from it.

Describe a specific use case for the feature: We want to expose the JanusGraph service securely through an internet-facing HTTPS endpoint. With the current options, we can only expose the service through an external TCP load balancer, and GKE doesn't support putting an SSL proxy load balancer in front of it. Ingress looks like a feasible solution, but its health-check creation fails because the default /healthz path doesn't return a 200 response. I can see a health-check file in the tmp directory that runs a Python file to get the response; I'm not sure whether we can use that to fix this.

Ref link1: https://github.com/helm/charts/blob/master/stable/janusgraph/values.yaml
Ref link2: https://github.com/FairwindsOps/helm-charts/blob/master/stable/janusgraph/templates/deployment.yaml#L37

Python file content:

# Python 2 (urllib2): query the Gremlin HTTP endpoint and check that the graph is open
import json, urllib2

response = json.loads(urllib2.urlopen('http://localhost:8182?gremlin=graph.open').read())["result"]["data"][0]
print(response)
assert response == True

Running it prints True, so the assertion passes.
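A minimal sketch of the requested 200-returning path, assuming a Python 3 sidecar container in the same pod (so `localhost:8182` reaches JanusGraph Server's Gremlin HTTP endpoint, which the script above already relies on). It sends the same `graph.open` query and answers `GET /healthz` with 200 when the graph is open and 503 otherwise. The names `GREMLIN_URL`, `parse_is_open`, `gremlin_is_open`, `HealthzHandler`, `serve` and port 8080 are hypothetical, not part of JanusGraph.

```python
import http.client
import json
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: a sidecar in the same pod shares the network namespace, so this
# reaches JanusGraph Server's Gremlin HTTP endpoint.
GREMLIN_URL = "http://localhost:8182/"


def parse_is_open(body: bytes) -> bool:
    """Return True only if the first result in a Gremlin response is true."""
    data = json.loads(body)["result"]["data"]
    # GraphSON 3 wraps lists as {"@type": "g:List", "@value": [...]};
    # older serializers return a plain JSON list.
    if isinstance(data, dict):
        data = data.get("@value", [])
    return isinstance(data, list) and len(data) > 0 and data[0] is True


def gremlin_is_open(base_url: str = GREMLIN_URL, timeout: float = 2.0) -> bool:
    """Send the same graph.open query as the script above; never raises."""
    url = base_url + "?" + urllib.parse.urlencode({"gremlin": "graph.open"})
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_is_open(resp.read())
    except (OSError, ValueError, KeyError, TypeError, http.client.HTTPException):
        # Refused connection, timeout, non-2xx status or unexpected JSON.
        return False


class HealthzHandler(BaseHTTPRequestHandler):
    """Answer GET /healthz with 200 if the graph is open, else 503."""

    def do_GET(self) -> None:
        if self.path != "/healthz":
            self.send_error(404)
            return
        healthy = gremlin_is_open()
        body = b"ok\n" if healthy else b"unavailable\n"
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Block forever serving /healthz on all interfaces."""
    HTTPServer(("", port), HealthzHandler).serve_forever()
```

Calling `serve(8080)` in the sidecar and pointing an `httpGet` readinessProbe at `/healthz` on port 8080 is one way to wire it up. Whether GKE's Ingress derives its backend health check from a sidecar's probe depends on GKE's own rules, so that part would need checking against its documentation.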

mukesh-panigrahi-searce, Sep 17 '21 20:09

In my use case, we simply added the following readinessProbe to the k8s deployment:

readinessProbe:
    exec:
        command: [gremlin.sh, -e, scripts/remote-connect.groovy]

lionelfleury, Jan 05 '22 11:01

The Gremlin Console seems to draw a lot of resources when it starts up. I previously ran this check every 10 seconds, and each run used 100% of the CPU (a 16-core, high-end processor) for 3-4 seconds. It's just not suited to be used as a liveness probe.

For stability and proper integration with systems like Kubernetes, JanusGraph needs endpoints for startup, readiness and liveness, preferably plain HTTP GET endpoints that don't require extra tools or programs to perform the checks.

JanusGraph needs some time to start up because in most configurations it relies on storage and indexing backends, so a startup endpoint is required to tell Kubernetes when it can start checking the instance for readiness and liveness.

A liveness endpoint is required to tell Kubernetes whether the instance is healthy or has encountered a runtime error. If there's a runtime error, Kubernetes can restart the container, and this is very important for monitoring the cluster for issues.

Lastly, a proper readiness endpoint is required for Kubernetes to know whether the instance is ready for traffic. Again, this is important for monitoring, debugging, circuit-breaking situations, etc.
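To make the three probe types concrete, here is a sketch that builds the corresponding Kubernetes probe fields as a Python dict (ready to dump as JSON or YAML). The paths `/startup`, `/live` and `/ready` and the function `build_probes` are hypothetical, since JanusGraph Server does not offer such endpoints; the field names (`startupProbe`, `livenessProbe`, `readinessProbe`, `httpGet`, `periodSeconds`, `failureThreshold`) are standard Kubernetes ones. Kubernetes disables the liveness and readiness probes until the startup probe succeeds, and allows the startup probe up to `failureThreshold × periodSeconds` seconds, which is where a slow storage or index backend gets its grace period.

```python
import json

# Hypothetical endpoint paths: they stand in for the startup, liveness and
# readiness endpoints requested above and do not exist in JanusGraph today.
PATHS = {"startupProbe": "/startup", "livenessProbe": "/live", "readinessProbe": "/ready"}


def build_probes(port: int, period_s: int = 10, startup_budget_s: int = 300) -> dict:
    """Build Kubernetes probe specs for one container as a plain dict."""
    # The startup probe may fail up to failureThreshold times, periodSeconds
    # apart, so derive the threshold from the desired startup budget.
    startup_threshold = -(-startup_budget_s // period_s)  # ceiling division
    probes = {}
    for name, path in PATHS.items():
        probes[name] = {
            "httpGet": {"path": path, "port": port},
            "periodSeconds": period_s,
            # 3 is the Kubernetes default failureThreshold.
            "failureThreshold": startup_threshold if name == "startupProbe" else 3,
        }
    return probes


print(json.dumps(build_probes(8182), indent=2))
```

With the defaults, a 300-second budget at a 10-second period gives the startup probe a `failureThreshold` of 30, while liveness and readiness keep the default of 3.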

Givemeurcookies, Jun 15 '23 12:06

I agree that a more lightweight approach is preferable for such probes. The HTTP endpoint of JanusGraph Server could be a simple first option: https://docs.janusgraph.org/operations/server/#janusgraph-server-as-a-http-endpoint

Another option is to write a small application that only performs one traversal and reports success or failure. This could be a JAR, as the Docker image already has a Java runtime, or maybe a Go binary using Gremlin-Go. We could probably even include such an application in the official Docker images, but it would need to be configurable so users can specify the traversal and credentials.
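A sketch of such a "one traversal" probe, written in Python for brevity instead of a JAR or Gremlin-Go, and talking to the HTTP endpoint rather than through a Gremlin driver. It POSTs a configurable traversal as `{"gremlin": ...}` and maps success to exit code 0. The environment variables `PROBE_URL`, `PROBE_TRAVERSAL`, `PROBE_TIMEOUT`, `PROBE_USER` and `PROBE_PASSWORD` and the functions `build_request` and `run_probe` are hypothetical; the default traversal assumes the server script binds `g`, and the credentials assume HTTP basic authentication is configured on the server.

```python
import base64
import http.client
import json
import urllib.request
from typing import Mapping

# Hypothetical variable names: nothing in JanusGraph or its images defines these.
DEFAULTS = {
    "PROBE_URL": "http://localhost:8182/",
    # Reads from the graph, so it also exercises the storage backend.
    "PROBE_TRAVERSAL": "g.V().limit(1).count()",
    "PROBE_TIMEOUT": "5",
}


def build_request(env: Mapping[str, str]) -> urllib.request.Request:
    """POST the configured traversal as {"gremlin": ...} to the HTTP endpoint."""
    cfg = {**DEFAULTS, **env}
    body = json.dumps({"gremlin": cfg["PROBE_TRAVERSAL"]}).encode()
    req = urllib.request.Request(cfg["PROBE_URL"], data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    # Assumes HTTP basic auth on the server when credentials are configured.
    if cfg.get("PROBE_USER"):
        token = f'{cfg["PROBE_USER"]}:{cfg.get("PROBE_PASSWORD", "")}'
        req.add_header("Authorization", "Basic " + base64.b64encode(token.encode()).decode())
    return req


def run_probe(env: Mapping[str, str]) -> int:
    """Return a process exit code: 0 if the traversal succeeded, 1 otherwise."""
    try:
        timeout = float({**DEFAULTS, **env}["PROBE_TIMEOUT"])
        with urllib.request.urlopen(build_request(env), timeout=timeout) as resp:
            resp.read()  # drain the body; a non-2xx status raises HTTPError
            return 0 if resp.status == 200 else 1
    except (OSError, ValueError, http.client.HTTPException):
        # Bad config, refused connection, timeout or error status: unhealthy.
        return 1
```

As an exec probe, the entry point would be `sys.exit(run_probe(os.environ))`, assuming a Python 3 interpreter exists in the image (the official JanusGraph image may not ship one). Starting Python is likely much lighter than starting the Gremlin Console's JVM, though that would need measuring.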

FlorianHockmann, Jun 16 '23 07:06

Kubernetes supports gRPC health-check probes. We could use our gRPC server.

farodin91, Jun 16 '23 07:06