Create a fully working containerd / K8s implementation of Orborus + Worker
The goal of this task is to make the entire Orchestrator work with Kubernetes & containerd directly. More details to come.
Basic things to learn (do these in a new standalone .go program; a minimal client-go sketch follows this list):
- How to create a container in K8s. K8s Jobs? Are there startup errors? Can you control resource usage?
- How can one K8s container control another K8s container?
- How can you clean up old containers?
- How can you download images from an arbitrary registry?
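For the first bullet, here is a minimal client-go sketch, assuming the program runs in-cluster under a ServiceAccount allowed to create Jobs; the `shuffle` namespace, job name, and image are placeholders:

```go
package main

import (
	"context"
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster auth: requires the pod's ServiceAccount to have RBAC
	// rights to create Jobs in the target namespace.
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// The TTL answers the cleanup question: K8s garbage-collects the Job
	// (and its pods) five minutes after it finishes.
	ttl := int32(300)
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "shuffle-test-job"},
		Spec: batchv1.JobSpec{
			TTLSecondsAfterFinished: &ttl,
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "worker",
						Image: "alpine:latest", // pulled from whichever registry is configured
						Resources: corev1.ResourceRequirements{
							// Resource usage is controlled per container.
							Limits: corev1.ResourceList{
								corev1.ResourceCPU:    resource.MustParse("500m"),
								corev1.ResourceMemory: resource.MustParse("256Mi"),
							},
						},
					}},
				},
			},
		},
	}

	created, err := clientset.BatchV1().Jobs("shuffle").Create(context.TODO(), job, metav1.CreateOptions{})
	if err != nil {
		// Startup errors (bad image, RBAC, quota) surface here or as Job events.
		panic(err)
	}
	fmt.Printf("created job %s\n", created.Name)
}
```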
After the basics:
- Implement it in orborus.go. Make sure Docker still works at the same time; use env variables to control which runtime is used.
- Implement it in the Worker.
Gave Yash an introduction to the Docker setup so that he can try to build a PoC for Orborus in Kubernetes.
Focus was on the "deployServiceWorkers" function in Orborus: https://github.com/Shuffle/Shuffle/blob/be0aed82e3d89cd702663ca1e6d52ae0a830b9bc/functions/onprem/orborus/orborus.go#L202
Hi! Any news regarding this issue? We are interested in running the whole Shuffle stack in a Kubernetes cluster, but for that I understand Orborus needs containerd support. Is it already supported?
Thanks!
It's not supported entirely yet, but we have one person working on it. Orborus is the most complex thing in our stack, and I'm trying to help other people understand the complexities too. We aim to have it done by the end of Q2 this year, but I hope to finish sooner so more people can make use of it early.
As a suggestion, couldn't this be done with K8s Jobs via the API (with a K8s service account) using client-go, rather than doing it directly with containerd? Perhaps I misunderstand, but I think it should be doable and a lot easier. Jared Jennings did something similar to run Cortex Neurons as Kubernetes Jobs, but that was done in Scala, not Go. Have a look at https://github.com/TheHive-Project/Cortex/compare/master...jaredjennings:Cortex:k8s-job-runner and https://j.agrue.info/tag/cortex.html
Unfortunately I don't know Go and likely don't have time to learn it, otherwise I'd do it myself.
Had no idea about this, thanks! It will probably be the K8s API like you're talking about for sure, but we're not sure exactly about the container management itself just yet.
Thanks for sharing the Jobs API - I'm unsure of the difference between this and our current architecture, but we'll for sure check it out
If Orborus works the same way as Cortex Neurons do, then it should be sufficient for the task. Jobs clean up after themselves. If I understand correctly, all Shuffle does is launch a Docker container inside of another Docker container on a Docker node in the Docker Swarm. A Kubernetes Job will launch a one-time container (on containerd by default, IIRC) on a Kubernetes node in your Kubernetes cluster and will then stop the container and clean up once the container exits.
You're right that this is A way to do it, and it was our original design. We are, however, moving away from this architecture (and already have with Docker Swarm), and will for K8s, as container cold starts are extremely CPU intensive, which doesn't work well when you may launch hundreds per second.
We instead run the apps (roughly analogous to Neurons) as long-running HTTP servers, to make everything both faster and WAY less compute intensive. The old way was just not scalable enough.
The idea being that apps aren't "rerun" on each execution? It sounds like a better fit then would be to create a Kubernetes Deployment via the API. Let Kubernetes handle node placement, restarts on container crashes, etc.
All I'm saying is I would highly advise against doing it via containerd directly. K8s already does all the hard work for you in terms of scheduling pods on nodes.
PS K8s Operators are available should you need to do more advanced management
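For illustration, a hedged client-go sketch of this Deployment-per-app idea; `appName`, the image, and the `shuffle` namespace are placeholders, not Shuffle's actual naming:

```go
package main

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createAppDeployment runs a long-lived app as a Deployment, letting
// K8s handle node placement and restarts on container crashes.
func createAppDeployment(clientset *kubernetes.Clientset, appName, image string) error {
	replicas := int32(1)
	deployment := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: appName},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": appName},
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: map[string]string{"app": appName},
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  appName,
						Image: image,
						// The app exposes an HTTP server instead of being re-run per execution.
						Ports: []corev1.ContainerPort{{ContainerPort: 80}},
					}},
				},
			},
		},
	}
	_, err := clientset.AppsV1().Deployments("shuffle").Create(context.TODO(), deployment, metav1.CreateOptions{})
	return err
}
```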
Using the K8s API is exactly the plan, just as we've done with Docker so far. I believe K8s Operators are overrated for something like this; instead we need K8s programmers. The problem isn't really apparent at small scales (tens of apps), but it is a huge orchestration management problem that has to be handled directly through the K8s API.
How would you recommend actually interacting with the K8s API? Through a socket mount like with Docker, or is there a better way Pods inside the cluster can control themselves and their scale?
That's a good question. I assume horizontal scaling will work in this case and is likely the way to go. However, there is also the Vertical Pod Autoscaler, btw.
I would set up Orborus to run as a K8s Deployment (it could be a StatefulSet if needed, but that's probably not necessary for Shuffle). The Orborus container would run Go code that uses client-go (via in-cluster authentication) to connect to the K8s cluster, using a K8s ServiceAccount to provide the necessary rights, and create a Deployment for the Shuffle app when the app is "enabled". A Service in front of that Deployment then exposes the app (e.g. via a load balancer). Then make use of a HorizontalPodAutoscaler to scale up with increased demand. When the app is "disabled" in Shuffle, remove the Deployment.
See here for example of using client-go for an in cluster authentication to K8s that will need a k8s ServiceAccount: https://github.com/kubernetes/client-go/blob/master/examples/in-cluster-client-configuration/main.go#L41-L50
See here for HorizontalPodAutoscaler: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
It's probably a good idea to use a K8s namespace dedicated to Shuffle, just to keep things a bit more organised and not clutter the default namespace.
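A sketch of the HorizontalPodAutoscaler part via client-go, assuming the same hypothetical `shuffle` namespace and an existing app Deployment; the replica counts and CPU threshold are arbitrary:

```go
package main

import (
	"context"

	autoscalingv1 "k8s.io/api/autoscaling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// attachAutoscaler scales an app Deployment out with increased demand
// and back in when load drops.
func attachAutoscaler(clientset *kubernetes.Clientset, appName string) error {
	minReplicas := int32(1)
	targetCPU := int32(80) // scale out past 80% average CPU utilisation
	hpa := &autoscalingv1.HorizontalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{Name: appName},
		Spec: autoscalingv1.HorizontalPodAutoscalerSpec{
			ScaleTargetRef: autoscalingv1.CrossVersionObjectReference{
				APIVersion: "apps/v1",
				Kind:       "Deployment",
				Name:       appName,
			},
			MinReplicas:                    &minReplicas,
			MaxReplicas:                    10,
			TargetCPUUtilizationPercentage: &targetCPU,
		},
	}
	_, err := clientset.AutoscalingV1().HorizontalPodAutoscalers("shuffle").Create(context.TODO(), hpa, metav1.CreateOptions{})
	return err
}
```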
This then sounds exactly like how we've built the Docker system already, and just requires a K8s migration for the API connection w/auth.
Horizontal + vertical scaling are both applicable and already implemented, so we're far along the architecture route (with Docker already implemented). As long as the container image management itself works well, the next steps shouldn't be too hard.
Yep, that's what I figured. I was a little surprised to hear that this bit was taking you so long considering K8s does most of the heavy lifting for you. K8s is much nicer than Docker IMO :)
This should help on vertical front: https://medium.com/infrastructure-adventures/vertical-pod-autoscaler-deep-dive-limitations-and-real-world-examples-9195f8422724
I'd suggest for testing you use something like AWS EKS and avoid testing on bare metal if you can, because such an environment is more tricky to set up (e.g. automatic load balancer configuration won't work without setting up MetalLB). I'm happy to help out with testing, however.
Another tip for Kubernetes is that you'll want the Deployment to have an associated Service (similar to exposing a port in Docker, but necessary for pod-to-pod communication). This allows pods (containers are created inside a pod) to talk to each other (the Orborus pod to talk to the pod running the Shuffle app). A Deployment creates pod(s), and the container(s) are then created within those pods. Kubernetes will schedule the pods on an available node automatically. Rarely will you have multiple containers inside one pod; normally it's one container per pod. Init containers can be super useful though, such as in this example: https://stackoverflow.com/questions/51079849/kubernetes-wait-for-other-pod-to-be-ready
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#what-can-init-containers-be-used-for
The Service also creates a load balancer, assuming it isn't configured as NodePort or behind an Ingress.
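A possible client-go sketch of that Service, assuming the app Deployment labels its pods `app: <name>`; the port and `shuffle` namespace are placeholders:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
)

// exposeApp gives the app pods a stable in-cluster address, e.g.
// http://<appName>.shuffle.svc.cluster.local, which the Orborus pod can call.
func exposeApp(clientset *kubernetes.Clientset, appName string) error {
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: appName},
		Spec: corev1.ServiceSpec{
			Selector: map[string]string{"app": appName},
			Ports: []corev1.ServicePort{{
				Port:       80,
				TargetPort: intstr.FromInt(80),
			}},
			// Type defaults to ClusterIP; set LoadBalancer or NodePort
			// only if the app must be reachable from outside the cluster.
		},
	}
	_, err := clientset.CoreV1().Services("shuffle").Create(context.TODO(), svc, metav1.CreateOptions{})
	return err
}
```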
Hi! Right now we are deploying the whole of Shuffle in Kubernetes; we just had to migrate the docker-compose to K8s manifests. Concretely, all services run as Deployments without persistence, except the backend, which is a StatefulSet to keep shuffle-files. Right now we are setting an external DOCKER_HOST to act as the Docker Engine. If Workers are intended to be run as batch jobs without persistence, it makes sense to use the K8s Jobs API to deploy them. As @beejaygee suggests, it seems that just migrating the current Docker client to the K8s client-go would do the trick, deploying Workers as Jobs, which in fact are just containers, so part of the job is already done. I only have one question: how do Workers report the results to the backend? Is it done via the backend REST API? Thanks!
I can indeed confirm that it's not taking long because it's hard, but rather because we have higher priorities that need sorting first :)
You're right that this is pretty much how it's set up (with some small complexities on top). It does indeed use the backend API to send data back when a workflow is finished.
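For illustration only, a sketch of what that report could look like from the Worker's side; the endpoint path, payload shape, and auth header are hypothetical placeholders, not Shuffle's actual API contract:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// reportResult POSTs a finished workflow's result to the backend REST API.
// backendURL, authToken, and the route below are assumptions for this sketch.
func reportResult(backendURL, authToken, executionID string, result map[string]interface{}) error {
	body, err := json.Marshal(map[string]interface{}{
		"execution_id": executionID,
		"result":       result,
	})
	if err != nil {
		return err
	}

	req, err := http.NewRequest(http.MethodPost, backendURL+"/api/v1/executions/results", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+authToken)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode >= 300 {
		return fmt.Errorf("backend returned status %d", resp.StatusCode)
	}
	return nil
}
```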
No worries. When you're ready, I'm happy to help with testing or provide further advice with regards to Kubernetes.
Now testing out Kaniko to manage container builds for K8s without a Docker context.
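A rough client-go sketch of what such a Kaniko build pod could look like, assuming a `shuffle` namespace; the context URL and destination are placeholders, and pushing to a registry normally also requires credentials mounted as a Secret:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// buildWithKaniko launches a one-shot Kaniko executor pod that builds an
// image from a remote build context and pushes it, with no Docker daemon.
func buildWithKaniko(clientset *kubernetes.Clientset, name, contextURL, destination string) error {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Containers: []corev1.Container{{
				Name:  "kaniko",
				Image: "gcr.io/kaniko-project/executor:latest",
				Args: []string{
					"--dockerfile=Dockerfile",
					"--context=" + contextURL, // e.g. a git:// URL or a tarball in object storage
					"--destination=" + destination,
				},
			}},
		},
	}
	_, err := clientset.CoreV1().Pods("shuffle").Create(context.TODO(), pod, metav1.CreateOptions{})
	return err
}
```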
Good afternoon, is there any news on development?
Yep! Testing has been ongoing for the last week or so. Please try it out and give us some feedback!
@beejaygee as well :)
https://github.com/Shuffle/Shuffle/tree/1.3.0/functions/kubernetes
@frikky I noticed that the all-in-one.yaml manifest is missing a service account. The service account should be referenced in the deployment that requires it via spec.serviceAccountName, otherwise the token doesn't get mounted to the pod. Were you able to successfully run a workflow in K8s? I created a Docker registry, and the Worker is throwing errors stating that docker.socket isn't mounted.
Does anyone have an update on this? We are now running into issues spawning pods in our K8s environment, but the instructions are not quite clear (yet)... 😄