weave-gitops Write up k8s Client

Can we get a write-up about the current k8s client we are using in our codebase, the challenges it is bringing, and the alternatives we could use to alleviate our namespace headaches?

Aug 18 '22 13:08 JamWils

There are two clients for k8s in Go:

The one simply called client-go
The one from controller-runtime.

controller-runtime is ultimately a wrapper around client-go, which exposes a slightly reduced surface area, but does so in a way that lets it throw in bells and whistles that are good for writing controllers. This is hand-wavy, but I can't see controller-runtime's documentation explain in any more depth exactly what it wants to do differently - the client is a part of kubebuilder, so I guess its purpose is to serve all of kubebuilder's client needs.

A controller is a long lived process that is constantly looking at approximately the same dataset, whereas "not a controller" is often shorter lived, and can query more or less arbitrary data. As an example, controller-runtime throws in free, transparent caching when the connection is reused.

Since weave gitops runs queries as the user that sent each request, weave gitops' connections are short lived, and query more and more kinds of data. Because the connection is set up per request, we do not get any caching.

The choice to use controller-runtime causes several downsides:

Cross-namespace querying - neither etcd nor k8s restricts running a query across all namespaces, but controller-runtime does not allow that - I'm going to guess it has to do with the caching, but I haven't found anything definitive. We have a lot of complex and frankly quite scary code to cache namespaces and do lots and lots of requests, which wouldn't be necessary if we just used client-go.
The discovery API from client-go is not exported in controller-runtime, so we can't get e.g. the currently running version of k8s without using client-go. So we already have to use both clients.
The fact that they are different clients with a similar API makes it harder to get started and up to speed - the stackoverflow answer to "how do I do $x" always assumes client-go, and you have to work out how to port that bit of information to controller-runtime.
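
To make the "lots and lots of requests" cost in the first point concrete, here is a minimal, stdlib-only sketch of a per-namespace fan-out. It is not the actual weave gitops code: `ListFunc`, `ErrForbidden` and `ListAcrossNamespaces` are hypothetical names, and `ListFunc` stands in for whatever namespaced list call the real client makes.

```go
package main

import (
	"errors"
	"sort"
	"sync"
)

// ErrForbidden is a hypothetical sentinel for "the user may not list here".
var ErrForbidden = errors.New("forbidden")

// ListFunc is a hypothetical stand-in for one namespaced list request.
type ListFunc func(namespace string) ([]string, error)

// ListAcrossNamespaces issues one request per namespace concurrently and
// merges the results. A namespace that fails is recorded in errs instead of
// failing the whole call.
func ListAcrossNamespaces(list ListFunc, namespaces []string) (items []string, errs map[string]error) {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	errs = make(map[string]error)
	for _, ns := range namespaces {
		wg.Add(1)
		go func(ns string) {
			defer wg.Done()
			got, err := list(ns) // one round trip per namespace
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				errs[ns] = err
				return
			}
			items = append(items, got...)
		}(ns)
	}
	wg.Wait()
	sort.Strings(items) // goroutines finish in any order, so sort for stable output
	return items, errs
}
```

With N namespaces this is N requests plus the bookkeeping to merge them, where a single all-namespaces list would be one request.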

On the other hand:

It's not uncommon for us to share code with controller code - for example, flux. I believe APIs won't be compatible with both.
There's plenty of other controller code in the gitops ecosystem that isn't currently used in gitops core, but is used in e.g. gitops enterprise. Much code would need to be implemented twice to work with both.

The APIs are sufficiently different that moving from one to the other isn't simply a matter of swapping the import - I haven't tried, but my hunch from reading documentation is that the difference is somewhere in between "I can do that with sed" and "more or less a complete rewrite".

I would suggest a spike to see how much would need to change if we were to change to client-go.

Aug 19 '22 10:08 ozamosi

I'm going to guess it has to do with the caching,

Correct. It uses the namespace-based informers to watch and cache frequently-used objects.

Aug 19 '22 11:08 chanwit

I would suggest a spike to see how much would need to change if we were to change to client-go.

I would also suggest trying "dynamic-client." Its APIs return unstructured objects and unstructured lists, so it's likely to be somewhat compatible with our current code bases.

Here's an example:

	// List kustomize-controller pods in all namespaces (metav1.NamespaceAll is "").
	pods, err := dynClient.Resource(
		schema.GroupVersionResource{Group: "", Version: "v1", Resource: "pods"},
	).Namespace(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
		LabelSelector: "app=kustomize-controller",
	})

Aug 19 '22 11:08 chanwit

Is that this one? https://pkg.go.dev/k8s.io/client-go/dynamic Or is it another one that's not part of client-go?

Aug 19 '22 11:08 ozamosi

Yes, that one.

Aug 19 '22 11:08 chanwit

This doesn't apply when a user doesn't have cluster-level permissions. K8s doesn't do per-namespace filtering when you make a request for "all namespaces" - it simply checks if you have cluster-level permissions. As a result, we can't assume we can do one request for all namespaces, unless we check your permissions first.
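
A rough sketch of "check your permissions first", using only the standard library. `AccessChecker`, `Lister` and `ListVisible` are hypothetical names, not anything in our codebase. In a real implementation the check would probably be a SelfSubjectAccessReview, where an empty namespace means "all namespaces"; here it is just a function so the control flow is easy to see.

```go
package main

import "sort"

// AccessChecker is a hypothetical stand-in for a permission check such as a
// SelfSubjectAccessReview. An empty namespace means "all namespaces".
type AccessChecker func(verb, resource, namespace string) bool

// Lister is a hypothetical stand-in for a list call. An empty namespace
// means a single cluster-wide request.
type Lister func(resource, namespace string) []string

// ListVisible uses one cluster-wide request when the user is allowed to list
// the resource in all namespaces. Otherwise it only queries the namespaces
// where the user has list permission. It returns the sorted names and the
// number of list requests it made.
func ListVisible(can AccessChecker, list Lister, resource string, namespaces []string) ([]string, int) {
	if can("list", resource, "") {
		out := list(resource, "")
		sort.Strings(out)
		return out, 1
	}
	var out []string
	requests := 0
	for _, ns := range namespaces {
		if !can("list", resource, ns) {
			continue // skip namespaces the user cannot read
		}
		out = append(out, list(resource, ns)...)
		requests++
	}
	sort.Strings(out)
	return out, requests
}
```

The permission checks are extra requests themselves, so this only pays off if their results can be reused across several list calls.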

Oct 13 '22 09:10 ozamosi