Allow specifying multiple nodeSelectors on a tenant
Describe the feature
We have a use case where tenants can use two different node pools:
- A shared node pool available for use by all tenants on a cluster
- A private node pool only available for use by a specific tenant or group of tenants
The current nodeSelector specification on a tenant does not allow this use case (apart from some very creative use of labels on nodes). We currently use Kyverno to enforce this, but Capsule seems like an ideal place for this sort of policy.
It would be good if there were a mechanism on a tenant for specifying multiple node selectors, allowing tenants to use any number of separate node pools.
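For illustration, such pools are typically distinguished by a plain node label, along these lines (the label key and values below are made up, not a Capsule convention):

```yaml
# Illustrative node labels only; any key/value scheme would do.
apiVersion: v1
kind: Node
metadata:
  name: shared-node-1
  labels:
    pool: shared            # usable by every tenant
---
apiVersion: v1
kind: Node
metadata:
  name: tenant-a-node-1
  labels:
    pool: tenant-a          # reserved for one tenant (or a group of tenants)
```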
What would the new user story look like?
When a tenant is created, multiple nodeSelectors can be specified (a hypothetical shape for this is sketched below):
- When a tenant user attempts to create a pod with a nodeSelector that matches one of the ones specified on the tenant -> the pod is scheduled on that node pool
- When a tenant user attempts to create a pod with a nodeSelector that does not match any of the ones specified on the tenant -> the request is denied
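A purely hypothetical sketch of what this could look like on a Tenant - note that the `nodeSelectors` list below is not part of the current Capsule API, it is only meant to illustrate the idea:

```yaml
apiVersion: capsule.clastix.io/v1beta1
kind: Tenant
metadata:
  name: tenant-a
spec:
  owners:
    - name: alice
      kind: User
  # Hypothetical field: a list of allowed node selectors instead of a single map.
  nodeSelectors:
    - pool: shared
    - pool: tenant-a
```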
Hi @slushysnowman. As I remember, node selectors don't support OR operations (https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector), so this will be impossible to achieve. What you can do is use taints and tolerations on these node pools, or allow the user to create 2 different tenants - one using the private pool, the other using the shared one.
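For completeness, a minimal sketch of the taints-and-tolerations alternative mentioned above (node name, taint key, and values are all illustrative):

```yaml
# Taint the private pool so only Pods that tolerate the taint can land there.
apiVersion: v1
kind: Node
metadata:
  name: tenant-a-node-1
spec:
  taints:
    - key: pool
      value: tenant-a
      effect: NoSchedule
---
# A tenant Pod opts in to the private pool with a matching toleration and
# still targets the pool via a nodeSelector.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  tolerations:
    - key: pool
      operator: Equal
      value: tenant-a
      effect: NoSchedule
  nodeSelector:
    pool: tenant-a
  containers:
    - name: app
      image: nginx
```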
It depends on how Capsule does the check, I guess?
For example, if Capsule verified the nodeSelector on the pod spec against the multiple nodeSelectors specified on the tenant, that should work, right?
For example:
- Pod is going to be deployed with a nodeSelector specified
- Capsule assesses the nodeSelector on the pod
- The pod's nodeSelector matches one of the nodeSelectors in the tenant spec
- Pod is deployed
Or am I looking at it too simplistically?
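Put differently, against the hypothetical `nodeSelectors: [{pool: shared}, {pool: tenant-a}]` list sketched earlier, the proposed check would behave like this (names and namespaces are illustrative):

```yaml
# Would be admitted under the proposal: the Pod's nodeSelector matches one of
# the tenant's allowed selectors.
apiVersion: v1
kind: Pod
metadata:
  name: ok-pod
  namespace: tenant-a-dev
spec:
  nodeSelector:
    pool: tenant-a
  containers:
    - name: app
      image: nginx
---
# Would be denied under the proposal: `pool: tenant-b` matches none of the
# tenant's allowed selectors.
apiVersion: v1
kind: Pod
metadata:
  name: denied-pod
  namespace: tenant-a-dev
spec:
  nodeSelector:
    pool: tenant-b
  containers:
    - name: app
      image: nginx
```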
Capsule uses the built-in PodNodeSelector admission controller (https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#configuration-annotation-format). So basically whatever you add as a nodeSelector to your TenantSpec is added to this annotation, and all the magic behind the scenes is done by Kubernetes itself.
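For reference, the annotation in question ends up on the tenant's namespaces roughly like this (namespace name and label value are illustrative); it can only express a single AND-ed label set, which is where the lack of OR semantics comes from:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a-dev
  annotations:
    # Read by the PodNodeSelector admission controller and merged into the
    # nodeSelector of every Pod created in this namespace.
    scheduler.alpha.kubernetes.io/node-selector: pool=shared
```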
Yeah, it makes sense that continuing to do it this way would make this feature request impossible - but there are potentially other ways this functionality could be implemented.
If it's not desired to add this to Capsule, that's all good - we're currently using Kyverno for this, as Capsule's nodeSelector option wasn't fit for our purposes. We'd potentially like to swap to Capsule for this, as it would probably be simpler to maintain, and it seems like something more users might run into in the long run - but I can understand if this is felt to be functionality that isn't desired or needed.
@slushysnowman please, could you share the Kyverno rule you're using to allow a Pod to run on one or more eligible node pools?
The idea is to evaluate its logic and check whether it's feasible to translate it to Capsule.
@prometherion apologies for the delay, here it is:
```yaml
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: node-restrictions
  annotations:
    policies.kyverno.io/title: Node restrictions
    policies.kyverno.io/category: xyz
    policies.kyverno.io/severity: high
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/description: >-
      We offer various nodes to deploy applications; these are, for example, labeled
      `xyz: platform`, `xyz: shared`, `xyz: <tenant>-<purpose>` for platform, shared,
      and tenant-specific nodes.
      `platform` nodes are designated for the platform services and are therefore not
      available to users, so the usage of the nodeSelector `xyz: platform` is disallowed.
      Subsequently, tenant nodes `xyz: <tenant>-<purpose>` are only available to their
      designated tenant.
      By default, if no nodeSelector is specified, shared is implied.
spec:
  validationFailureAction: enforce
  background: true
  rules:
    - name: default-shared-nodeselector
      match:
        resources:
          kinds:
            - Pod
      mutate:
        patchStrategicMerge:
          spec:
            nodeSelector:
              +(xyz): shared
      exclude:
        # Exempt infrastructure-related namespaces from having a default node selector assigned
        resources:
          namespaces:
            - ns1
            - ns2
    - name: disallow-non-tenant-nodes
      match:
        resources:
          kinds:
            - Pod
      preconditions:
        - key: "{{ request.object.spec.nodeSelector.xyz }}"
          operator: NotEquals
          value: "shared"
      context:
        - name: tenant # we derive our tenant from the namespace Capsule created for our tenant
          apiCall:
            urlPath: "/api/v1/namespaces/{{ request.namespace }}"
            jmesPath: 'metadata.labels."capsule.clastix.io/tenant"'
      validate:
        message: "{{ tenant }} is not allowed to deploy using nodeSelector xyz: {{ request.object.spec.nodeSelector.xyz }}"
        deny:
          conditions:
            - key: "{{ request.object.spec.nodeSelector.xyz }}"
              operator: NotEquals
              value: "{{ tenant }}-*"
      exclude:
        # Exempt infrastructure-related namespaces from targeting tenant nodes
        resources:
          namespaces:
            - ns1
            - ns2
    - name: disallow-platform-nodes
      match:
        all:
          - resources:
              kinds:
                - Pod
      validate:
        message: "For hosting pods on xyz, a nodeSelector label with the key `xyz` is required, but the value cannot be equal to `platform`. To host it on shared nodes, use the value `shared`."
        pattern:
          spec:
            nodeSelector:
              xyz: "!platform"
      exclude:
        # Exempt infrastructure-related namespaces from targeting platform nodes
        resources:
          namespaces:
            - ns1
            - ns2
```
I'm not a Kyverno expert, please correct me if I'm wrong.
Essentially, this policy creates a validation pipeline made up of 3 steps:
- with the `default-shared-nodeselector` stage you're providing a default value in case the Pod doesn't have one
- with the `disallow-non-tenant-nodes` stage you're validating only the Pods that got a value different from `shared`, retrieving the Tenant and checking if the user-provided `nodeSelector` matches your criteria
- the final `disallow-platform-nodes` stage ensures that Pods are not scheduled on the platform nodes, I'd say the ones used by the platform itself, hosting critical components such as the API Server, Capsule, or any other backing service
@MaxFedotov I'd like to offer this enhancement to Capsule, overcoming the limitations of PodNodeSelector, which is not enabled by default, so we can decrease the number of add-ons required to run Capsule properly.
My idea is to create a new handler in the Pods webhook, iterating over the conditions of the enhanced Tenant node selectors until a matching one is found by the scheduler: do you see any drawback to this approach?
In the end, we would mimic the same behavior offered by Kyverno.
@prometherion Do we still want to consider this implementation? If we want to add this, we could deliver it with 0.5.0. I can do the PR.
Removing the need for PodNodeSelector would be great. It's not available on EKS or other managed providers (https://github.com/aws/containers-roadmap/issues/304).