
Allow specifying multiple nodeSelectors on a tenant

Open slushysnowman opened this issue 3 years ago • 11 comments

Describe the feature

We have a use-case where tenants can use 2 different node pools.

  1. A shared node pool available for use by all tenants on a cluster
  2. A private node pool only available for use by a specific tenant or group of tenants

The current nodeSelector specification on a tenant does not allow this use case (apart from some very creative use of labels on nodes). Currently we use Kyverno to enforce this, but it seems like Capsule would be an ideal place for this sort of policy.

It would be good if there were a mechanism on a tenant to specify multiple node selectors, allowing tenants to use any number of separate node pools.
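For concreteness, a minimal sketch of how the two pools described above might be labelled; the `pool` label key and the node names are made up for illustration and are not part of the original request:

---
apiVersion: v1
kind: Node
metadata:
  name: shared-node-1
  labels:
    pool: shared        # shared pool, usable by every tenant
---
apiVersion: v1
kind: Node
metadata:
  name: tenant-a-node-1
  labels:
    pool: tenant-a      # private pool, reserved for a single tenant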

What would the new user story look like?

When a tenant is created, multiple nodeSelectors can be specified (both outcomes are sketched after the list below):

  • When a tenant user attempts to create a pod with a nodeSelector that matches one of the ones specified on the tenant -> pod scheduled on that node pool
  • When a tenant user attempts to create a pod with a nodeSelector that does not match one of the ones specified on the tenant -> request is denied
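For illustration, a minimal sketch of the two outcomes, assuming the tenant allows the selectors `pool: shared` and `pool: tenant-a`; the `pool` label key and pod names are hypothetical:

---
apiVersion: v1
kind: Pod
metadata:
  name: allowed-pod
spec:
  nodeSelector:
    pool: tenant-a      # matches one of the selectors allowed on the tenant -> scheduled on that pool
  containers:
    - name: app
      image: nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: denied-pod
spec:
  nodeSelector:
    pool: other-team    # matches none of the allowed selectors -> request is denied
  containers:
    - name: app
      image: nginx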

slushysnowman commented Feb 28 '22 12:02

Hi @slushysnowman. As I remember, nodeSelector doesn't support an OR operation (https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector), so this will be impossible to achieve. What you can do is use taints and tolerations on these node pools, or allow users to create 2 different tenants - one using the private pool, the other using the shared one.
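For reference, a minimal sketch of the taints-and-tolerations alternative mentioned above; the taint key and value are illustrative, and the taint would be applied to the private pool nodes, e.g. with kubectl taint nodes tenant-a-node-1 pool=tenant-a:NoSchedule.

---
apiVersion: v1
kind: Pod
metadata:
  name: tenant-a-workload
spec:
  # The toleration lets this Pod land on the tainted private pool;
  # pods of other tenants, lacking it, are repelled from those nodes.
  tolerations:
    - key: "pool"
      operator: "Equal"
      value: "tenant-a"
      effect: "NoSchedule"
  # A nodeSelector is typically combined with the toleration to actively
  # steer the Pod onto the private pool rather than merely permit it.
  nodeSelector:
    pool: tenant-a
  containers:
    - name: app
      image: nginx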

MaxFedotov commented Feb 28 '22 12:02

It depends on how Capsule checks, I guess?

If Capsule verifies the nodeSelectors on the Pod spec against the multiple nodeSelectors specified on the tenant, that should work, right?

For example:

  1. A Pod is going to be deployed with a nodeSelector specified
  2. Capsule assesses the nodeSelector on the Pod
  3. The Pod's nodeSelector matches one of the nodeSelectors in the tenant spec
  4. The Pod is deployed

Or am I looking at it too simplistically?

slushysnowman commented Feb 28 '22 21:02

Capsule uses the built-in PodNodeSelector admission controller (https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#configuration-annotation-format). So basically whatever you add as a nodeSelector to your Tenant spec is added to this annotation, and all the magic behind the scenes is done by Kubernetes itself.
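For context, a rough sketch of the Namespace annotation that PodNodeSelector consumes; the namespace name and label key/value are illustrative. Since it is a single label set, it cannot express "pool A OR pool B":

---
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a-dev
  annotations:
    # Read by the PodNodeSelector admission controller; Capsule populates it
    # from the Tenant's nodeSelector, and it is merged into every Pod
    # created in this namespace.
    scheduler.alpha.kubernetes.io/node-selector: "pool=shared"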

MaxFedotov commented Feb 28 '22 21:02

Yeah, it makes sense that continuing to do it this way would make this feature request impossible - but there are potentially other ways this functionality could be implemented.

If it's not desired to add this to Capsule, that's all good - we're currently using Kyverno for this, as Capsule's nodeSelector option wasn't fit for our purposes. We'd potentially like to swap to Capsule, since it would probably be simpler to maintain, and this seems like something more users might run into in the long run, but I can understand if this is felt to be undesired or unneeded functionality.

slushysnowman commented Feb 28 '22 21:02

@slushysnowman could you please share the Kyverno rule you're using to allow a Pod to run on one or more eligible node pools?

The idea is to evaluate its logic and check whether it's feasible to translate it to Capsule.

prometherion commented Mar 02 '22 09:03

@prometherion apologies for the delay, here it is:

---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: node-restrictions
  annotations:
    policies.kyverno.io/title: Node restrictions
    policies.kyverno.io/category: xyz
    policies.kyverno.io/severity: high
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/description: >-
      We offer various node pools to deploy applications; these are, for example, labeled
      `xyz: platform`, `xyz: shared`, and `xyz: <tenant>-<purpose>` for platform, shared,
      and tenant-specific nodes.
      `platform` nodes are designated for the platform services and are therefore not available to users,
      so the usage of the nodeSelector `xyz: platform` is disallowed.
      Tenant nodes `xyz: <tenant>-<purpose>` are only available to their designated tenant.
      By default, if no nodeSelector is specified, `shared` is implied.
spec:
  validationFailureAction: enforce
  background: true
  rules:
    - name: default-shared-nodeselector
      match:
        resources:
          kinds:
            - Pod
      mutate:
        patchStrategicMerge:
          spec:
            nodeSelector:
              +(xyz): shared
      exclude:
        #  Exempt infrastructure related namespaces in having a default node-selector assigned
        resources:
          namespaces:
            - ns1
            - ns2
    - name: disallow-non-tenant-nodes
      match:
        resources:
          kinds:
            - Pod
      preconditions:
        - key: "{{ request.object.spec.nodeSelector.xyz }}"
          operator: NotEquals
          value: "shared"
      context:
        - name: tenant # we derive our tenant from the namespace capsule created for our tenant
          apiCall:
            urlPath: "/api/v1/namespaces/{{ request.namespace }}"
            jmesPath: 'metadata.labels."capsule.clastix.io/tenant"'
      validate:
        message: "{{ tenant }} is not allowed to deploy using nodeSelector xyz: {{ request.object.spec.nodeSelector.xyz }} "
        deny:
          conditions:
            - key: "{{ request.object.spec.nodeSelector.xyz }}"
              operator: NotEquals
              value: "{{ tenant }}-*"
      exclude:
        #  Exempt infrastructure related namespaces from targeting tenant nodes
        resources:
          namespaces:
            - ns1
            - ns2
    - name: disallow-platform-nodes
      match:
        all:
          - resources:
              kinds:
                - Pod
      validate:
        message: "For hosting pods on xyz, a nodeSelectorLabel with the key `xyz` is required, but the value cannot be equal to `platform`. To host it on shared nodes use the value `shared`"
        pattern:
          spec:
            nodeSelector:
              xyz: "!platform"
      exclude:
        #  Exempt infrastructure related namespaces from targeting platform nodes
        resources:
          namespaces:
            - ns1
            - ns2

slushysnowman commented Mar 14 '22 08:03

I'm not a Kyverno expert, please correct me if I'm wrong.

Essentially, this policy is creating a validation pipeline made up of 3 steps.

  1. with the stage default-shared-nodeselector you're providing a default value in case the Pod doesn't have one.
  2. with the stage disallow-non-tenant-nodes you're validating only the Pods that got a value different from shared, retrieving the Tenant and checking whether the user-provided nodeSelector matches your criteria.
  3. the final stage disallow-platform-nodes ensures that Pods are not scheduled on the platform nodes; I'd say the ones used by the platform itself, hosting critical components such as the API Server, Capsule, or any other backing service.

prometherion commented Mar 16 '22 10:03

@MaxFedotov I'd like to offer this enhancement to Capsule, overcoming the limitations of PodNodeSelector, which is not enabled by default, so we can decrease the number of add-ons required to run Capsule properly.

My idea is to create a new handler in the Pods webhook, iterating over the conditions of the enhanced Tenant node selectors until a matching one is found by the scheduler: do you see any drawback to this approach?

In the end, we would mimic the same behavior offered by Kyverno.
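For illustration only, a hypothetical Tenant manifest with a list of allowed node selectors; the nodeSelectors field name and shape are made up here, just to show what the proposed webhook handler would iterate over:

---
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: tenant-a
spec:
  owners:
    - name: alice
      kind: User
  # Hypothetical field: a list of allowed label sets. The proposed Pod webhook
  # handler would admit a Pod whose nodeSelector matches at least one entry
  # and deny the request otherwise, mimicking the Kyverno behavior above.
  nodeSelectors:
    - pool: shared
    - pool: tenant-a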

prometherion commented Apr 12 '22 12:04

@prometherion Do we still want to consider this implementation? If we want to add this, we could deliver it with 0.5.0. I can do the PR.

oliverbaehler commented Nov 29 '23 13:11

Removing the need for PodNodeSelector would be great. It's not available on EKS or other managed providers: https://github.com/aws/containers-roadmap/issues/304

carpenterm commented Dec 11 '23 13:12