k8s-device-plugin

helm starts two daemon sets with the same matchLabel

peterzandbergen opened this issue 3 months ago · 0 comments

When I install the chart with the values file below, Helm creates two DaemonSets whose selector.matchLabels are identical, so both controllers try to adopt the same pods.

I suspect this is a bug.

I am using the latest version of the Helm chart.

Values

# -----------------------------------------------------------------
# --- THIS IS THE CORRECT CONFIGURATION FOR MPS ---
#
# This block creates the configuration file.
config:
  map:
    # You can name this entry anything, e.g., "mps-config"
    mps-config: |-
      version: v1
      sharing:
        mps:
          resources:
          - name: nvidia.com/gpu
            replicas: 2 # Starting with 2 as a safe value
  
  # This tells the plugin to use the config block you just defined.
  default: "mps-config"

  # This block gives the MPS daemon the host permissions it needs.
  mps:
    enableHostPID: true
# -----------------------------------------------------------------
# Enable GPU Feature Discovery (the chart key is gfd, not gdf).
gfd:
  enabled: true

affinity: null

runtimeClassName: nvidia

nodeSelector:
  nvidia.com/gpu: "true"

tolerations:
  # Chart's default
  - key: CriticalAddonsOnly
    operator: Exists
  # Chart's default
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
  # The missing toleration for your control-plane GPU node
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
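For reference, both DaemonSets appear to be rendered with the same selector. This is a sketch reconstructed from the pod labels in the listing below; note that no app.kubernetes.io/component (or similar) label is present to distinguish the two workloads:

```yaml
# Selector apparently shared by BOTH the device-plugin DaemonSet and the
# mps-control-daemon DaemonSet (reconstructed from the observed pod labels):
selector:
  matchLabels:
    app.kubernetes.io/instance: nvidia-device-plugin
    app.kubernetes.io/name: nvidia-device-plugin
```

Because DaemonSet selectors are immutable and these two overlap, each controller sees the other's pods as its own, which matches the behavior reported above.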

The pods that this values file produces:

kubectl get pods -n nvidia --show-labels -o wide
NAME                                            READY   STATUS    RESTARTS   AGE   IP           NODE                NOMINATED NODE   READINESS GATES   LABELS
nvidia-device-plugin-mps-control-daemon-fjcms   2/2     Running   0          29m   10.244.5.6   k8s-cluster-w-01    <none>           <none>            app.kubernetes.io/instance=nvidia-device-plugin,app.kubernetes.io/name=nvidia-device-plugin,controller-revision-hash=fb6d478fd,pod-template-generation=3
nvidia-device-plugin-mps-control-daemon-kzq9b   2/2     Running   0          29m   10.244.4.5   k8s-cluster-cp-03   <none>           <none>            app.kubernetes.io/instance=nvidia-device-plugin,app.kubernetes.io/name=nvidia-device-plugin,controller-revision-hash=fb6d478fd,pod-template-generation=3
nvidia-device-plugin-ns4lj                      2/2     Running   0          29m   10.244.5.5   k8s-cluster-w-01    <none>           <none>            app.kubernetes.io/instance=nvidia-device-plugin,app.kubernetes.io/name=nvidia-device-plugin,controller-revision-hash=574f484c6d,pod-template-generation=3
nvidia-device-plugin-t5q5f                      2/2     Running   0          29m   10.244.4.4   k8s-cluster-cp-03   <none>           <none>            app.kubernetes.io/instance=nvidia-device-plugin,app.kubernetes.io/name=nvidia-device-plugin,controller-revision-hash=574f484c6d,pod-template-generation=3
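The overlapping selectors can be confirmed directly from the DaemonSet objects (sketch; assumes the release is installed in the nvidia namespace, as above, and requires access to the cluster):

```shell
# Print each DaemonSet's name next to its selector.matchLabels.
# If the bug is present, both rows show the identical label set.
kubectl get daemonsets -n nvidia \
  -o custom-columns='NAME:.metadata.name,SELECTOR:.spec.selector.matchLabels'
```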

peterzandbergen · Oct 28 '25 17:10