Is there any way in the meantime to request more than 1 replica from each GPU in my node?
I have started MPS with a division factor of 10, but in our application scenario we might need to allocate 2 whole GPUs directly, which is equivalent to specifying `nvidia.com/gpu: 20`. If I set `nvidia.com/gpu` > 1, I encounter the error: `request for "nvidia.com/gpu": invalid request: maximum request size for shared resources is 1; found 10`, which is unexpected.
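For context, this is roughly the sharing config I am using to get the division factor of 10 (a minimal sketch following the NVIDIA device plugin's `v1` config schema; only the `mps` section is relevant here):

```yaml
version: v1
sharing:
  mps:
    resources:
      - name: nvidia.com/gpu
        replicas: 10   # division factor: each physical GPU is advertised as 10 replicas
```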
Does anyone have any ideas?
No, requesting multiple replicas does not give you more access to a shared GPU.
Isn't that restriction only for time-slicing? With MPS, we should be able to assign more than one replica to a container pod. Applications such as triton-server should be able to use multiple GPUs.
Based on this example, you can bump the limit up to 2 by updating the manifest to request 2 `nvidia.com/gpu`:

```yaml
resources:
  limits:
    nvidia.com/gpu: 2
```
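For completeness, a minimal pod manifest carrying that limit might look like the following (the pod name, container name, and image are hypothetical, just for illustration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mps-test            # hypothetical name
spec:
  containers:
    - name: cuda-container  # hypothetical name
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04  # example image
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 2   # two MPS replicas of the shared GPU
```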
Based on the doc above, the pod will still see one GPU, but with double the memory and compute limits, so it is twice as powerful as a single MPS replica.
Is there any progress on this issue?
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed. To skip these checks, apply the "lifecycle/frozen" label.
This issue was automatically closed due to inactivity.