Is there any way in the meantime to request more than 1 replica from each GPU in my node?
I have started MPS with a division factor of 10, but in our application scenario we might need to allocate 2 whole GPUs directly, which is equivalent to specifying `nvidia.com/gpu: 20`. If I set `nvidia.com/gpu` > 1, I encounter the error: `request for "nvidia.com/gpu": invalid request: maximum request size for shared resources is 1; found 10`, which is unexpected.
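For context, this is roughly the sharing config I am using to get the division factor of 10 (a minimal sketch following the NVIDIA device plugin's `v1` config schema; only the `mps` section is relevant here):

```yaml
version: v1
sharing:
  mps:
    resources:
      - name: nvidia.com/gpu
        replicas: 10   # division factor: each physical GPU is advertised as 10 replicas
```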
Does anyone have any ideas?
No, requesting multiple replicas does not give you more access to a shared GPU.
Isn't that restriction only for time-slicing? With MPS, we should be able to assign more than one replica to a container pod. Applications such as triton-server should be able to use multiple GPUs.
Based on this example, you can bump the limit up to 2 by updating the manifest to request 2 `nvidia.com/gpu`:

```yaml
resources:
  limits:
    nvidia.com/gpu: 2
```
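For completeness, a minimal pod manifest carrying that limit might look like the following (the pod name, container name, and image are hypothetical, just for illustration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mps-test            # hypothetical name
spec:
  containers:
    - name: cuda-container  # hypothetical name
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04  # example image
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 2   # two MPS replicas of the shared GPU
```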
Based on the doc above, the pod will still see one GPU, but with double the memory and compute limits, so it is twice as powerful as a single MPS replica.
Is there any progress on this issue?
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed. To skip these checks, apply the "lifecycle/frozen" label.
This issue was automatically closed due to inactivity.