Prakash Chandra
Hi team, why are my pods reporting a multiprocessor count of 4 instead of 40? The issue appeared after upgrading our Amazon EKS cluster to version 1.26. We are utilizing NVIDIA Tesla...
@elezar I reduced the replica count to 2 (`sharing: mps: resources: - name: nvidia.com/gpu replicas: 2`). Now I am getting a count of 20. But I want SM to be...
@elezar I want to run multiple pods (approx. 8) on 1 GPU, so I am using MPS for that purpose. I understand your answer; that will provide me with 40 SM...
@klueska I want full memory and compute access across all pods. I have a g4dn.2xlarge instance with the following config. I want my 8 workloads to access the memory as...
@elezar Could you please help here? I am not able to configure the MPS sharing option. Here is the output of `kubectl logs nvidia-device-plugin-daemonset-4p742 -c mps-control-daemon-ctr -n kube-system`: I0517 04:07:02.596152 1...
@channel Could anyone please give some advice here?
I am also facing the same issue where the logs are in an error state. When I change the tag to latest for the dcgm image `nvcr.io/nvidia/k8s/dcgm-exporter:latest`, I see the...
@elezar I am using version `0.15.0`. I need to set replicas to 1 so that I can have full resource access of the GPU node. My config looks like this...
@klueska I provisioned an EKS-optimized GPU node (g4dn.2xlarge) with 1 GPU, configured as follows. To have my workloads/pods scheduled on it, I created the daemonset...
@elezar @klueska Although things didn't work through the Helm configuration, I was able to figure out a solution. I set `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` to 100 so that my full GPU...
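
The multiprocessor counts mentioned in these comments (4, 20, and 40) are consistent with a simple model: each MPS client sees a share of the GPU's SMs proportional to its active thread percentage, and an equal split across replicas gives each replica `100 / replicas` percent. The sketch below illustrates that arithmetic only. The function names are hypothetical, the total of 40 SMs is taken from the counts quoted above, and both the equal split and the round-up behavior are assumptions rather than confirmed behavior of the device plugin.

```python
import math


def percentage_for_replicas(replicas: int) -> float:
    """Assumed equal split: each replica gets 100 / replicas percent of threads."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 100 / replicas


def visible_sm_count(total_sms: int, active_thread_percentage: float) -> int:
    """Approximate SM count a client sees for a given thread percentage.

    Rounding up is an assumption; the real driver may round differently.
    """
    if not 0 < active_thread_percentage <= 100:
        raise ValueError("percentage must be in (0, 100]")
    return math.ceil(total_sms * active_thread_percentage / 100)


TOTAL_SMS = 40  # count quoted in the comments above

# Replica counts from the thread: 2 gave 20 SMs; a count of 4 matches 10 replicas.
for replicas in (10, 2, 1):
    pct = percentage_for_replicas(replicas)
    print(replicas, pct, visible_sm_count(TOTAL_SMS, pct))
```

Under this model, setting `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` to 100 corresponds to `visible_sm_count(40, 100)`, which is why each pod would then see all 40 SMs regardless of the replica count.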