gpushare-device-plugin
GPU Sharing Device Plugin for Kubernetes Cluster
Any chance of getting the device plugin working on containerd without nvidia-docker2? I have rebuilt my cluster with containerd, and the following are installed on my worker nodes: libnvidia-container, nvidia-container-toolkit...
Is there an OOM kill or signal when a pod uses more GPU memory than it requested? Since physical GPU memory is limited, overusing it may affect processes belonging to other...
Can it work on a Windows node with Windows containers?
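This code causes a problem when a node has several GPUs of different models (with different memory sizes): the first GPU detected is used as the reference for all of them. For example, on a node with 12G + 16G GPUs, both GPUs are recognized as 12G, for a total of 24G. https://github.com/AliyunContainerService/gpushare-device-plugin/blob/45fb8b88692250cff2d53cb64b0a41864a5fcaf3/pkg/gpu/nvidia/nvidia.go#L70 @cheyang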
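A minimal sketch of the difference the report describes, assuming a simplified, hypothetical `gpu` type rather than the plugin's real data structures: multiplying the first GPU's memory by the GPU count versus summing each GPU's own memory.

```go
package main

import "fmt"

// gpu is a hypothetical, simplified view of one physical GPU on a node.
type gpu struct {
	model  string
	memMiB uint64 // total device memory in MiB
}

// totalAssumingFirst mirrors the reported behavior: every GPU is assumed
// to have the same memory as the first one detected.
func totalAssumingFirst(gpus []gpu) uint64 {
	if len(gpus) == 0 {
		return 0
	}
	return uint64(len(gpus)) * gpus[0].memMiB
}

// totalPerDevice sums each GPU's own memory instead.
func totalPerDevice(gpus []gpu) uint64 {
	var total uint64
	for _, g := range gpus {
		total += g.memMiB
	}
	return total
}

func main() {
	// The example from the report: one 12G GPU and one 16G GPU.
	node := []gpu{
		{model: "gpu-12g", memMiB: 12 * 1024},
		{model: "gpu-16g", memMiB: 16 * 1024},
	}
	fmt.Println(totalAssumingFirst(node) / 1024) // 24
	fmt.Println(totalPerDevice(node) / 1024)     // 28
}
```

With per-device accounting, the node would advertise 28G instead of 24G, and each GPU would keep its own capacity.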
1. Upgrade the Go version to 1.19
2. Replace gopkg with Go modules
3. Update the Kubernetes SDK version
4. Fix CircleCI
For a GPU such as the NVIDIA A100 PCIe 80GB, it is not possible to advertise the extended resource in MiB because of this error: `ResourceExhausted desc = grpc: received message larger than...`
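A rough sketch of why the error can appear, assuming (not confirmed by the snippet) that the plugin advertises one fake device per MiB of GPU memory in its `ListAndWatch` response. It estimates the protobuf wire size of that response using the device plugin API's `Device{ID = 1, health = 2}` layout and compares it with grpc-go's default 4 MiB receive limit. The fake ID format `<uuid>-_-<index>` and the placeholder UUID are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// grpcDefaultMaxRecv is grpc-go's default maximum receive message size (4 MiB).
const grpcDefaultMaxRecv = 4 * 1024 * 1024

// varintLen returns how many bytes protobuf needs to encode x as a varint.
func varintLen(x uint64) int {
	n := 1
	for x >= 0x80 {
		x >>= 7
		n++
	}
	return n
}

// stringFieldLen is the encoded size of a non-empty string field with a one-byte tag.
func stringFieldLen(s string) int {
	return 1 + varintLen(uint64(len(s))) + len(s)
}

// estimateListAndWatchSize approximates the wire size of a ListAndWatchResponse
// advertising one fake device per MiB. Hypothetical assumptions: IDs look like
// "<uuid>-_-<index>", every device is "Healthy", and no topology is attached.
func estimateListAndWatchSize(uuid string, memMiB int) int {
	total := 0
	health := stringFieldLen("Healthy")
	for j := 0; j < memMiB; j++ {
		id := uuid + "-_-" + strconv.Itoa(j)
		dev := stringFieldLen(id) + health        // Device{ID: 1, health: 2}
		total += 1 + varintLen(uint64(dev)) + dev // repeated Device devices = 1
	}
	return total
}

func main() {
	uuid := "GPU-00000000-0000-0000-0000-000000000000" // placeholder UUID
	for _, gib := range []int{16, 40, 80} {
		size := estimateListAndWatchSize(uuid, gib*1024)
		fmt.Printf("%d GiB -> ~%d bytes, over 4 MiB limit: %v\n",
			gib, size, size > grpcDefaultMaxRecv)
	}
}
```

Under these assumptions, 16 GiB and 40 GiB cards stay below the limit, while 80 GiB (81920 fake devices) exceeds it. This matches the symptom only if the assumed ID format and per-MiB granularity hold.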
**What happened:** A Trivy image scan reports critical and high vulnerabilities in the latest image k8s-gpushare-plugin:v2-1.11-aff8a23.
**What you expected to happen:** No critical or high vulnerabilities.
**How to reproduce it:** trivy...
Hello, I'm trying to use the gpushare device plugin only to expose the gpu_mem resource, in MiB, from a Kubernetes GPU node. I have all the NVIDIA components (drivers, nvidia-container-runtime, etc.) installed...
This program does not work on k8s 1.25. Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init:...
### Plugin cannot find my A100 80G
I used Rancher 2.5.9 to build my cluster. I think the installation steps are correct, since they worked on another cluster which I...