ResourceExhausted desc = grpc: received message larger than max (4986010 vs. 4194304)

Open k0nstantinv opened this issue 3 years ago • 0 comments

For a GPU such as the NVIDIA A100 PCIe 80GB, it's not possible to report the extended resource in MiB because of this error:

ResourceExhausted desc = grpc: received message larger than max (4986010 vs. 4194304)

The device plugin can't update the node status, so the GPU node ends up with zero gpu_memory capacity:

Capacity:
aliyun.com/gpu_memory:         0
Nov 17 15:09:51 node02 kubelet[11218]: I1117 15:09:51.797475   11218 manager.go:440] Mark all resources Unhealthy for resource aliyun.com/gpu_memory
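
For context, 4194304 bytes is exactly 4 MiB, the default gRPC limit on received message size. The rough arithmetic below is only a sketch of why the MiB unit might overflow it. It assumes (not confirmed in this issue) that the card is exposed as 81920 MiB and that the plugin advertises one device entry per memory unit; the constant and function names are hypothetical.

```go
package main

import "fmt"

// Default gRPC maximum receive message size: 4 MiB (4194304 bytes).
const defaultMaxRecvBytes = 4 * 1024 * 1024

// Message size reported in the error above.
const observedMsgBytes = 4986010

// ASSUMPTION: an 80 GB card exposed as 81920 MiB, with one advertised
// device entry per MiB. Neither figure is stated in the issue.
const assumedDevicesMiB = 81920

// exceedsLimit reports whether a message of msgBytes is over the limit.
func exceedsLimit(msgBytes, limitBytes int) bool {
	return msgBytes > limitBytes
}

// avgBytesPerDevice estimates the average encoded size of one device entry.
func avgBytesPerDevice(msgBytes, devices int) float64 {
	return float64(msgBytes) / float64(devices)
}

// estimatedMsgBytes scales the per-device estimate to another device count.
func estimatedMsgBytes(devices int, perDevice float64) int {
	return int(float64(devices) * perDevice)
}

func main() {
	perDevice := avgBytesPerDevice(observedMsgBytes, assumedDevicesMiB)
	fmt.Printf("over limit: %v (by %d bytes)\n",
		exceedsLimit(observedMsgBytes, defaultMaxRecvBytes),
		observedMsgBytes-defaultMaxRecvBytes)
	fmt.Printf("approx. bytes per device entry: %.1f\n", perDevice)

	// The same card counted in GiB instead of MiB: 81920 / 1024 = 80 entries.
	devicesGiB := assumedDevicesMiB / 1024
	fmt.Printf("estimated message size with %d GiB entries: %d bytes\n",
		devicesGiB, estimatedMsgBytes(devicesGiB, perDevice))
}
```

Under these assumptions each entry costs roughly 60 bytes, so about 80 thousand entries land just above the 4 MiB limit, while counting the same memory in GiB would need only 80 entries and a few kilobytes.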

k0nstantinv, Nov 21 '22 09:11