gpu-manager Pod ignores limits.

Hello! I have launched the gpu-manager daemon set on a node. Then I started a pod on this node that requested tencent.com/vcuda-memory: 2. As I understand from the README, one vcuda-memory unit equals 256 MiB, so I expected the process inside the container to be limited to 512 MiB of GPU memory. However, it uses 1500 MiB, as if there were no limit at all. I thought that maybe I need to use https://github.com/tkestack/vcuda-controller in some way. But when I patched thomassong/gpu-manager:1.1.4 using the vcuda-controller ./build-img.sh script, the resulting image just exits with code 0 when I try to run it. I really don't understand how all of this is supposed to fit together.
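To make the expectation above concrete, here is a minimal sketch of the arithmetic. It assumes only what is stated above: one tencent.com/vcuda-memory unit corresponds to 256 MiB. The function and constant names are hypothetical and are not part of gpu-manager.

```python
# Assumption (from the README as quoted in this issue):
# one tencent.com/vcuda-memory unit corresponds to 256 MiB of GPU memory.
MIB_PER_VCUDA_MEMORY_UNIT = 256


def expected_limit_mib(vcuda_memory_units: int) -> int:
    """Return the GPU memory limit in MiB implied by a vcuda-memory request."""
    if vcuda_memory_units < 0:
        raise ValueError("vcuda-memory request must be non-negative")
    return vcuda_memory_units * MIB_PER_VCUDA_MEMORY_UNIT


def exceeds_limit(observed_mib: int, vcuda_memory_units: int) -> bool:
    """Return True if the observed usage is above the implied limit."""
    return observed_mib > expected_limit_mib(vcuda_memory_units)


if __name__ == "__main__":
    requested_units = 2   # tencent.com/vcuda-memory: 2
    observed_mib = 1500   # usage observed inside the pod
    print(expected_limit_mib(requested_units))           # 512
    print(exceeds_limit(observed_mib, requested_units))  # True
```

With a request of 2 units, the implied limit is 512 MiB, so the observed 1500 MiB is roughly three times what I expected to be allowed.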

I have been searching for a long time for a proper Kubernetes solution that makes it possible to limit GPU cores and GPU memory in the same way as CPU and host memory. On paper, this project looks like exactly what I have been searching for. Unfortunately, I can't get it to work. If somebody could help me, and perhaps has the patience to contact me personally, I would be in your debt.

Mar 02 '23 06:03 valafon

In my case, if a running process exceeds the vcuda-memory limit, it returns a CUDA out-of-memory error, but the vcuda-core limit is not enforced.

May 31 '23 09:05 DennisYoung96

I have encountered the same issue as you. Could I leave my contact details here?

Jan 18 '24 06:01 yangcheng-dev