XinYuan
It is unverified. Could you help confirm this issue? Also, a known issue (https://github.com/pokerfaceSad/GPUMounter/issues/19#issuecomment-1034134013) is that GPUMounter cannot work well on k8s 1.20+.
Thanks for your feedback. I will try to fix it. PRs are also very welcome!
@cool9203 Happy Spring Festival! Thanks for your efforts, and sorry for keeping you waiting so long. * The check for `_` is there to handle the systemd cgroup driver. But if `_` can...
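For context, here is a minimal sketch of why that underscore check matters (this is illustrative only, not GPUMounter's actual code; `buildPodCgroupName` is a hypothetical helper): the systemd cgroup driver rewrites the dashes in the pod UID as underscores inside a `.slice` unit name, while the cgroupfs driver keeps the raw UID in the path.

```go
package main

import (
	"fmt"
	"strings"
)

// buildPodCgroupName is a hypothetical helper showing how the pod cgroup name
// differs between the two drivers. With systemd, the pod UID's dashes become
// underscores and the name is wrapped in a .slice unit; with cgroupfs, the UID
// is used as-is, so an "_" in the name is a hint that systemd naming is in play.
func buildPodCgroupName(podUID, qosClass, cgroupDriver string) string {
	switch cgroupDriver {
	case "systemd":
		// e.g. kubepods-burstable-pod1234_abcd_... .slice
		uid := strings.ReplaceAll(podUID, "-", "_")
		return fmt.Sprintf("kubepods-%s-pod%s.slice", qosClass, uid)
	default: // cgroupfs
		// e.g. kubepods/burstable/pod1234-abcd-...
		return fmt.Sprintf("kubepods/%s/pod%s", qosClass, podUID)
	}
}

func main() {
	uid := "1234abcd-56ef-78aa-90bb-ccddeeff0011"
	fmt.Println(buildPodCgroupName(uid, "burstable", "systemd"))
	fmt.Println(buildPodCgroupName(uid, "burstable", "cgroupfs"))
}
```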
@cool9203 Thank you for revealing this! The reason the slave pod can't be created in the owner pod's namespace is #3. Some modifications may be needed to adapt to k8s v1.20+.
@cool9203 The bug of the hard-coded `cgroup driver` has been fixed in https://github.com/pokerfaceSad/GPUMounter/commit/163ef7b10e7b53180033d1585c9e637c72b3b105. The `cgroup driver` can now be set in [/deploy/gpu-mounter-workers.yaml](https://github.com/pokerfaceSad/GPUMounter/blob/163ef7b10e7b53180033d1585c9e637c72b3b105/deploy/gpu-mounter-workers.yaml) via the environment variable `CGROUP_DRIVER`.
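As a rough sketch of what reading that setting could look like (assumptions: `getCgroupDriver` is a hypothetical function, and the `cgroupfs` fallback is assumed rather than taken from the commit), the worker only needs to read `CGROUP_DRIVER` from its environment and match whatever the kubelet's cgroup driver is configured to:

```go
package main

import (
	"fmt"
	"os"
)

// getCgroupDriver is an illustrative sketch, not the exact GPUMounter-worker code.
// It reads the CGROUP_DRIVER environment variable set in
// deploy/gpu-mounter-workers.yaml and falls back to "cgroupfs" when it is unset.
func getCgroupDriver() string {
	if driver := os.Getenv("CGROUP_DRIVER"); driver != "" {
		return driver // e.g. "systemd" or "cgroupfs"
	}
	return "cgroupfs" // assumed default for this sketch
}

func main() {
	fmt.Println("using cgroup driver:", getCgroupDriver())
}
```

The value set in the DaemonSet manifest should match the cgroup driver the node's kubelet is actually using.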
@cool9203 In fact, slave pods were created in the owner pod's namespace before https://github.com/pokerfaceSad/GPUMounter/commit/a378e39793c241d40a80387eab11aa996c95cc93. However, in a multi-tenant cluster scenario, the cluster administrator may use the resource quota feature to limit the resource...
@liuweibin6566396837 Thanks for your issue. Please share more of the relevant gpu-mounter-worker logs (`/etc/GPUMounter/log/GPUMounter-worker.log`).
It seems that you edited the k8s version in this issue. What is your k8s version? In its current version, GPUMounter has a known bug on k8s v1.20+, as mentioned in https://github.com/pokerfaceSad/GPUMounter/issues/19#issuecomment-1034134013.
Thanks for your report. It seems that you are hitting the unfixed issue mentioned in #19. GPUMounter cannot work well on k8s v1.20+ in its current version.
Run `kubectl describe node` to check whether the GPU resources are idle.