coldzerofear
/assign @k82cn
/assign @shinytang6
/assign @lowang-bh
> > volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/vgpu.(*GPUDevices).GetStatus(0xc0007173c0?)
> > /go/src/volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/vgpu/metrics.go:71 +0x18
> >
> > The logs show that it panics here; we should also assert them.

The assertions have been resolved on the outer layer,...
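For context, the usual Go pattern for guarding against such a panic is the comma-ok type assertion rather than a bare one. A minimal sketch below — the type and function names are illustrative placeholders, not the actual volcano code:

```go
package main

import "fmt"

// Device is a stand-in for the scheduler's device interface (illustrative).
type Device interface{ GetStatus() string }

// GPUDevices is a stand-in for the vgpu devices type (illustrative).
type GPUDevices struct{}

func (g *GPUDevices) GetStatus() string { return "ok" }

// describe uses the comma-ok form, so a value of an unexpected
// (or nil) dynamic type falls through instead of panicking.
func describe(v interface{}) string {
	if d, ok := v.(Device); ok && d != nil {
		return d.GetStatus()
	}
	return "unknown"
}

func main() {
	fmt.Println(describe(&GPUDevices{})) // prints "ok"
	fmt.Println(describe(nil))           // prints "unknown"
}
```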
/assign @thor-wl
You can use the environment variable `LIBCUDA_LOG_LEVEL` to increase the logging level of HAMi-core and obtain more context.
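For reference, a minimal sketch of setting this variable in a pod spec — the pod/container names, image, and the vGPU resource key are assumptions for illustration, not taken from this thread:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vgpu-debug            # hypothetical name
spec:
  containers:
    - name: cuda-app          # placeholder container
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      env:
        - name: LIBCUDA_LOG_LEVEL   # raises HAMi-core log verbosity
          value: "4"
      resources:
        limits:
          volcano.sh/vgpu-number: 1  # assumed vGPU resource name
```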
> After setting `LIBCUDA_LOG_LEVEL` to `4`:
>
> ```
> (base) (⎈|N/A:N/A)➜ cat output.txt | grep -i error
> [HAMI-core Debug(492:140563747359616:hook.c:293)]: loading nvmlErrorString:2
> [HAMI-core Debug(492:140563747359616:hook.c:293)]: loading nvmlDeviceClearEccErrorCounts:10
> [HAMI-core Debug(492:140563747359616:hook.c:293)]: loading nvmlDeviceGetDetailedEccErrors:38
> ...
If there are only two physical GPUs on your node, a single container can only request a maximum of 2 vGPUs.