coldzerofear

Results: 9 comments by coldzerofear

/assign @shinytang6

> > volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/vgpu.(*GPUDevices).GetStatus(0xc0007173c0?)
> > /go/src/volcano.sh/volcano/pkg/scheduler/api/devices/nvidia/vgpu/metrics.go:71 +0x18
> >
> > The logs show that it panics here; we should also assert them. The assertions have been resolved on the outer layer,...

You can set the environment variable `LIBCUDA_LOG_LEVEL` to increase the logging level of HAMi-core and obtain more context.
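As a rough sketch of that suggestion, the wrapper below runs a command with `LIBCUDA_LOG_LEVEL=4` and merges stderr into stdout so the log can be saved and filtered. The function name `run_with_hami_debug` and the binary `./my_cuda_app` are hypothetical; the value `4` is simply the one used in the report quoted in this thread, and it is assumed here to correspond to debug-level output.

```shell
#!/bin/sh
# Hypothetical helper: run any command with HAMi-core logging raised.
# Assumption: level 4 is the debug level (the value used in the quoted report).
run_with_hami_debug() {
    # env sets the variable only for the child process;
    # 2>&1 merges stderr into stdout so one redirect captures everything.
    env LIBCUDA_LOG_LEVEL=4 "$@" 2>&1
}

# Example usage (hypothetical binary name):
#   run_with_hami_debug ./my_cuda_app > output.txt
#   grep -i error output.txt
```

Setting the variable per command rather than exporting it keeps the extra logging from leaking into other processes started from the same shell.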

> With `LIBCUDA_LOG_LEVEL` set to `4`
>
> ```
> (base) (⎈|N/A:N/A)➜ cat output.txt | grep -i error
> [HAMI-core Debug(492:140563747359616:hook.c:293)]: loading nvmlErrorString:2
> [HAMI-core Debug(492:140563747359616:hook.c:293)]: loading nvmlDeviceClearEccErrorCounts:10
> [HAMI-core Debug(492:140563747359616:hook.c:293)]: loading nvmlDeviceGetDetailedEccErrors:38
> ...
> ```