clusterdata
GPU memory 'max_gpu_wrk_mem' appears to exceed the capacity of the listed GPU type in the GPU 2020 trace?
For example, in the GPU 2020 trace, the job 'e5d6d5b546bff61f93b47ebf' has a max_gpu_wrk_mem of '44.289062', but its GPU type is V100, whose memory capacity should be 16 GB, or 32 GB at most.
Should I assume that whenever max_gpu_wrk_mem exceeds the capacity of the GPU type, the worker hit an out-of-memory (OOM) error?
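To make the check concrete, here is a minimal sketch of the comparison I have in mind. It assumes max_gpu_wrk_mem is reported in GiB and takes the larger 32 GB V100 variant as the limit. The `job_name` and `gpu_type` keys, the capacity table, and the second row are hypothetical placeholders, not the trace's actual schema.

```python
# Assumed capacity per GPU type, in GiB (the larger V100 variant is used as the limit).
GPU_CAPACITY_GIB = {"V100": 32.0}


def exceeds_capacity(rows, capacity=GPU_CAPACITY_GIB):
    """Return the rows whose max_gpu_wrk_mem is above the capacity of their GPU type."""
    flagged = []
    for row in rows:
        cap = capacity.get(row["gpu_type"])
        # Rows with an unknown GPU type are skipped rather than flagged.
        if cap is not None and row["max_gpu_wrk_mem"] > cap:
            flagged.append(row)
    return flagged


# First row uses the values quoted above; the second row is made up for contrast.
rows = [
    {"job_name": "e5d6d5b546bff61f93b47ebf", "gpu_type": "V100", "max_gpu_wrk_mem": 44.289062},
    {"job_name": "hypothetical_job", "gpu_type": "V100", "max_gpu_wrk_mem": 12.5},
]

flagged = exceeds_capacity(rows)
print([r["job_name"] for r in flagged])
```

Whether a flagged row really corresponds to an OOM, or whether the value is aggregated or measured differently than I assume, is exactly what I am asking about.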