dcgm nvlink metrics not available on dcgm 3.1.3

Open luccabb opened this issue 2 years ago • 4 comments

The nvidia-dcgm doc says that metrics like DCGM_FI_PROF_NVLINK_L{id}_TX_BYTES should be available on dcgm 3.1.

I'm getting the following error when trying to query them (from dcgmi 3.1.3):

$ dcgmi dmon -d 100 -e 1040
#Entity   NVL0T                       
ID                                    
Error setting watches. Result: -6: Feature not supported
$ dcgmi -v | grep Version
Version : 3.1.3
Version : 3.1.3

Is this expected? Am I missing intermediate steps to enable the metrics?

luccabb avatar Oct 25 '23 00:10 luccabb

@luccabb what is the output of nvidia-smi? What GPU generation are you using?

dbeer avatar Oct 25 '23 14:10 dbeer

@dbeer

What GPU generation are you using?

NVIDIA A100-SXM4-40GB

luccabb avatar Oct 25 '23 17:10 luccabb

what is the output of nvidia-smi?

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0 
...

luccabb avatar Oct 25 '23 18:10 luccabb

Per https://github.com/NVIDIA/DCGM/issues/149#issuecomment-1922398817 it's only available on Hopper+ GPUs.

Surfacing this in the dcgm docs would be helpful.

cc: @dbeer @nikkon-dev

luccabb avatar Mar 20 '24 01:03 luccabb
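
For anyone landing here later, the Hopper+ restriction can be checked up front before setting watches. The following is a minimal sketch, not anything shipped with DCGM: `nvlink_prof_verdict` and `check_gpus` are hypothetical helper names, and using a CUDA compute capability major version of 9 or higher as a proxy for "Hopper+" is my assumption (H100 reports 9.0, A100 reports 8.0). It also assumes the installed driver's nvidia-smi supports the `compute_cap` query field.

```shell
# Sketch: guess whether the per-link NVLink profiling fields are usable,
# based only on the GPU's CUDA compute capability.
# ASSUMPTION: compute capability major >= 9 stands in for "Hopper+".

# nvlink_prof_verdict (hypothetical helper): takes a compute capability
# string such as "8.0" and prints a verdict.
nvlink_prof_verdict() {
    major="${1%%.*}"    # keep everything before the first dot
    case "$major" in
        ''|*[!0-9]*) echo "unknown" ;;    # empty or non-numeric input
        *)
            if [ "$major" -ge 9 ]; then
                echo "likely supported"
            else
                echo "likely unsupported"
            fi
            ;;
    esac
}

# check_gpus (hypothetical helper): reads "name, compute_cap" lines on stdin
# and prints one verdict per GPU.
check_gpus() {
    while IFS=, read -r name cap; do
        cap="${cap# }"    # drop the space that follows the comma in CSV output
        echo "$name: $(nvlink_prof_verdict "$cap")"
    done
}

# Usage on a machine with the NVIDIA driver installed (not run here):
#   nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader | check_gpus
```

On the A100 from this issue the verdict would be "likely unsupported", which matches the "Feature not supported" error above. The check is only a heuristic; the result DCGM returns when setting the watch is the authoritative answer.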