DCGM NVLink metrics not available on DCGM 3.1.3
The nvidia-dcgm docs say that metrics like DCGM_FI_PROF_NVLINK_L{id}_TX_BYTES should be available on DCGM 3.1.
I'm getting the following error when trying to query them (from dcgmi 3.1.3):
$ dcgmi dmon -d 100 -e 1040
#Entity NVL0T
ID
Error setting watches. Result: -6: Feature not supported
$ dcgmi -v | grep Version
Version : 3.1.3
Version : 3.1.3
Is this expected? Am I missing a step needed to enable these metrics?
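For reference, field ID 1040 in the `dmon` call above is the one `dcgmi` labels `NVL0T` (link 0, TX). Below is a minimal sketch for building the `-e` list for several links. It assumes the per-link fields are numbered consecutively from 1040, with TX and RX alternating (`DCGM_FI_PROF_NVLINK_L{id}_TX_BYTES` = 1040 + 2·id, `..._RX_BYTES` = 1041 + 2·id). Treat that layout as an assumption to verify against `dcgm_fields.h` for your DCGM version. The helper names are hypothetical.

```python
# Hypothetical helpers: build the comma-separated field list for `dcgmi dmon -e`.
# ASSUMPTION: DCGM_FI_PROF_NVLINK_L{id}_TX_BYTES = 1040 + 2*id and
#             DCGM_FI_PROF_NVLINK_L{id}_RX_BYTES = 1041 + 2*id.
# Verify these IDs against dcgm_fields.h for the DCGM version you run.

NVLINK_L0_TX_FIELD_ID = 1040  # ID used in `dcgmi dmon -e 1040` (shown as NVL0T)


def nvlink_field_ids(links, include_rx=True):
    """Return DCGM field IDs for the per-link NVLink byte counters of `links`."""
    ids = []
    for link in links:
        if link < 0:
            raise ValueError(f"link id must be non-negative, got {link}")
        tx = NVLINK_L0_TX_FIELD_ID + 2 * link
        ids.append(tx)
        if include_rx:
            ids.append(tx + 1)  # RX assumed to follow TX for the same link
    return ids


def dmon_field_arg(links, include_rx=True):
    """Format the field IDs as the comma-separated value passed to `-e`."""
    return ",".join(str(i) for i in nvlink_field_ids(links, include_rx))


if __name__ == "__main__":
    # Example: TX and RX counters for links 0-3.
    print(f"dcgmi dmon -d 100 -e {dmon_field_arg(range(4))}")
```

Requesting more links does not avoid the `-6: Feature not supported` error on this A100. The helper only saves typing the IDs by hand.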
@luccabb what is the output of nvidia-smi? What GPU generation are you using?
@dbeer
What GPU generation are you using?
NVIDIA A100-SXM4-40GB
what is the output of nvidia-smi?
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13 Driver Version: 525.60.13 CUDA Version: 12.0
...
Per https://github.com/NVIDIA/DCGM/issues/149#issuecomment-1922398817, these metrics are only available on Hopper and newer GPUs.
Surfacing this requirement in the DCGM docs would be helpful.
cc: @dbeer @nikkon-dev
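Until the docs mention the Hopper+ requirement, a script can check the GPU generation before setting watches on the per-link NVLink fields. Below is a minimal sketch under a few assumptions. First, "Hopper+" corresponds to CUDA compute capability 9.0 or higher (the A100 in this thread is 8.0). Second, each input line comes from `nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader`, and the `compute_cap` query field is only present on recent drivers. The function names are hypothetical.

```python
# Hypothetical pre-check before watching DCGM_FI_PROF_NVLINK_L{id}_* fields.
# ASSUMPTION: "Hopper+" == CUDA compute capability >= 9.0 (A100 is 8.0).
# ASSUMPTION: input lines come from
#   nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader

HOPPER_MIN_CC = (9, 0)


def parse_gpu_line(line):
    """Split 'NVIDIA A100-SXM4-40GB, 8.0' into (name, (major, minor))."""
    name, cc = (part.strip() for part in line.rsplit(",", 1))
    major, minor = (int(x) for x in cc.split("."))
    return name, (major, minor)


def supports_per_link_nvlink_fields(line):
    """True if the GPU on this line is Hopper or newer under the assumption above."""
    _, cc = parse_gpu_line(line)
    return cc >= HOPPER_MIN_CC  # tuple comparison: (8, 0) < (9, 0)


if __name__ == "__main__":
    # Example input lines (illustrative, not captured from a real system).
    for line in ["NVIDIA A100-SXM4-40GB, 8.0", "NVIDIA H100 80GB HBM3, 9.0"]:
        name, _ = parse_gpu_line(line)
        print(name, supports_per_link_nvlink_fields(line))
```

For the A100-SXM4-40GB in this thread, the check returns False. That matches the `-6: Feature not supported` result from `dcgmi dmon`.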