
How to run the "memory bandwidth" test with nvvs/dcgmi built from the DCGM source code

Open · ligeweiwu opened this issue on Feb 17 '23 · 5 comments

Hi, I am building DCGM from source and using nvvs/dcgmi to run the diagnostic tests. I can see all the plugin tests, and they are all built as .so libraries. But when I try to run the "memory bandwidth" diagnostic, I get an error:

./dcgmi diag -r "memory bandwidth" -g 2
Error: requested test "memory bandwidth" was not found among possible test choices.

In my case, all the plugin .so files are in /username/DCGM/_out/Linux-amd64-debug/share/nvidia-validation-suite/plugins/cuda11, and none of them is named "memory bandwidth". Looking at the source code, I don't think the option name "memory bandwidth" exists at all; there is only "memtest".

So how can I run the "memory bandwidth" test with a DCGM build from source?

By the way, memtest works fine: "./dcgmi diag -r memtest -g 2" runs, the corresponding libMemtest.so is in plugins/cuda11, and the source code has the "memtest" option.

Thanks.

ligeweiwu, Feb 17 '23 14:02

Hi ligeweiwu, unfortunately the memory bandwidth test cannot be released as open source yet. To run it with your open-source build, download the released version of DCGM that matches the source you're building and copy its plugin libraries into your locally built plugins directory.

dbeer, Feb 17 '23 16:02
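A minimal sketch of the copy step described above, assuming the matching release package has already been extracted into a directory and that the release keeps its NVVS plugins under a path ending in plugins/cuda11, mirroring the local build layout from the issue (an assumption, not verified against the package). `copy_release_plugins` is a hypothetical helper, not part of DCGM. It skips any file that already exists locally, so locally built plugins such as libMemtest.so are not overwritten.

```shell
#!/usr/bin/env bash
# Sketch only: copy NVVS plugin libraries from an already-extracted DCGM release
# package into a locally built plugins directory, keeping locally built copies.
# ASSUMPTION: the release stores its plugins under a path ending in
# plugins/<cuda dir> (e.g. .../nvidia-validation-suite/plugins/cuda11),
# mirroring the local build output layout.
# copy_release_plugins is a hypothetical helper name, not part of DCGM.
copy_release_plugins() {
    local release_root="$1"   # directory the released package was extracted into
    local local_plugins="$2"  # e.g. .../share/nvidia-validation-suite/plugins/cuda11

    if [[ ! -d "$release_root" || ! -d "$local_plugins" ]]; then
        echo "usage: copy_release_plugins <release_root> <local_plugins_dir>" >&2
        return 1
    fi

    local cuda_dir
    cuda_dir="$(basename "$local_plugins")"   # e.g. cuda11

    local copied=0 lib name
    # Match regular files and symlinks under plugins/<cuda_dir>/ whose names
    # contain .so (covers libFoo.so as well as versioned libFoo.so.3).
    while IFS= read -r -d '' lib; do
        name="$(basename "$lib")"
        if [[ -e "$local_plugins/$name" || -L "$local_plugins/$name" ]]; then
            echo "skip (already present locally): $name"
        elif cp -P "$lib" "$local_plugins/"; then
            # -P copies symlinks as symlinks so versioned library links stay intact
            echo "copied: $name"
            copied=$((copied + 1))
        fi
    done < <(find "$release_root" -path "*/plugins/$cuda_dir/*.so*" \( -type f -o -type l \) -print0)

    echo "$copied plugin file(s) copied into $local_plugins"
}
```

Usage, with the local path from the issue: `copy_release_plugins /tmp/dcgm-release /username/DCGM/_out/Linux-amd64-debug/share/nvidia-validation-suite/plugins/cuda11`, then retry `./dcgmi diag -r "memory bandwidth" -g 2`.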

@dbeer Thanks for your reply. I'd like to confirm one more thing about the tests in plugin_src:

memory <-> -r 3 test: GPU Memory
memtest <-> -r 4 test: Memory Stress
memory bandwidth <-> no source code; it can only be used from the released package

Is that right?

Thanks

ligeweiwu, Feb 17 '23 18:02

That's correct, although the memory test should run with -r 2 and higher.

dbeer, Feb 17 '23 20:02

@dbeer Thanks for your reply. I am building DCGM from source at version 3.0.4 (commit f6fe5654b780873da528b84cb3d7de10d7abe0d1), but I cannot find a download link for the released package of this version. Where can I download it? Thanks.

ligeweiwu, Feb 19 '23 09:02

@ligeweiwu,

All package versions are available in the public CUDA repositories: Deb, Rpm.

nikkon-dev, Feb 19 '23 10:02
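A minimal sketch of fetching one specific release from the CUDA apt repository and unpacking it without installing it. It assumes the CUDA apt repository is already configured, that the package is named `datacenter-gpu-manager` (check the exact version strings with `apt-cache madison datacenter-gpu-manager`), and that `fetch_dcgm_release` is a hypothetical helper name. On RPM-based systems, `dnf download` followed by `rpm2cpio <file>.rpm | cpio -idm` plays the same role.

```shell
#!/usr/bin/env bash
# Sketch only: download a specific DCGM .deb from an already-configured CUDA apt
# repository and unpack its file tree without installing it.
# ASSUMPTIONS: package name datacenter-gpu-manager; exact version strings come
# from `apt-cache madison datacenter-gpu-manager`.
# fetch_dcgm_release is a hypothetical helper name, not part of DCGM.
fetch_dcgm_release() {
    local version="$1"               # exact apt version string from apt-cache madison
    local dest="${2:-dcgm-release}"  # directory to unpack into
    local pkg="datacenter-gpu-manager"

    if [[ -z "$version" ]]; then
        echo "usage: fetch_dcgm_release <apt-version> [dest_dir]" >&2
        return 1
    fi

    local workdir
    workdir="$(mktemp -d)" || return 1
    # apt-get download writes the .deb into the current directory, so run it in a temp dir.
    ( cd "$workdir" && apt-get download "${pkg}=${version}" ) || { rm -rf "$workdir"; return 1; }

    mkdir -p "$dest"
    local deb
    for deb in "$workdir"/*.deb; do
        # dpkg-deb -x extracts the package's files into $dest; nothing is installed.
        dpkg-deb -x "$deb" "$dest" || { rm -rf "$workdir"; return 1; }
    done
    rm -rf "$workdir"
    echo "unpacked ${pkg} ${version} into ${dest}"
}
```

The unpacked directory can then serve as the release root when copying the plugin libraries into the local build's plugins/cuda11 directory.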