Bundled CUDA libraries

Open zzzoom opened this issue 2 years ago • 1 comments

Installing dcgm on a stateless node is untenable at the moment, because the dcgm package is a 1.5GB behemoth, of which ~1GB is different versions of CUBLAS and ~240MB is different versions of CURAND. Appropriate versions of both libraries are usually present elsewhere on the system, as they are pretty much essential for CUDA applications.

Please consider packaging those libraries separately and letting us point DCGM to our own location for those libraries.
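To put numbers on a breakdown like the one above, here is a minimal sketch that walks an unpacked package tree and sums file sizes per library family. The directory and the file-name prefixes are assumptions for illustration; the actual file names inside the DCGM package may differ.

```python
import os
from collections import defaultdict


def footprint_by_prefix(root, prefixes):
    """Sum on-disk file sizes under `root`, grouped by file-name prefix.

    `prefixes` is a list of hypothetical name prefixes (e.g. "libcublas").
    Files matching none of them are counted under "other".
    Symlinks are skipped so versioned aliases are not counted twice.
    """
    sizes = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            # First matching prefix wins; fall back to "other".
            key = next((p for p in prefixes if name.startswith(p)), "other")
            sizes[key] += os.path.getsize(path)
    return dict(sizes)
```

Calling `footprint_by_prefix("/path/to/unpacked/package", ["libcublas", "libcurand"])` returns a dict of byte counts per group, which makes it easy to compare releases.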

zzzoom avatar Jul 19 '23 05:07 zzzoom

The situation did not improve in the latest releases; it got worse. I think this is a very important issue for diskless systems, and it would be great if minimizing the installation footprint of DCGM got some traction.

damianam avatar Feb 19 '25 13:02 damianam

I guess we'll have to go with gpud.

zzzoom avatar Apr 09 '25 09:04 zzzoom

The latest DCGM packages are split into multiple packages. Please see the release notes here.

The parts that require cublas are now in the -cuda11/12 packages, and those are not necessary if you only need the monitoring functionality of DCGM and do not need diagnostics.

nikkon-dev avatar Apr 12 '25 02:04 nikkon-dev

From my POV this is not solved. The latest DCGM is split into CUDA 11 and CUDA 12 packages, but installing just the CUDA 12 packages takes more than 1.2 GB. libdcgm_cublas_proxy12.so.4.2.2 alone already takes 765 MB. The reason is the number of different architectures supported by the same package:

$ cuobjdump libdcgm_cublas_proxy12.so.4.2.2  | grep arch | sort | uniq -c | sort -V -k4
    104 arch = sm_50
      1 arch = sm_52
    105 arch = sm_60
     91 arch = sm_61
    161 arch = sm_70
    109 arch = sm_75
    305 arch = sm_80
    110 arch = sm_86
     58 arch = sm_89
    391 arch = sm_90
     83 arch = sm_90a
   1684 arch = sm_100
    575 arch = sm_120
     21 arch = sm_120a

To me it would make a lot of sense to split that package not just by CUDA version, but also by architecture. That would slim down the installation quite a bit; its current size is very problematic for diskless nodes.
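For anyone who wants to repeat the tally above on another library, here is a rough Python sketch that parses `cuobjdump` text output and counts entries per architecture. The function names are made up for this example. It counts embedded objects, not bytes, so the shares are only a rough proxy for how much disk space each architecture accounts for.

```python
import re
from collections import Counter

# Matches lines such as "arch = sm_90a" in cuobjdump text output.
ARCH_RE = re.compile(r"^\s*arch\s*=\s*(sm_\d+[a-z]?)\s*$")


def count_archs(cuobjdump_output):
    """Count embedded objects per GPU architecture (hypothetical helper)."""
    counts = Counter()
    for line in cuobjdump_output.splitlines():
        match = ARCH_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def arch_shares(counts):
    """Return each architecture's fraction of all counted entries."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {arch: n / total for arch, n in counts.items()}
```

Feed it the captured output of `cuobjdump libdcgm_cublas_proxy12.so.4.2.2`, for example saved to a file and read with `open(...).read()`, and the result of `count_archs` should match the `uniq -c` column above.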

damianam avatar Apr 14 '25 07:04 damianam

@damianam, unfortunately, the DCGM team is unable to divide cuBLAS into smaller components. We have provided feedback to the cuBLAS library team; however, there are currently no plans to separate it into smaller parts for each architecture.

nikkon-dev avatar Apr 14 '25 18:04 nikkon-dev