README instructions outdated (?), build fails with CUDA enabled

Open samuelpmishLLNL opened this issue 4 years ago • 5 comments

On an Ubuntu 20.04 machine with CUDA 11.4 and g++ 9.3, I followed the instructions in the README:

$ git clone git@github.com:LLNL/CHAI.git
...
$ cd CHAI
$ git submodule update --init --recursive
...
$ mkdir build && cd build
$ cmake -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda ../
...
-- CUDA Support is Off 
...
CMake Warning:
  Manually-specified variables were not used by the project:

    CUDA_TOOLKIT_ROOT_DIR

So it seems CUDA_TOOLKIT_ROOT_DIR is being ignored and CUDA support is not actually enabled (?). If we force CUDA on with -DENABLE_CUDA=ON, CMake configures as one would expect, but the build then fails at link time:

$ cmake -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DENABLE_CUDA=ON ../
...
-- CUDA Support is ON
...
-- Configuring done
-- Generating done
-- Build files have been written to: ...
$ make -j
[  0%] Building CXX object blt/thirdparty_builtin/googletest-master-2020-01-07/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
[  0%] Building CXX object blt/tests/smoke/CMakeFiles/blt_cuda_version_smoke.dir/blt_cuda_version_smoke.cpp.o
[  3%] Building CUDA object blt/tests/smoke/CMakeFiles/blt_cuda_smoke.dir/blt_cuda_smoke.cpp.o
...
[ 94%] Linking CUDA device code CMakeFiles/chai-example.exe.dir/cmake_device_link.o
/usr/bin/ld: ../lib/libumpire.a(Allocator.cpp.o): in function `__sti____cudaRegisterAll()':
tmpxft_0001b8fd_00000000-6_Allocator.cudafe1.cpp:(.text+0xee3): undefined reference to `__cudaRegisterLinkedBinary_44_tmpxft_0001b8fd_00000000_7_Allocator_cpp1_ii_a17095a1'
/usr/bin/ld: ../lib/libumpire.a(Replay.cpp.o): in function `__sti____cudaRegisterAll()':
tmpxft_0001b8fe_00000000-6_Replay.cudafe1.cpp:(.text+0x6fb): undefined reference to `__cudaRegisterLinkedBinary_41_tmpxft_0001b8fe_00000000_7_Replay_cpp1_ii_5eca6429'
/usr/bin/ld: ../lib/libumpire.a(ResourceManager.cpp.o): in function `__sti____cudaRegisterAll()':
tmpxft_0001b8f9_00000000-6_ResourceManager.cudafe1.cpp:(.text+0xe1a3): undefined reference to `__cudaRegisterLinkedBinary_50_tmpxft_0001b8f9_00000000_7_ResourceManager_cpp1_ii_42a9a1b2'

There are many more errors like this.
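
Undefined references to `__cudaRegisterLinkedBinary_*` are typically a sign that objects were compiled as relocatable device code (`nvcc -rdc=true`) but never went through a matching device-link step. To see which archive members are affected, the linker output can be summarized. This is a minimal sketch, assuming the build output was saved to a file; the function name `rdc_link_failures` and the file name `build.log` are illustrative and not part of CHAI or BLT:

```shell
#!/bin/sh
# List each archive member that has an undefined
# __cudaRegisterLinkedBinary_* reference in saved linker output.
# Reads the log from the file(s) given as arguments, or from stdin.
rdc_link_failures() {
    awk '
        # ld names the archive member on the line before the error,
        # e.g. "../lib/libumpire.a(Allocator.cpp.o): in function ...".
        match($0, "[^ /]+\\.a\\([^)]*\\)") {
            member = substr($0, RSTART, RLENGTH)
        }
        # Report the most recently seen member for each matching error.
        /undefined reference to .__cudaRegisterLinkedBinary_/ {
            if (member != "") print member
        }
    ' "$@" | sort -u
}

# Usage (hypothetical file name):
#   make 2>&1 | tee build.log
#   rdc_link_failures build.log
```

For the three errors quoted above, this would list the `Allocator.cpp.o`, `Replay.cpp.o`, and `ResourceManager.cpp.o` members of `libumpire.a`.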

samuelpmishLLNL, Oct 19 '21 19:10

What version of CMake are you using?

davidbeckingsale, Oct 19 '21 22:10

> What version of CMake are you using?

3.20.1

samuelpmishLLNL, Oct 20 '21 00:10

Okay, so there are two things here - the ENABLE_CUDA option is required (but at one point in time was the default, so wasn't needed). For the build errors, I'm not sure. We have a build configuration in CI very similar to what you describe and it's working fine: https://github.com/LLNL/CHAI/blob/develop/Dockerfile#L54
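
Since `ENABLE_CUDA` has to be passed explicitly, one way to confirm what a configured build directory actually recorded is to read the option back from `CMakeCache.txt`, where entries have the form `NAME:TYPE=VALUE`. A small sketch; the helper name `cmake_cache_value` is made up for illustration:

```shell
#!/bin/sh
# Print the value recorded for an option in a CMake cache file.
#   $1: option name, e.g. ENABLE_CUDA
#   $2: path to the cache file (defaults to ./CMakeCache.txt)
# Prints nothing if the option is not present in the cache.
cmake_cache_value() {
    option=$1
    cache=${2:-CMakeCache.txt}
    # Strip the "NAME:TYPE=" prefix and print what remains.
    sed -n "s/^${option}:[A-Z]*=//p" "$cache"
}

# Usage, from inside the build directory:
#   cmake_cache_value ENABLE_CUDA
```

An empty result would suggest the option was never set for that build directory.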

davidbeckingsale, Oct 20 '21 15:10

> Okay, so there are two things here - the ENABLE_CUDA option is required (but at one point in time was the default, so wasn't needed)

Then can you please modify the main README to provide up-to-date instructions on how to build?

> For the build errors, I'm not sure. We have a build configuration in CI very similar to what you describe and it's working fine

Our build of CHAI was broken when installing through spack, so I tried compiling manually and both cases produced the same errors indicated above.

> FROM axom/compilers:nvcc-10 AS nvcc

Perhaps it's worth testing against the most recent major release of CUDA (v10.0 is ~3 years old).

samuelpmishLLNL, Oct 20 '21 16:10

Update: when configuring with an old version of CMake (3.14), CHAI does build without error:

$ /path/to/cmake-3.14.0/bin/cmake -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DENABLE_CUDA=ON ../
...
$ make 
...
[100%] Built target managed_ptr_tests
[100%] Linking CUDA device code CMakeFiles/managed_array_tests.dir/cmake_device_link.o
[100%] Linking CXX executable ../../bin/managed_array_tests
[100%] Built target managed_array_tests
[100%] Linking CUDA device code CMakeFiles/primary_pool_tests.dir/cmake_device_link.o
[100%] Linking CXX executable ../../../../../bin/primary_pool_tests
[100%] Built target primary_pool_tests
$

Perhaps the discrepancy is related to some of the recent changes to CMake's built-in support for CUDA. It would be good if CHAI could determine which versions of CMake it does support and state that in the README / documentation.
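
Until a supported range is documented, a local check can at least show where an installed CMake sits relative to the two data points in this thread (3.14.0 built, 3.20.1 failed to link). A rough sketch; the function names and labels are hypothetical, and the result is a position between two reports, not a statement of what CHAI supports:

```shell
#!/bin/sh
# The only two data points reported in this thread.
KNOWN_GOOD=3.14.0   # configured and built without error
KNOWN_BAD=3.20.1    # configured, then failed at link time

# Succeed if dotted version $1 is less than or equal to version $2.
version_le() {
    first=$(printf '%s\n%s\n' "$1" "$2" \
        | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$first" = "$1" ]
}

# Print where version $1 falls relative to the two reports.
classify_cmake_version() {
    if version_le "$1" "$KNOWN_GOOD"; then
        echo "le-known-good"
    elif version_le "$KNOWN_BAD" "$1"; then
        echo "ge-known-bad"
    else
        echo "between"
    fi
}

# Classify the installed CMake, if there is one on PATH.
if command -v cmake >/dev/null 2>&1; then
    installed=$(cmake --version | sed -n '1s/^cmake version //p')
    echo "cmake $installed: $(classify_cmake_version "$installed")"
fi
```

Versions that land in "between" are untested as far as this thread goes, so trying a few of them would help narrow down where the behavior changed.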

samuelpmishLLNL, Oct 20 '21 17:10