README instructions outdated (?), build fails with CUDA enabled
On an Ubuntu 20.04 machine with CUDA 11.4 and g++ 9.3, I followed the instructions in the README:
$ git clone git@github.com:LLNL/CHAI.git
...
$ cd CHAI
$ git submodule update --init --recursive
...
$ mkdir build && cd build
$ cmake -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda ../
...
-- CUDA Support is Off
...
CMake Warning:
Manually-specified variables were not used by the project:
CUDA_TOOLKIT_ROOT_DIR
So it seems the CUDA_TOOLKIT_ROOT_DIR setting is ignored and CUDA is not actually enabled. If we force CUDA on with -DENABLE_CUDA=ON, CMake configures as one would expect, but the library itself fails to build:
$ cmake -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DENABLE_CUDA=ON ../
...
-- CUDA Support is ON
...
-- Configuring done
-- Generating done
-- Build files have been written to: ...
$ make -j
[ 0%] Building CXX object blt/thirdparty_builtin/googletest-master-2020-01-07/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
[ 0%] Building CXX object blt/tests/smoke/CMakeFiles/blt_cuda_version_smoke.dir/blt_cuda_version_smoke.cpp.o
[ 3%] Building CUDA object blt/tests/smoke/CMakeFiles/blt_cuda_smoke.dir/blt_cuda_smoke.cpp.o
...
[ 94%] Linking CUDA device code CMakeFiles/chai-example.exe.dir/cmake_device_link.o
/usr/bin/ld: ../lib/libumpire.a(Allocator.cpp.o): in function `__sti____cudaRegisterAll()':
tmpxft_0001b8fd_00000000-6_Allocator.cudafe1.cpp:(.text+0xee3): undefined reference to `__cudaRegisterLinkedBinary_44_tmpxft_0001b8fd_00000000_7_Allocator_cpp1_ii_a17095a1'
/usr/bin/ld: ../lib/libumpire.a(Replay.cpp.o): in function `__sti____cudaRegisterAll()':
tmpxft_0001b8fe_00000000-6_Replay.cudafe1.cpp:(.text+0x6fb): undefined reference to `__cudaRegisterLinkedBinary_41_tmpxft_0001b8fe_00000000_7_Replay_cpp1_ii_5eca6429'
/usr/bin/ld: ../lib/libumpire.a(ResourceManager.cpp.o): in function `__sti____cudaRegisterAll()':
tmpxft_0001b8f9_00000000-6_ResourceManager.cudafe1.cpp:(.text+0xe1a3): undefined reference to `__cudaRegisterLinkedBinary_50_tmpxft_0001b8f9_00000000_7_ResourceManager_cpp1_ii_42a9a1b2'
There are many more errors like this.
What version of CMake are you using?
3.20.1
Okay, so there are two things here - the ENABLE_CUDA option is required (but at one point in time was the default, so wasn't needed). For the build errors, I'm not sure. We have a build configuration in CI very similar to what you describe and it's working fine: https://github.com/LLNL/CHAI/blob/develop/Dockerfile#L54
> Okay, so there are two things here - the ENABLE_CUDA option is required (but at one point in time was the default, so wasn't needed)
Then can you please modify the main README to provide up-to-date instructions on how to build?
> For the build errors, I'm not sure. We have a build configuration in CI very similar to what you describe and it's working fine
Our build of CHAI was broken when installing through spack, so I tried compiling manually and both cases produced the same errors indicated above.
> FROM axom/compilers:nvcc-10 AS nvcc
Perhaps it's worth testing against the most recent major release of CUDA (v10.0 is ~3 years old).
Update: when configuring with an old version of CMake (3.14), CHAI does build without error:
$ /path/to/cmake-3.14.0/bin/cmake -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DENABLE_CUDA=ON ../
...
$ make
...
[100%] Built target managed_ptr_tests
[100%] Linking CUDA device code CMakeFiles/managed_array_tests.dir/cmake_device_link.o
[100%] Linking CXX executable ../../bin/managed_array_tests
[100%] Built target managed_array_tests
[100%] Linking CUDA device code CMakeFiles/primary_pool_tests.dir/cmake_device_link.o
[100%] Linking CXX executable ../../../../../bin/primary_pool_tests
[100%] Built target primary_pool_tests
$
Perhaps the discrepancy is related to some of the recent changes to CMake's built-in support for CUDA. It would be good if CHAI could determine which versions of CMake it supports, and state that in the README or documentation.
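Until the supported range is documented, a small pre-configure check could at least flag untested CMake versions. This is only a rough sketch and not part of CHAI: the helper names `version_le` and `check_cmake_version` and the `KNOWN_GOOD` value are hypothetical, with 3.14.0 taken solely from the successful build reported above. Versions between 3.14 and 3.20.1 were not tested in this thread, so the warning is a hint rather than a verdict. It assumes `sort -V` from GNU coreutils is available.

```shell
#!/bin/sh
# Hypothetical helper, not part of CHAI: warn when the CMake version is newer
# than the one reported to build successfully in this thread.

KNOWN_GOOD="3.14.0"   # assumption: the only version confirmed to work above

# version_le A B: succeeds when version A <= version B (uses GNU sort -V).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# check_cmake_version VERSION: print a verdict for the given version string.
check_cmake_version() {
    if version_le "$1" "$KNOWN_GOOD"; then
        echo "cmake $1: at or below known-good $KNOWN_GOOD"
    else
        echo "cmake $1: newer than known-good $KNOWN_GOOD, CUDA link may fail"
    fi
}

# Example use: take the version from the first line of `cmake --version`
# ("cmake version X.Y.Z") and check it, if cmake is installed.
if command -v cmake >/dev/null 2>&1; then
    check_cmake_version "$(cmake --version | head -n 1 | awk '{print $3}')"
fi
```

Running this before `cmake -DENABLE_CUDA=ON ../` would have flagged 3.20.1 as untested while letting 3.14.0 through without a warning.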