GPU GEOS : Mismatched template definition usage in coreComponents/physicsSolvers
Describe the bug
Building GEOS (latest "develop" and TPL "master") with CUDA runs into conflicting definitions for certain templates at device-link time.
To Reproduce
- Build the TPLs and then GEOS with GCC 13.2.0 and CUDA 11.8, 12.5, or 12.6.
- The device-link step fails with the nvlink errors shown below.
Expected behavior
The build completes without errors.
Platform (please complete the following information):
- Machine: AMD node, CPU: AMD EPYC 7V12, GPU: NVIDIA A100-SXM4-80GB
- Compiler: host: GCC 13.2.0; device: CUDA 11.8, 12.5, or 12.6
- GEOS version: v1.1.0, commit 93f0252a19f240dc3b7375cd93bbe036ac318063
Additional context
Output from make all:
[ 86%] Building CUDA object coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/wavePropagation/sem/acoustic/secondOrderEqn/anisotropic/AcousticVTIWaveEquationSEM.cpp.o
cd /dev/shm/mtml/src/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo/coreComponents/physicsSolvers && /vend/nvidia/cuda/v12.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.20-gcc-mlnx_ofed-redhat8-cuda12-x86_64/ompi/bin/mpic++ -DCAMP_HAVE_CUDA -DDIY_NO_THREADS -DFMT_HEADER_ONLY=1 -DH5_BUILT_AS_DYNAMIC_LIB -DMPICH_SKIP_MPICXX -DMPI_NO_CPPBIND -DOMPI_SKIP_MPICXX -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -D_MPICC_H -D_POSIX_C_SOURCE=200809L -Dkiss_fft_scalar=double -DphysicsSolvers_EXPORTS --options-file CMakeFiles/physicsSolvers.dir/includes_CUDA.rsp -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations -g -lineinfo -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations -ftz=true -O3 -DNDEBUG -Xcompiler -DNDEBUG -Xcompiler -Ofast -ftz=true -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" -Xcompiler=-fPIC -Xcompiler=-fopenmp -Xcompiler=-pthread -Xcompiler -pthread -MD -MT coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/wavePropagation/sem/acoustic/secondOrderEqn/anisotropic/AcousticVTIWaveEquationSEM.cpp.o -MF CMakeFiles/physicsSolvers.dir/wavePropagation/sem/acoustic/secondOrderEqn/anisotropic/AcousticVTIWaveEquationSEM.cpp.o.d -x cu -rdc=true -c /dev/shm/mtml/src/GEOS/GEOS/src/coreComponents/physicsSolvers/wavePropagation/sem/acoustic/secondOrderEqn/anisotropic/AcousticVTIWaveEquationSEM.cpp -o CMakeFiles/physicsSolvers.dir/wavePropagation/sem/acoustic/secondOrderEqn/anisotropic/AcousticVTIWaveEquationSEM.cpp.o
[ 86%] Building CUDA object coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/wavePropagation/sem/acoustoelastic/secondOrderEqn/isotropic/AcousticElasticWaveEquationSEM.cpp.o
cd /dev/shm/mtml/src/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo/coreComponents/physicsSolvers && /vend/nvidia/cuda/v12.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.20-gcc-mlnx_ofed-redhat8-cuda12-x86_64/ompi/bin/mpic++ -DCAMP_HAVE_CUDA -DDIY_NO_THREADS -DFMT_HEADER_ONLY=1 -DH5_BUILT_AS_DYNAMIC_LIB -DMPICH_SKIP_MPICXX -DMPI_NO_CPPBIND -DOMPI_SKIP_MPICXX -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -D_MPICC_H -D_POSIX_C_SOURCE=200809L -Dkiss_fft_scalar=double -DphysicsSolvers_EXPORTS --options-file CMakeFiles/physicsSolvers.dir/includes_CUDA.rsp -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations -g -lineinfo -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations -ftz=true -O3 -DNDEBUG -Xcompiler -DNDEBUG -Xcompiler -Ofast -ftz=true -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" -Xcompiler=-fPIC -Xcompiler=-fopenmp -Xcompiler=-pthread -Xcompiler -pthread -MD -MT coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/wavePropagation/sem/acoustoelastic/secondOrderEqn/isotropic/AcousticElasticWaveEquationSEM.cpp.o -MF CMakeFiles/physicsSolvers.dir/wavePropagation/sem/acoustoelastic/secondOrderEqn/isotropic/AcousticElasticWaveEquationSEM.cpp.o.d -x cu -rdc=true -c /dev/shm/mtml/src/GEOS/GEOS/src/coreComponents/physicsSolvers/wavePropagation/sem/acoustoelastic/secondOrderEqn/isotropic/AcousticElasticWaveEquationSEM.cpp -o CMakeFiles/physicsSolvers.dir/wavePropagation/sem/acoustoelastic/secondOrderEqn/isotropic/AcousticElasticWaveEquationSEM.cpp.o
[ 86%] Linking CUDA device code CMakeFiles/physicsSolvers.dir/cmake_device_link.o
cd /dev/shm/mtml/src/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo/coreComponents/physicsSolvers && /data/saet/mtml/software/x86_64/cmake-3.28.3-linux-x86_64/bin/cmake -E cmake_link_script CMakeFiles/physicsSolvers.dir/dlink.txt --verbose=1
/vend/nvidia/cuda/v12.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.20-gcc-mlnx_ofed-redhat8-cuda12-x86_64/ompi/bin/mpic++ -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations -g -lineinfo -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations -ftz=true -O3 -DNDEBUG -Xcompiler -DNDEBUG -Xcompiler -Ofast -ftz=true "--generate-code=arch=compute_80,code=[compute_80,sm_80]" -Xcompiler=-fopenmp -Xcompiler=-L/vend/nvidia/cuda/v12.6/lib64 -Xlinker=-rpath -Xlinker=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.20-gcc-mlnx_ofed-redhat8-cuda12-x86_64/ompi/lib -Xlinker=--enable-new-dtags -Xcompiler=-pthread -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink --options-file CMakeFiles/physicsSolvers.dir/deviceObjects1.rsp --options-file CMakeFiles/physicsSolvers.dir/deviceObjects2.rsp -o CMakeFiles/physicsSolvers.dir/cmake_device_link.o --options-file CMakeFiles/physicsSolvers.dir/deviceLinkLibs.rsp
nvlink error : Size doesn't match for '_ZN4geos13finiteElement17FiniteElementBaseC1ERKS1_$824' in 'CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DelftEgg-_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o', first specified in 'CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DruckerPragerExtended-_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o' (target: sm_80)
nvlink fatal : merge_elf failed (target: sm_80)
make[2]: *** [coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/build.make:13103: coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/cmake_device_link.o] Error 1
make[2]: Target 'coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/build' not remade because of errors.
make[2]: Leaving directory '/dev/shm/mtml/src/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo'
make[1]: *** [CMakeFiles/Makefile2:8964: coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/all] Error 2
make[1]: Target 'all' not remade because of errors.
make[1]: Leaving directory '/dev/shm/mtml/src/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo'
make: *** [Makefile:146: all] Error 2
[Make-build-2024-12-17-103544.log](https://github.com/user-attachments/files/18171272/Make-build-2024-12-17-103544.log)
[PoromechanicsKernels_CellElementSubRegion_PorousSolid-DruckerPragerExtended-_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.txt](https://github.com/user-attachments/files/18171302/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DruckerPragerExtended-_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.txt)
[ThermoPoromechanicsKernels_CellElementSubRegion_PorousSolid-DelftEgg-_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.txt](https://github.com/user-attachments/files/18171303/ThermoPoromechanicsKernels_CellElementSubRegion_PorousSolid-DelftEgg-_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.txt)
When we force-set CALC_FEM_SHAPE_IN_KERNEL, we get these messages during compilation:
[ 42%] Building CUDA object coreComponents/finiteElement/unitTests/CMakeFiles/testFiniteElementBase.dir/testFiniteElementBase.cpp.o
cd /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-OPTO3-Hypre-GCC-CUDA-MPI-OMP-relwithdebinfo/coreComponents/finiteElement/unitTests && /vend/nvidia/cuda/v12.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.21-gcc-doca_ofed-redhat8-cuda12-x86_64/ompi/bin/mpic++ -DCAMP_HAVE_CUDA -DFMT_HEADER_ONLY=1 -DGTEST_HAS_DEATH_TEST=1 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -D_POSIX_C_SOURCE=200809L -DtestFiniteElementBase_EXPORTS --options-file CMakeFiles/testFiniteElementBase.dir/includes_CUDA.rsp -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations -DGEOS_USE_DEVICE -g -lineinfo -O3 -ftz=true -Xcompiler -O3 -Xcompiler -fno-fast-math -Xcompiler -mdaz-ftz -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" -Xcompiler=-fPIE -Xcompiler=-fopenmp -Xcompiler=-pthread -MD -MT coreComponents/finiteElement/unitTests/CMakeFiles/testFiniteElementBase.dir/testFiniteElementBase.cpp.o -MF CMakeFiles/testFiniteElementBase.dir/testFiniteElementBase.cpp.o.d -x cu -rdc=true -c /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp -o CMakeFiles/testFiniteElementBase.dir/testFiniteElementBase.cpp.o
In file included from /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/common/DataTypes.hpp:27,
from /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/elementFormulations/FiniteElementBase.hpp:29,
from /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp:17:
/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-OPTO3-Hypre-GCC-CUDA-MPI-OMP-relwithdebinfo/include/common/GeosxConfig.hpp:199: warning: "GEOS_USE_DEVICE" redefined
199 | #define GEOS_USE_DEVICE
|
<command-line>: note: this is the location of the previous definition
In file included from /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/common/DataTypes.hpp:27,
from /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/elementFormulations/FiniteElementBase.hpp:29,
from /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp:17:
/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-OPTO3-Hypre-GCC-CUDA-MPI-OMP-relwithdebinfo/include/common/GeosxConfig.hpp:199: warning: "GEOS_USE_DEVICE" redefined
199 | #define GEOS_USE_DEVICE
|
<command-line>: note: this is the location of the previous definition
/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp(116): error: "parallelDevicePolicy" is ambiguous
forAll< parallelDevicePolicy<> >( 1, [ feBase, gradNDimsView, detJDimsView ]( int const i )
^
/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp(116): error: expected an expression
forAll< parallelDevicePolicy<> >( 1, [ feBase, gradNDimsView, detJDimsView ]( int const i )
^
/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp(116): error: expected an identifier
forAll< parallelDevicePolicy<> >( 1, [ feBase, gradNDimsView, detJDimsView ]( int const i )
^
/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp(116): error: expected a type specifier
forAll< parallelDevicePolicy<> >( 1, [ feBase, gradNDimsView, detJDimsView ]( int const i )
^
/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp(116): error: expected a type specifier
forAll< parallelDevicePolicy<> >( 1, [ feBase, gradNDimsView, detJDimsView ]( int const i )
^
/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp(134): warning #12-D: parsing restarts here after previous syntax error
} );
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
5 errors detected in the compilation of "/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src/coreComponents/finiteElement/unitTests/testFiniteElementBase.cpp".
make[2]: *** [coreComponents/finiteElement/unitTests/CMakeFiles/testFiniteElementBase.dir/build.make:77: coreComponents/finiteElement/unitTests/CMakeFiles/testFiniteElementBase.dir/testFiniteElementBase.cpp.o] Error 2
make[2]: Target 'coreComponents/finiteElement/unitTests/CMakeFiles/testFiniteElementBase.dir/build' not remade because of errors.
make[2]: Leaving directory '/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-OPTO3-Hypre-GCC-CUDA-MPI-OMP-relwithdebinfo'
make[1]: *** [CMakeFiles/Makefile2:7925: coreComponents/finiteElement/unitTests/CMakeFiles/testFiniteElementBase.dir/all] Error 2
I used the branch 'testing/cusini/deactivate-some-kernels' that Matteo prepared (https://github.com/GEOS-DEV/GEOS/pull/3516).
Unfortunately, the FiniteElementBase size-mismatch error has now moved to another pair of derived classes:
/vend/nvidia/cuda/v12.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.21-gcc-doca_ofed-redhat8-cuda12-x86_64/ompi/bin/mpic++ -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations -g -lineinfo -O3 -ftz=true -Xcompiler -O3 -Xcompiler -fno-fast-math -Xcompiler -mdaz-ftz -DNDEBUG "--generate-code=arch=compute_80,code=[compute_80,sm_80]" -Xcompiler=-fopenmp -Xcompiler=-L/vend/nvidia/cuda/v12.6/lib64 -Xlinker=-rpath -Xlinker=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.21-gcc-doca_ofed-redhat8-cuda12-x86_64/ompi/lib -Xlinker=--enable-new-dtags -Xcompiler=-pthread -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink --options-file CMakeFiles/physicsSolvers.dir/deviceObjects1.rsp --options-file CMakeFiles/physicsSolvers.dir/deviceObjects2.rsp -o CMakeFiles/physicsSolvers.dir/cmake_device_link.o --options-file CMakeFiles/physicsSolvers.dir/deviceLinkLibs.rsp
nvlink error : Size doesn't match for '_ZN4geos13finiteElement17FiniteElementBaseC1ERKS1_$933' in
'CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/ThermoPoromechanicsKernels_CellElementSubRegion_PorousSolid-Damage-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o', first specified in
'CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DamageSpectral-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o' (target: sm_80)
nvlink fatal : merge_elf failed (target: sm_80)
make[2]: *** [coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/build.make:12168: coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/cmake_device_link.o] Error 1
make[2]: Target 'coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/build' not remade because of errors.
make[2]: Leaving directory '/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo'
make[1]: *** [CMakeFiles/Makefile2:8964: coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/all] Error 2
make[1]: Target 'all' not remade because of errors.
make[1]: Leaving directory '/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo'
make: *** [Makefile:146: all] Error 2
Here are excerpts from the failing build with the branch testing/cusini/deactivate-some-kernels:
[ 61%] Building CUDA object coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DamageSpectral-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o
cd /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo/coreComponents/physicsSolvers && /vend/nvidia/cuda/v12.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.21-gcc-doca_ofed-redhat8-cuda12-x86_64/ompi/bin/mpic++ -DCAMP_HAVE_CUDA -DDIY_NO_THREADS -DFMT_HEADER_ONLY=1 -DH5_BUILT_AS_DYNAMIC_LIB -DMPICH_SKIP_MPICXX -DMPI_NO_CPPBIND -DOMPI_SKIP_MPICXX -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -D_MPICC_H -D_POSIX_C_SOURCE=200809L -Dkiss_fft_scalar=double -DphysicsSolvers_EXPORTS --options-file CMakeFiles/physicsSolvers.dir/includes_CUDA.rsp -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations -g -lineinfo -O3 -ftz=true -Xcompiler -O3 -Xcompiler -fno-fast-math -Xcompiler -mdaz-ftz -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" -Xcompiler=-fPIC -Xcompiler=-fopenmp -Xcompiler=-pthread -Xcompiler -pthread -MD -MT coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DamageSpectral-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o -MF CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DamageSpectral-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o.d -x cu -rdc=true -c /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DamageSpectral-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp -o 
CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DamageSpectral-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o
[ 63%] Building CUDA object coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-Damage-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o
cd /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo/coreComponents/physicsSolvers && /vend/nvidia/cuda/v12.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.21-gcc-doca_ofed-redhat8-cuda12-x86_64/ompi/bin/mpic++ -DCAMP_HAVE_CUDA -DDIY_NO_THREADS -DFMT_HEADER_ONLY=1 -DH5_BUILT_AS_DYNAMIC_LIB -DMPICH_SKIP_MPICXX -DMPI_NO_CPPBIND -DOMPI_SKIP_MPICXX -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -D_MPICC_H -D_POSIX_C_SOURCE=200809L -Dkiss_fft_scalar=double -DphysicsSolvers_EXPORTS --options-file CMakeFiles/physicsSolvers.dir/includes_CUDA.rsp -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations -g -lineinfo -O3 -ftz=true -Xcompiler -O3 -Xcompiler -fno-fast-math -Xcompiler -mdaz-ftz -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" -Xcompiler=-fPIC -Xcompiler=-fopenmp -Xcompiler=-pthread -Xcompiler -pthread -MD -MT coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-Damage-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o -MF CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-Damage-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o.d -x cu -rdc=true -c /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-Damage-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp -o 
CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-Damage-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o
[ 93%] Linking CUDA device code CMakeFiles/physicsSolvers.dir/cmake_device_link.o
cd /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo/coreComponents/physicsSolvers && /data/saet/mtml/software/x86_64/cmake-3.28.3-linux-x86_64/bin/cmake -E cmake_link_script CMakeFiles/physicsSolvers.dir/dlink.txt --verbose=1
/vend/nvidia/cuda/v12.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.21-gcc-doca_ofed-redhat8-cuda12-x86_64/ompi/bin/mpic++ -restrict -arch sm_80 --expt-extended-lambda --expt-relaxed-constexpr -Werror cross-execution-space-call,reorder,deprecated-declarations -g -lineinfo -O3 -ftz=true -Xcompiler -O3 -Xcompiler -fno-fast-math -Xcompiler -mdaz-ftz -DNDEBUG "--generate-code=arch=compute_80,code=[compute_80,sm_80]" -Xcompiler=-fopenmp -Xcompiler=-L/vend/nvidia/cuda/v12.6/lib64 -Xlinker=-rpath -Xlinker=/chv/az_ussc_p/x86_64-rhel3/util/hpcx/hpcx-v2.21-gcc-doca_ofed-redhat8-cuda12-x86_64/ompi/lib -Xlinker=--enable-new-dtags -Xcompiler=-pthread -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink --options-file CMakeFiles/physicsSolvers.dir/deviceObjects1.rsp --options-file CMakeFiles/physicsSolvers.dir/deviceObjects2.rsp -o CMakeFiles/physicsSolvers.dir/cmake_device_link.o --options-file CMakeFiles/physicsSolvers.dir/deviceLinkLibs.rsp
nvlink error : Size doesn't match for '_ZN4geos13finiteElement17FiniteElementBaseC1ERKS1_$933' in
'CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/ThermoPoromechanicsKernels_CellElementSubRegion_PorousSolid-Damage-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o', first specified in
'CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/PoromechanicsKernels_CellElementSubRegion_PorousSolid-DamageSpectral-ElasticIsotropic--_H1_Hexahedron_Lagrange1_GaussLegendre2.cpp.o' (target: sm_80)
nvlink fatal : merge_elf failed (target: sm_80)
make[2]: *** [coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/build.make:12168: coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/cmake_device_link.o] Error 1
make[2]: Target 'coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/build' not remade because of errors.
make[2]: Leaving directory '/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo'
make[1]: *** [CMakeFiles/Makefile2:8964: coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/all] Error 2
make[1]: Target 'all' not remade because of errors.
make[1]: Leaving directory '/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo'
make: *** [Makefile:146: all] Error 2
Could the problems be caused by a missing '__device__' annotation somewhere between
GEOS/src/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/MultiphasePoromechanics_impl.hpp
and
GEOS/src/coreComponents/physicsSolvers/multiphysics/poromechanicsKernels/SinglePhasePoromechanics_impl.hpp
?
@drmichaeltcvx It is difficult for me to think that this would have any dependence on hardware (Intel vs AMD). I would think this has to be something in the software stack. The dockerfiles that generate our TPL environment for CI are here: https://github.com/GEOS-DEV/thirdPartyLibs/tree/master/docker
You should look on Docker Hub for the base image that most closely matches your Linux distribution/version; I don't know whether you have mentioned which distribution you are on. Once you have a suitable base image with a hopefully equivalent software stack, adding the image should just involve copying one of the dockerfiles, replacing the base image, and modifying the packages to mimic your software stack.
Here are some examples of base images: https://hub.docker.com//ubuntu https://hub.docker.com/r/rockylinux/rockylinux https://hub.docker.com//fedora
I am providing the configure and build logs for a failed GPU build here. Let's go through these first to see whether we can identify any useful information that points to where the problem starts. The TPLs build fine for GPUs on our software stack.
CMake command
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX=/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo -DENABLE_YAPF=OFF -DGEOSX_DIR=/data/saet/mtml/src/GEOS_miket/GEOS/GEOS/build-GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP-relwithdebinfo -DGEOSX_TPL_DIR=/data/saet/mtml/software/x86_64/RHEL8/GEOSTPL/1.1.0-miket__GPU-build-fix-2025-01-14/install-GPU-OPTO3-Hypre-GCC-CUDA-MPI-OMP-relwithdebinfo -C/data/saet/mtml/src/GEOS_mtml/GEOS/host-configs/CVX/GPU-Hypre-GCC-CUDA_12.6-ompi_hpcx-OMP.cmake /data/saet/mtml/src/GEOS_miket/GEOS/GEOS/src
- Configure log
- Build log: Make-build-2025-01-17-134202.log
Please comment
You should try using static linking and see whether the error persists.
You can add
set(GEOS_BUILD_SHARED_LIBS OFF CACHE BOOL "" FORCE)
to your host-config.
Here are the configure and build logs for the all-static-linking approach.
Configure log: Config-2025-02-01-093152.log
Nothing seems to change.
Okay, well, at this point two things need to happen:
- Try to reproduce it on a CI image, because currently none of our images shows this error and we can't reproduce it on Lassen. It must be linked to the software stack used on the CVX systems.
- Figure out which PR's merge triggered this behavior.
Thanks Randy,
How do you generate the CI image for GEOS? Do you have dockerfiles for this?
Michael
Dockerfiles are defined here, and CI jobs are defined here for the TPLs and here for GEOS. If you start by setting up the base image in the TPLs repo, I can help you add a CI job in the GEOS repo. The hard task will be finding an image that reproduces the error. It would be great if you had an image of one of the systems you are using, but I suspect that one does not exist.
Thanks, Matteo.
Yep, there are no Docker images for the standard HPC image I am using; we do not generate Docker images. I will consult with our admins to try to get TPL/GEOS Docker images for our software stack.
Actually, the only thing that may be of interest to evaluate is our Linux OS (we are using Alma8, an offshoot of RHEL8, running on Microsoft's virtualization engine on Azure). All other software-stack components are standard: GCC (10, 11, or 13) and CUDA 11.x or 12.x with the latest CUDA drivers.
BTW, I also checked the dependency files (coreComponents/physicsSolvers/CMakeFiles/physicsSolvers.dir/__/__/generatedSrc/coreComponents/*/*/*/*.o.d) to ensure that FiniteElementBase.hpp is included in all of the generated class definitions, and apparently none of them is missing it.
I was hoping that some of them would miss including it, but no such luck.
https://github.com/GEOS-DEV/GEOS/pull/3725