[Build] build of ORT 1.17.0 fails with "gcc: error: unrecognized command line option ‘-fcf-protection’"

Open jcdatin opened this issue 2 years ago • 12 comments

Describe the issue

When building onnxruntime from source, the build stops when CMake tries to compile a simple CUDA test program, with the error: NVCC_ERROR = nvcc fatal : Unknown option '-Wstrict-aliasing'

Urgency

Moderate: already using ORT 1.16.3

Target platform

linux (SLES15 SP4)

Build script

RUN CC=gcc-11 CXX=g++-11 ./build.sh --config RelWithDebInfo --use_cuda --cudnn_home /usr/local/cuda/lib64 --cuda_home /usr/local/cuda/ --use_tensorrt --use_tensorrt_builtin_parser --tensorrt_home /usr/local/TensorRT --build_shared_lib --parallel --skip_tests --allow_running_as_root --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=75"

Error / output

-- ******** Summary ********
-- CMake version : 3.28.0
-- CMake command : /bin/cmake
-- System : Linux
-- C++ compiler : /usr/bin/g++-11
-- C++ compiler version : 11.3.0
-- CXX flags : -DNDEBUG -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -O3 -pipe -ggdb3 -fstack-clash-protection -fcf-protection -ffunction-sections -fdata-sections -Wno-restrict -DCPUINFO_SUPPORTED -Wnon-virtual-dtor
-- Build type : RelWithDebInfo
-- Compile definitions : ORT_ENABLE_STREAM;EIGEN_MPL2_ONLY;_GNU_SOURCE;__STDC_FORMAT_MACROS
-- CMAKE_PREFIX_PATH : /tmp/onnxruntime/build/Linux/RelWithDebInfo/installed
-- CMAKE_INSTALL_PREFIX : /usr/local
-- CMAKE_MODULE_PATH : /tmp/onnxruntime/cmake/external

-- ONNX version : 1.15.0
-- ONNX NAMESPACE : onnx
-- ONNX_USE_LITE_PROTO : OFF
-- USE_PROTOBUF_SHARED_LIBS : OFF
-- Protobuf_USE_STATIC_LIBS : ON
-- ONNX_DISABLE_EXCEPTIONS : OFF
-- ONNX_DISABLE_STATIC_REGISTRATION : OFF
-- ONNX_WERROR : OFF
-- ONNX_BUILD_TESTS : OFF
-- ONNX_BUILD_BENCHMARKS : OFF
-- ONNX_BUILD_SHARED_LIBS :
-- BUILD_SHARED_LIBS : OFF

-- Protobuf compiler :
-- Protobuf includes :
-- Protobuf libraries :
-- BUILD_ONNX_PYTHON : OFF
Finished fetching external dependencies
-- Performing Test HAS_AMBIGUOUS_REVERSED_OPERATOR
-- Performing Test HAS_AMBIGUOUS_REVERSED_OPERATOR - Failed
-- Performing Test HAS_BITWISE_INSTEAD_OF_LOGICAL
-- Performing Test HAS_BITWISE_INSTEAD_OF_LOGICAL - Failed
-- Performing Test HAS_CAST_FUNCTION_TYPE
-- Performing Test HAS_CAST_FUNCTION_TYPE - Success
-- Performing Test HAS_CATCH_VALUE
-- Performing Test HAS_CATCH_VALUE - Success
-- Performing Test HAS_CLASS_MEMACCESS
-- Performing Test HAS_CLASS_MEMACCESS - Success
-- Performing Test HAS_DEPRECATED_ANON_ENUM_ENUM_CONVERSION
-- Performing Test HAS_DEPRECATED_ANON_ENUM_ENUM_CONVERSION - Failed
-- Performing Test HAS_DEPRECATED_BUILTINS
-- Performing Test HAS_DEPRECATED_BUILTINS - Failed
-- Performing Test HAS_DEPRECATED_COPY
-- Performing Test HAS_DEPRECATED_COPY - Success
-- Performing Test HAS_DEPRECATED_DECLARATIONS
-- Performing Test HAS_DEPRECATED_DECLARATIONS - Success
-- Performing Test HAS_ENUM_CONSTEXPR_CONVERSION
-- Performing Test HAS_ENUM_CONSTEXPR_CONVERSION - Failed
-- Performing Test HAS_FORMAT_TRUNCATION
-- Performing Test HAS_FORMAT_TRUNCATION - Success
-- Performing Test HAS_IGNORED_ATTRIBUTES
-- Performing Test HAS_IGNORED_ATTRIBUTES - Success
-- Performing Test HAS_MAYBE_UNINITIALIZED
-- Performing Test HAS_MAYBE_UNINITIALIZED - Success
-- Performing Test HAS_MISSING_BRACES
-- Performing Test HAS_MISSING_BRACES - Success
-- Performing Test HAS_NONNULL_COMPARE
-- Performing Test HAS_NONNULL_COMPARE - Success
-- Performing Test HAS_PARENTHESES
-- Performing Test HAS_PARENTHESES - Success
-- Performing Test HAS_SHORTEN_64_TO_32
-- Performing Test HAS_SHORTEN_64_TO_32 - Failed
-- Performing Test HAS_STRICT_ALIASING
-- Performing Test HAS_STRICT_ALIASING - Success
NVCC_ERROR = nvcc fatal : Unknown option '-Wstrict-aliasing'

NVCC_OUT = 1
-- Performing Test HAS_TAUTOLOGICAL_POINTER_COMPARE
-- Performing Test HAS_TAUTOLOGICAL_POINTER_COMPARE - Failed
-- Performing Test HAS_UNDEFINED_VAR_TEMPLATE
-- Performing Test HAS_UNDEFINED_VAR_TEMPLATE - Failed
-- Performing Test HAS_UNUSED_BUT_SET_PARAMETER
-- Performing Test HAS_UNUSED_BUT_SET_PARAMETER - Success
-- Performing Test HAS_UNUSED_BUT_SET_VARIABLE
-- Performing Test HAS_UNUSED_BUT_SET_VARIABLE - Success
-- Performing Test HAS_UNUSED_VARIABLE
-- Performing Test HAS_UNUSED_VARIABLE - Success
-- Performing Test HAS_USELESS_CAST
-- Performing Test HAS_USELESS_CAST - Success
-- Looking for reallocarray
-- Looking for reallocarray - found
-- The CUDA compiler identification is NVIDIA 12.2.140
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - failed
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - broken
CMake Error at /share/cmake-3.28/Modules/CMakeTestCUDACompiler.cmake:59 (message):
The CUDA compiler

"/usr/local/cuda/bin/nvcc"

is not able to compile a simple test program.

It fails with the following output:

Change Dir: '/tmp/onnxruntime/build/Linux/RelWithDebInfo/CMakeFiles/CMakeScratch/TryCompile-iO5Fkh'

Run Build Command(s): /bin/cmake -E env VERBOSE=1 /usr/bin/gmake -f Makefile cmTC_38093/fast
/usr/bin/gmake  -f CMakeFiles/cmTC_38093.dir/build.make CMakeFiles/cmTC_38093.dir/build
gmake[1]: Entering directory '/tmp/onnxruntime/build/Linux/RelWithDebInfo/CMakeFiles/CMakeScratch/TryCompile-iO5Fkh'
Building CUDA object CMakeFiles/cmTC_38093.dir/main.cu.o
/usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler   -DNDEBUG -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -O3 -pipe -ggdb3 -fstack-clash-protection -fcf-protection  "--generate-code=arch=compute_75,code=[compute_75,sm_75]" -Xcompiler=-fPIE -MD -MT CMakeFiles/cmTC_38093.dir/main.cu.o -MF CMakeFiles/cmTC_38093.dir/main.cu.o.d -x cu -c /tmp/onnxruntime/build/Linux/RelWithDebInfo/CMakeFiles/CMakeScratch/TryCompile-iO5Fkh/main.cu -o CMakeFiles/cmTC_38093.dir/main.cu.o
gcc: error: unrecognized command line option ‘-fcf-protection’; did you mean ‘-fstack-protector’?
gmake[1]: *** [CMakeFiles/cmTC_38093.dir/build.make:79: CMakeFiles/cmTC_38093.dir/main.cu.o] Error 1
gmake[1]: Leaving directory '/tmp/onnxruntime/build/Linux/RelWithDebInfo/CMakeFiles/CMakeScratch/TryCompile-iO5Fkh'
gmake: *** [Makefile:127: cmTC_38093/fast] Error 2

Visual Studio Version

No response

GCC / Compiler Version

gcc 11.3.0

jcdatin avatar Feb 22 '24 16:02 jcdatin

The fatal error is: "gcc: error: unrecognized command line option ‘-fcf-protection’; did you mean ‘-fstack-protector’?" Can you try the latest main branch? We no longer add the compile flag by default.

snnn avatar Feb 22 '24 16:02 snnn

GCC 11 should support the "-fcf-protection" flag, unless your CPU arch is not x86_64.

snnn avatar Feb 22 '24 16:02 snnn

My CPU arch is x86_64. Trying the main branch; I will keep you posted.

jcdatin avatar Feb 22 '24 17:02 jcdatin

much better

jcdatin avatar Feb 22 '24 17:02 jcdatin

I have a question: in the previous transition from ORT 1.5 to 1.16, I had to add the flag --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=75" to fix a build issue. Is this flag still mandatory?

jcdatin avatar Feb 22 '24 17:02 jcdatin

I highly recommend explicitly listing your GPU's compute architectures there:

--cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=75"

The default values might not always be the best.

snnn avatar Feb 22 '24 20:02 snnn

OK, the latest main (1.17.0) now builds. So the issue can be fixed, and I will wait for the new 1.17.x.

About --cmake_extra_defines: what is the effect? Does it mean onnxruntime + the TensorRT EP will work on any Nvidia architecture from sm_75 (Turing) upward? I would like it to work on the three latest architectures: Turing, Ampere, and Ada. If so, does it also mean I will see faster inference on Ada if I set the flag to the Ada architecture (sm_89), with the drawback that Turing and Ampere are no longer supported for my apps?

jcdatin avatar Feb 23 '24 08:02 jcdatin

It means just sm_75. It's not a range. You can list all your GPU architectures there as a list like:

"-DCMAKE_CUDA_ARCHITECTURES=75;80;90" 

snnn avatar Feb 23 '24 19:02 snnn
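
The semicolon-separated architecture list described above can be sketched as follows. This is a minimal illustration, not part of the ONNX Runtime build scripts: the helper name cuda_arch_define is hypothetical, and the chosen values assume the targets are Turing (sm_75), Ampere (sm_80, sm_86), and Ada (sm_89). The command is printed rather than executed, and the define is quoted so the shell does not treat ';' as a command separator.

```shell
# Hypothetical helper: join compute capabilities into one CMake define.
cuda_arch_define() {
  list=""
  for arch in "$@"; do
    # Prepend the existing list plus ';' only when the list is non-empty.
    list="${list:+${list};}${arch}"
  done
  printf 'CMAKE_CUDA_ARCHITECTURES=%s\n' "$list"
}

# Assumed targets: Turing (75), Ampere (80, 86), Ada (89).
# Adjust to the GPUs you actually deploy on.
define="$(cuda_arch_define 75 80 86 89)"

# Print the command instead of running it. The other flags from the
# original build script are omitted here for brevity.
echo "./build.sh --config RelWithDebInfo --use_cuda --cmake_extra_defines \"${define}\""
```

Each listed architecture generally adds compile time and binary size, so it is usually worth listing only the architectures that will actually be used.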

I think probably you used the wrong compiler. You might have more than one GCC there. You used gcc-11 for compiling ONNX Runtime's C/C++ code, but you used a different one for compiling the CUDA code. Therefore it is not a bug and we do not plan to "fix" it in a patch release.

snnn avatar Feb 23 '24 19:02 snnn

I do not control how the CUDA code is compiled; I got the CUDA 12.2 packages from the Nvidia repo. But are you talking about the 1.17.0 release and my issue "gcc: error: unrecognized command line option ‘-fcf-protection’; did you mean ‘-fstack-protector’?", or about the fact that I need to add the -DCMAKE_CUDA_ARCHITECTURES flag? Otherwise, it is true that I have gcc 7 alongside gcc 11 in the Docker image used to compile onnxruntime (gcc 7 is part of the SLES15 base image), but I am using gcc 11 to compile onnxruntime (with the TensorRT EP).

jcdatin avatar Feb 24 '24 09:02 jcdatin

You were using gcc 11 to compile ONNX Runtime's C/C++ code, but gcc 7 to compile ONNX Runtime's CUDA code (*.cu files). We have dropped support for GCC 7. GCC 11 supports ‘-fcf-protection’, but GCC 7 does not, so you hit the error.

snnn avatar Feb 24 '24 21:02 snnn
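
One way to see which installed compiler rejects the flag is to probe each candidate directly. This is a minimal sketch: the helper name accepts_flag is hypothetical, the candidate names gcc, gcc-7, and gcc-11 are assumptions about what is installed in the image, and it assumes nvcc falls back to plain gcc from PATH when no host compiler is specified.

```shell
# Hypothetical helper: return 0 if the compiler accepts the flag,
# 127 if the compiler is not found, and non-zero otherwise.
accepts_flag() {
  compiler="$1"
  flag="$2"
  command -v "$compiler" >/dev/null 2>&1 || return 127
  # Check a trivial C program read from stdin; -fsyntax-only avoids
  # writing any output file.
  echo 'int main(void) { return 0; }' |
    "$compiler" "$flag" -fsyntax-only -x c - >/dev/null 2>&1
}

# Candidate names are assumptions; adjust to your image.
for cc in gcc gcc-7 gcc-11; do
  if accepts_flag "$cc" -fcf-protection; then
    echo "$cc: accepts -fcf-protection"
  else
    echo "$cc: rejects -fcf-protection or is not installed"
  fi
done
```

If plain gcc rejects the flag while gcc-11 accepts it, that is consistent with the CUDA code being compiled by a different, older host compiler than the one set through CC and CXX.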

Thanks. How can I avoid using gcc 7 to compile ONNX Runtime's CUDA code? I did nothing to select it: I am just calling RUN CC=gcc-11 CXX=g++-11 ./build.sh

jcdatin avatar Feb 25 '24 09:02 jcdatin

You need to add --cmake_extra_defines CMAKE_CUDA_HOST_COMPILER=/usr/bin/gcc-11

snnn avatar Feb 26 '24 19:02 snnn
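
After reconfiguring with CMAKE_CUDA_HOST_COMPILER set, the CMake cache can be inspected to confirm which host compiler was recorded. This is a minimal sketch: the helper name cache_value is hypothetical, the build directory is the one shown in the log in this issue, and the commented command assumes build.sh accepts several defines after a single --cmake_extra_defines flag.

```shell
# Configure first, for example with the defines from this thread:
#   ./build.sh ... --cmake_extra_defines "CMAKE_CUDA_ARCHITECTURES=75" \
#       CMAKE_CUDA_HOST_COMPILER=/usr/bin/gcc-11

# Hypothetical helper: print the value of one variable from a
# CMakeCache.txt file. Cache entries have the form NAME:TYPE=VALUE.
cache_value() {
  sed -n "s/^$1:[A-Z]*=//p" "$2"
}

# Build directory taken from the log in this issue; adjust for your setup.
cache=/tmp/onnxruntime/build/Linux/RelWithDebInfo/CMakeCache.txt

if [ -f "$cache" ]; then
  echo "CUDA host compiler: $(cache_value CMAKE_CUDA_HOST_COMPILER "$cache")"
  echo "C++ compiler:       $(cache_value CMAKE_CXX_COMPILER "$cache")"
else
  echo "No CMakeCache.txt at $cache; run the configure step first."
fi
```

Both entries pointing at the same GCC installation suggests the C/C++ and CUDA sources are being built with a matching toolchain.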

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions[bot] avatar Mar 28 '24 15:03 github-actions[bot]

I tried --cmake_extra_defines CMAKE_CUDA_HOST_COMPILER and it fixed the problem. Closing the case

jcdatin avatar Mar 28 '24 17:03 jcdatin