How to check whether the SYCL version oneDNN depends on is backward compatible?

Open wangzy0327 opened this issue 1 year ago • 7 comments

Summary

Cannot compile oneDNN v3.2 with DPC++ (2022-06).

cmake command line

cmake .. -DCMAKE_C_COMPILER=/home/wzy/sycl_workspace/build-cuda-2022-06/bin/clang -DCMAKE_CXX_COMPILER=/home/wzy/sycl_workspace/build-cuda-2022-06/bin/clang++ -DONEDNN_CPU_RUNTIME=NONE -DONEDNN_GPU_RUNTIME=SYCL -DDNNL_BUILD_EXAMPLES=OFF -DDNNL_BUILD_TESTS=OFF -DONEDNN_BUILD_GRAPH=OFF -DDNNL_GPU_VENDOR=NVIDIA -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=/home/wzy/sycl_workspace/oneDNN-cuda-v34

Version

oneDNN v3.2.

Environment

  • CPU make and model (try lscpu; if your lscpu does not list CPU flags, try running cat /proc/cpuinfo | grep flags | sort -u)
  • OS version Ubuntu 20.04
  • Compiler version gcc 7.5.0
  • CMake version cmake 3.19.5
  • CMake output log
  • make -j log
[100%] Linking CXX shared library libdnnl.so
llvm-foreach: Segmentation fault (core dumped)
clang-15: error: ptxas command failed with exit code 254 (use -v to see invocation)
clang version 15.0.0 (ssh://[email protected]:2222/wangziyang/intel-llvm-new.git 7ecb566e497fa926844521e8df2a2405c7e92e63)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/wzy/sycl_workspace/build-cuda-2022-06/bin
clang-15: note: diagnostic msg: Error generating preprocessed source(s).
src/CMakeFiles/dnnl.dir/build.make:776: recipe for target 'src/libdnnl.so.3.2' failed
make[2]: *** [src/libdnnl.so.3.2] Error 1
CMakeFiles/Makefile2:355: recipe for target 'src/CMakeFiles/dnnl.dir/all' failed
make[1]: *** [src/CMakeFiles/dnnl.dir/all] Error 2
Makefile:159: recipe for target 'all' failed
make: *** [all] Error 2
  • git hash (git log -1 --format=%H)

How to solve the problem?

wangzy0327 avatar Apr 17 '24 09:04 wangzy0327

llvm-foreach: Segmentation fault (core dumped) clang-15: error: ptxas command failed with exit code 254 (use -v to see invocation)

@wangzy0327 The issue is more likely in the compiler and the log shows that the core-dump happens in llvm-foreach.

  • You may enable the debug capabilities when building LLVM and use gdb to check which compiler pass is responsible for this.

  • The issue https://github.com/intel/llvm/issues/5980 in intel/llvm repo is very similar to this one and there are already many investigations there. Could you please check if it is helpful?

shu1chen avatar Apr 17 '24 13:04 shu1chen

@wangzy0327,

The Intel C++/DPC++ Compiler follows a semantic versioning scheme and guarantees backward compatibility within the same major version. You can find the version that oneDNN was tested with in the README.md of the corresponding release. At the source code level, oneDNN may also be compatible with earlier compiler releases.
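The "same major version" rule above can be checked mechanically. The following is a minimal sketch only: the version strings are placeholders rather than values taken from any oneDNN README, and the helper names `major_of` and `same_major` are hypothetical. A matching major version is the condition described above for the Intel compiler releases; it says nothing certain about a custom build of the open-source intel/llvm tree.

```shell
#!/bin/sh
# Sketch: compare the major component of two dotted version strings.
# Both helper names and all version strings are illustrative placeholders.

major_of() {
    # Print everything before the first dot, e.g. "2024.1.0" -> "2024".
    printf '%s\n' "${1%%.*}"
}

same_major() {
    # $1: version listed as validated, $2: version actually installed.
    if [ "$(major_of "$1")" = "$(major_of "$2")" ]; then
        echo "same major"
    else
        echo "different major"
    fi
}

# Placeholder inputs; substitute the version from the release README
# and the version reported by the installed compiler.
same_major "2024.1.0" "2024.0.2"
same_major "2024.1.0" "2023.2.0"
```

The installed version would normally come from the compiler's own `--version` output, and the validated version from the Validated Configurations section of the release README.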

vpirogov avatar Apr 17 '24 21:04 vpirogov

@vpirogov I cannot find anything in the README.md about the versions oneDNN was tested with. Which README.md lists the tested oneDNN version and the SYCL version it depends on? Can you provide a screenshot or a link?

wangzy0327 avatar Apr 18 '24 01:04 wangzy0327

I'm referring to the Validated Configurations section of the README.md.

vpirogov avatar Apr 18 '24 01:04 vpirogov

@shu1chen I referred to issue 5980 and applied the modifications listed there to the two files llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.cpp and llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.h. I then recompiled SYCL and compiled oneDNN again.

The output of make -j is as follows.

[100%] Linking CXX shared library libdnnl.so
llvm-foreach: Segmentation fault (core dumped)
clang-15: error: ptxas command failed with exit code 254 (use -v to see invocation)
clang version 15.0.0 (ssh://[email protected]:2222/wangziyang/intel-llvm-new.git 7ecb566e497fa926844521e8df2a2405c7e92e63)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/wzy/sycl_workspace/build-cuda-2022-06/bin
clang-15: note: diagnostic msg: Error generating preprocessed source(s).
src/CMakeFiles/dnnl.dir/build.make:776: recipe for target 'src/libdnnl.so.3.2' failed
make[2]: *** [src/libdnnl.so.3.2] Error 1
CMakeFiles/Makefile2:355: recipe for target 'src/CMakeFiles/dnnl.dir/all' failed
make[1]: *** [src/CMakeFiles/dnnl.dir/all] Error 2
Makefile:159: recipe for target 'all' failed
make: *** [all] Error 2

It's still the same error as before.

wangzy0327 avatar Apr 18 '24 01:04 wangzy0327

@wangzy0327 I meant that the core dump happens in the compiler, in the CUDA backend, not in oneDNN. From the log, the compilation of oneDNN has completed, and the compiler triggers this error during the linking phase. The issue in https://github.com/intel/llvm/issues/5980 describes a similar problem for another shared library in debug mode and has some tracing info. Perhaps the solution there doesn't work for your case. I am personally not an expert in the CUDA backend compiler. Could you please raise a ticket in the https://github.com/intel/llvm/issues repo to see if that is more helpful?
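The reading of the log above (compilation finished, failure during linking) can be sketched as a small check on a saved build log. This is an illustration only: the function name `failing_stage` and the file name `build.log` are hypothetical, and the ordering heuristic is an assumption that can be misleading with a parallel `make -j` build, where output from several jobs interleaves.

```shell
#!/bin/sh
# Sketch: guess whether the first error in a build log appears after the
# link step started. Heuristic only; function and file names are hypothetical.

failing_stage() {
    # $1: path to a saved log, e.g. from `make -j 2>&1 | tee build.log`
    link_line=$(grep -n 'Linking CXX' "$1" | head -n 1 | cut -d: -f1)
    err_line=$(grep -n 'error:' "$1" | head -n 1 | cut -d: -f1)

    if [ -z "$err_line" ]; then
        echo "no error found"
    elif [ -n "$link_line" ] && [ "$link_line" -lt "$err_line" ]; then
        # The first error comes after a "Linking CXX ..." line.
        echo "link"
    else
        echo "compile"
    fi
}

# Usage (assumes a log file was saved beforehand):
#   failing_stage build.log
```

For the log in this thread, the "Linking CXX shared library libdnnl.so" line precedes the ptxas error, which is consistent with the failure occurring in the link step rather than while compiling oneDNN sources.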

shu1chen avatar Apr 18 '24 01:04 shu1chen

@shu1chen I have raised a ticket in intel/llvm#5980, but there has been no reply yet.

wangzy0327 avatar Apr 18 '24 02:04 wangzy0327