Clang transpiler integration
Description
This pull request integrates the occa-transpiler library to provide full C++ support in OCCA.
Added:
an option, transpiler-version, for switching between the old and new transpilers
cmake -DOCCA_CLANG_BASED_TRANSPILER=ON worked for me to get the new transpiler source and generate the build.
Hi - Any hope that this gets merged?
Please take a look at this issue.
@kris-rowe the issue is addressed, please take a look and try the fix.
@kris-rowe all issues were addressed, can we please have a conclusion on this?
Hi @kris-rowe, if you have any additional comments, questions, or concerns, I am glad to resolve them so the PR can be merged.
Hi @IuriiKobein, I am planning to test this branch soon. I will let you know if I run into any issues.
I started testing this branch on Frontier at OLCF and I am running into a segmentation fault when I run the 31_oklt_v3_moving_avg test.
I had to make the following changes in occa-transpiler since CMake 3.26 was not available on Frontier (I hope this is not the reason for the segfault):
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 2d9cc30..659d44f 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -1,4 +1,4 @@
-cmake_minimum_required(VERSION 3.26)
+cmake_minimum_required(VERSION 3.23)
project(occa-transpiler VERSION 0.0.1 LANGUAGES C CXX)
diff --git a/lib/CMakeLists.txt b/lib/CMakeLists.txt
index 182f1e0..dd8b545 100644
--- a/lib/CMakeLists.txt
+++ b/lib/CMakeLists.txt
@@ -1,4 +1,4 @@
-cmake_minimum_required(VERSION 3.26)
+cmake_minimum_required(VERSION 3.23)
project (occa-transpiler VERSION 0.0.1 LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 17)
diff --git a/tool/CMakeLists.txt b/tool/CMakeLists.txt
index 543d898..98cdb5c 100644
--- a/tool/CMakeLists.txt
+++ b/tool/CMakeLists.txt
@@ -1,4 +1,4 @@
-cmake_minimum_required(VERSION 3.26)
+cmake_minimum_required(VERSION 3.23)
project (occa-tool VERSION 0.0.1 LANGUAGES CXX)
Then I followed the build instructions and everything built fine. When I tried to run the test, I got the following:
[[email protected] 31_oklt_v3_moving_avg]$ export OKLT_LOG_LEVEL=trace
[[email protected] 31_oklt_v3_moving_avg]$ ./examples_cpp_oklt_v3_moving_avg
[11:50:40.179] [I] start: OKL_DIRECTIVE_EXPANSION_STAGE [stage_action_runner.cpp:32]
[11:50:40.179] [T] input source:
#include "constants.h"
template<class T,
         int THREADS,
         int WINDOW>
struct MovingAverage {
  MovingAverage(int inputSize,
                int outputSize,
                T *shared_input,
                T *shared_output)
    :_inputSize(inputSize)
    ,_outputSize(outputSize)
    ,_shared_data(shared_input)
    ,_result_data(shared_output)
  {}
  void syncCopyFrom(const T *input, int block_idx, int thread_idx) {
    int linearIdx = block_idx * THREADS + thread_idx;
    //INFO: copy base chunk
    if(linearIdx < _inputSize) {
      _shared_data[thread_idx] = input[linearIdx];
    }
    //INFO: copy WINDOW chunk
    int tailIdx = (block_idx + 1) * THREADS + thread_idx;
    if(tailIdx < _inputSize && thread_idx < WINDOW) {
      _shared_data[THREADS + thread_idx] = input[tailIdx];
    }
    @barrier;
  }
  void process(int thread_idx) {
    T sum = T();
    for(int i = 0; i < WINDOW; ++i) {
      sum += _shared_data[thread_idx + i];
    }
    _result_data[thread_idx] = sum / WINDOW;
    @barrier;
  }
  void syncCopyTo(T *output, int block_idx, int thread_idx) {
    int linearIdx = block_idx * THREADS + thread_idx;
    if(linearIdx < _outputSize) {
      output[linearIdx] = _result_data[thread_idx];
    }
    @barrier;
  }
private:
  int _inputSize;
  int _outputSize;
  //INFO: not supported
  // @shared T _data[THREADS_PER_BLOCK + WINDOW_SIZE];
  // @shared T _result[THREADS_PER_BLOCK];
  T *_shared_data;
  T *_result_data;
};
@kernel void movingAverage32f(@restrict const float *inputData,
                              int inputSize,
                              @restrict float *outputData,
                              int outputSize)
{
  @outer(0) for (int block_idx = 0; block_idx < outputSize / THREADS_PER_BLOCK + 1; ++block_idx) {
    @shared float blockInput[THREADS_PER_BLOCK + WINDOW_SIZE];
    @shared float blockResult[THREADS_PER_BLOCK];
    MovingAverage<float, THREADS_PER_BLOCK, WINDOW_SIZE> ma{
      inputSize,
      outputSize,
      blockInput,
      blockResult
    };
    @inner(0) for(int thread_idx = 0; thread_idx < THREADS_PER_BLOCK; ++thread_idx) {
      ma.syncCopyFrom(inputData, block_idx, thread_idx);
      ma.process(thread_idx);
      ma.syncCopyTo(outputData, block_idx, thread_idx);
    }
  }
}
[stage_action_runner.cpp:33]
Segmentation fault
This is the backtrace I get with gdb:
#0 0x00007fffe78bf121 in llvm::vfs::InMemoryFileSystem::addFile(llvm::Twine const&, long, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, std::optional<unsigned int>, std::optional<unsigned int>, std::optional<llvm::sys::fs::file_type>, std::optional<llvm::sys::fs::perms>) () from /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/build/lib/libocca-transpiler.so.17
#1 0x00007fffe77df5aa in oklt::addInstrinsicStub (session=..., compiler=...) at /ccs/home/thilina/fus166/.local/occa-transpiler/clang/include/llvm/ADT/Twine.h:285
#2 0x00007fffe782dd43 in oklt::StageAction::PrepareToExecuteAction (this=0x4c1a00, compiler=...) at /usr/include/c++/12/bits/shared_ptr_base.h:1665
#3 0x00007fffe973b398 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) () from /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/build/lib/libocca-transpiler.so.17
#4 0x00007fffe7949f0e in clang::tooling::FrontendActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) ()
from /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/build/lib/libocca-transpiler.so.17
#5 0x00007fffe79415ac in clang::tooling::ToolInvocation::runInvocation(char const*, clang::driver::Compilation*, std::shared_ptr<clang::CompilerInvocation>, std::shared_ptr<clang::PCHContainerOperations>) ()
from /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/build/lib/libocca-transpiler.so.17
#6 0x00007fffe79452d8 in clang::tooling::ToolInvocation::run() () from /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/build/lib/libocca-transpiler.so.17
#7 0x00007fffe7949456 in clang::tooling::runToolOnCodeWithArgs(std::unique_ptr<clang::FrontendAction, std::default_delete<clang::FrontendAction> >, llvm::Twine const&, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, llvm::Twine const&, llvm::Twine const&, std::shared_ptr<clang::PCHContainerOperations>) ()
from /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/build/lib/libocca-transpiler.so.17
#8 0x00007fffe794993d in clang::tooling::runToolOnCodeWithArgs(std::unique_ptr<clang::FrontendAction, std::default_delete<clang::FrontendAction> >, llvm::Twine const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, llvm::Twine const&, llvm::Twine const&, std::shared_ptr<clang::PCHContainerOperations>, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) () from /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/build/lib/libocca-transpiler.so.17
#9 0x00007fffe783266d in oklt::runStageAction (stageName=..., session=...) at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/deps/occa-transpiler/lib/pipeline/core/stage_action_runner.cpp:68
#10 0x00007fffe7833144 in oklt::runPipeline (pipeline=..., session=...) at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/deps/occa-transpiler/lib/pipeline/core/stage_action_runner.cpp:97
#11 0x00007fffe7829a21 in oklt::normalizeAndTranspile (input=...) at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/deps/occa-transpiler/lib/pipeline/normalizer_and_transpiler.cpp:16
#12 0x00007fffed8eaad4 in occa::transpiler::Transpiler::run (this=this@entry=0x7fffffff5140, filename=..., mode=..., kernelProps=...) at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/src/occa/internal/utils/transpiler_utils.cpp:135
#13 0x00007fffed8b3ee4 in occa::serial::v3::transpileFile (filename=..., outputFile=..., kernelProps=..., metadata=..., mode=...) at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/src/occa/internal/modes/serial/device.cpp:69
#14 0x00007fffed8b7147 in occa::serial::device::buildKernel (this=this@entry=0x478460, filename=..., kernelName=..., kernelHash=..., kernelProps=..., isLauncherKernel=<optimized out>, isLauncherKernel@entry=false)
at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/src/occa/internal/modes/serial/device.cpp:353
#15 0x00007fffed8b778d in occa::serial::device::buildKernel (this=this@entry=0x478460, filename=..., kernelName=..., kernelHash=..., kernelProps=...)
at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/src/occa/internal/modes/serial/device.cpp:168
#16 0x00007fffed7010c6 in occa::device::buildKernel (this=this@entry=0x7fffffff5a40, filename=..., kernelName=..., props=...) at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/src/core/device.cpp:394
#17 0x0000000000401ea2 in main (argc=<optimized out>, argv=<optimized out>) at /ccs/home/thilina/fus166/Workspace/anl/occa-transpiler/occa/examples/cpp/31_oklt_v3_moving_avg/main.cpp:67
I am using gcc 12.3.0 to build OCCA and doing a release build. I will try a debug build and see if it gives me more information.
PS: I was actually doing a release build with debug info.
Thanks for the report. I have a quick question that will help us proceed with a potential fix.
Was clang installed according to the https://github.com/libocca/occa-transpiler?tab=readme-ov-file#setup-clang-17 section? If so, which variant exactly was used?
I installed clang from source by checking out the llvmorg-17.0.6 tag.
Below is the commit:
commit 6009708b4367171ccdbf4b5905cb6a803753fe18 (grafted, HEAD, tag: llvmorg-17.0.6)
Author: Tobias Hieta <[email protected]>
Date: Tue Nov 28 09:52:28 2023 +0100
Revert "[runtimes] Add missing test dependencies to check-all (#72955)"
This reverts commit e957e6dcb29d94e4e1678da9829b77009be88926.
The commit was reverted on main because of issues. We will not carry
this in the release branch for 17.x
These are the configure and build commands I used:
cmake -S llvm -B build -G "Unix Makefiles" \
-DCMAKE_C_COMPILER=`which gcc` \
-DCMAKE_CXX_COMPILER=`which g++` \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=~/fus166/.local/occa-transpiler/clang \
-DLLVM_ENABLE_WERROR=OFF \
-DLLVM_TARGETS_TO_BUILD='X86' \
-DLLVM_PARALLEL_LINK_JOBS=1 \
-DLLVM_ENABLE_RTTI=ON \
-DCMAKE_POLICY_DEFAULT_CMP0094=NEW \
-DCMAKE_VERBOSE_MAKEFILE=ON \
-DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF \
-DLLVM_ENABLE_PROJECTS="polly;lld;lldb;clang-tools-extra;llvm;clang" \
-DLLVM_ENABLE_RUNTIMES="libunwind;libcxx;libcxxabi;compiler-rt" \
-DLLVM_REQUIRES_RTTI=ON \
-DLLVM_ENABLE_RTTI=ON \
-DLLVM_ENABLE_EH=ON \
-DLLVM_POLLY_LINK_INTO_TOOLS=ON \
-DLLVM_Z3_INSTALL_DIR=${Z3_INSTALL_DIR} \
-DLLVM_ENABLE_Z3_SOLVER=OFF
make -C build install -j12
I think the only difference from the configure command in the instructions is that I turned off the Z3 solver.
So far we have not been able to reproduce the issue on our local machines with our existing setup. Next, we will use the same CMake version and clang build options as yours to try to catch the issue.
It seems the reason for the segfault was that I used two different versions of gcc: one to build clang and another to build OCCA. Once I used the same gcc version for both, I no longer see a segfault.
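Since the segfault went away once clang and OCCA were built with the same gcc, it can help to confirm which toolchain a translation unit was actually compiled with. The helper below is a hypothetical sketch (compilerDescription is not part of OCCA or occa-transpiler); it only reads macros predefined by GCC/Clang (__VERSION__, __clang_version__) and by the standard library headers (_GLIBCXX_RELEASE for libstdc++, _LIBCPP_VERSION for libc++).

```cpp
#include <cassert>
#include <string>

// Hypothetical diagnostic: reports the compiler and C++ standard library
// that compiled *this* translation unit.
std::string compilerDescription() {
  std::string desc;
#if defined(__clang__)
  desc += "clang " __clang_version__;   // check clang first: clang also defines __GNUC__
#elif defined(__GNUC__)
  desc += "gcc " __VERSION__;
#else
  desc += "unknown compiler";
#endif
#if defined(_GLIBCXX_RELEASE)
  desc += ", libstdc++ release " + std::to_string(_GLIBCXX_RELEASE);
#elif defined(_LIBCPP_VERSION)
  desc += ", libc++ " + std::to_string(_LIBCPP_VERSION);
#endif
  return desc;
}
```

Compiling it once with the compiler used to configure LLVM and once with the one used for OCCA makes a mismatch visible. For binaries that already exist, GCC typically records its version in the ELF .comment section, so `readelf -p .comment libocca-transpiler.so.17` may list more than one GCC version if objects from different compilers were linked together.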
Now I can run the test but it still fails:
[[email protected] 31_oklt_v3_moving_avg]$ ./examples_cpp_oklt_v3_moving_avg
Comparison with gold values has failed
I can attach the full log with trace on if that is helpful.
Glad the root cause of the segfault was found. This example has only been tested with the CUDA/HIP backends. You can verify it with the following options:
examples_cpp_oklt_v3_moving_avg -d "{mode: 'CUDA', device_id: 0}"
We are working on a fix for Serial mode as well, which is the default when the -d option is omitted.
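In Serial mode the @inner iterations run one after another on the host, so a host-side reference for what movingAverage32f should produce is handy when the gold comparison fails. The sketch below assumes, hypothetically, THREADS_PER_BLOCK = 4 and WINDOW_SIZE = 3 (the real values live in constants.h, which is not shown) and outputSize = inputSize - WINDOW_SIZE + 1. It compares a direct moving average with an emulation of the kernel in which the @inner loop is split at each @barrier, one way to keep barrier semantics when the threads of a block run sequentially. The function names are illustrative and not part of OCCA.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the values in constants.h.
constexpr int THREADS_PER_BLOCK = 4;
constexpr int WINDOW_SIZE = 3;
static_assert(WINDOW_SIZE <= THREADS_PER_BLOCK, "tail copy uses the first WINDOW_SIZE threads");

// Direct definition: out[i] is the mean of in[i], ..., in[i + WINDOW_SIZE - 1].
std::vector<float> movingAverageReference(const std::vector<float>& in, int outputSize) {
  assert(outputSize + WINDOW_SIZE - 1 <= static_cast<int>(in.size()));
  std::vector<float> out(outputSize);
  for (int i = 0; i < outputSize; ++i) {
    float sum = 0.0f;
    for (int k = 0; k < WINDOW_SIZE; ++k) sum += in[i + k];
    out[i] = sum / WINDOW_SIZE;
  }
  return out;
}

// Host emulation of movingAverage32f: every "thread" of a block finishes one
// barrier-delimited phase before any thread starts the next one.
std::vector<float> movingAverageBlocked(const std::vector<float>& in, int outputSize) {
  const int inputSize = static_cast<int>(in.size());
  std::vector<float> out(outputSize);
  for (int block = 0; block < outputSize / THREADS_PER_BLOCK + 1; ++block) {  // @outer(0)
    float blockInput[THREADS_PER_BLOCK + WINDOW_SIZE] = {};  // @shared
    float blockResult[THREADS_PER_BLOCK] = {};               // @shared
    // Phase 1 (syncCopyFrom): base chunk plus the WINDOW_SIZE-element tail.
    for (int t = 0; t < THREADS_PER_BLOCK; ++t) {
      const int linearIdx = block * THREADS_PER_BLOCK + t;
      if (linearIdx < inputSize) blockInput[t] = in[linearIdx];
      const int tailIdx = (block + 1) * THREADS_PER_BLOCK + t;
      if (tailIdx < inputSize && t < WINDOW_SIZE) blockInput[THREADS_PER_BLOCK + t] = in[tailIdx];
    }
    // @barrier, then Phase 2 (process).
    for (int t = 0; t < THREADS_PER_BLOCK; ++t) {
      float sum = 0.0f;
      for (int i = 0; i < WINDOW_SIZE; ++i) sum += blockInput[t + i];
      blockResult[t] = sum / WINDOW_SIZE;
    }
    // @barrier, then Phase 3 (syncCopyTo).
    for (int t = 0; t < THREADS_PER_BLOCK; ++t) {
      const int linearIdx = block * THREADS_PER_BLOCK + t;
      if (linearIdx < outputSize) out[linearIdx] = blockResult[t];
    }
  }
  return out;
}
```

If the three phases were instead run back to back for each thread (ignoring the barriers), thread 0 would read blockInput entries that threads 1 and 2 had not copied yet, so the result would depend on execution order.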
Thanks! Yes, the example passes with the HIP backend. I will try to test this on a few more kernels.
Hi Thilina,
The example "31_oklt_v3_moving_avg" is fixed to support host-only backends: Serial and OpenMP. Please pull the latest changes and try the fix. Looking forward to your feedback.
With your latest fix, the tests pass for HIP, Serial and OpenMP backends. I will test this a bit more.
@IuriiKobein: I added a simple kernel that calculates the dot product of two vectors here. It seems to fail with the transpiler, because the transpiler does not recognize unsigned int. I think OCCA supports unsigned int (I may be wrong).
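To check such a kernel's output on the host, a plain C++ reference is enough. This is a hypothetical sketch: the actual test kernel is not reproduced in this thread, so the element type (double) and the name dotReference are assumptions. It deliberately uses unsigned int for the length and loop index, the type the transpiler reportedly failed to recognize.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host reference for a dot-product kernel.
double dotReference(const std::vector<double>& a, const std::vector<double>& b) {
  assert(a.size() == b.size());
  const unsigned int n = static_cast<unsigned int>(a.size());
  double sum = 0.0;
  for (unsigned int i = 0; i < n; ++i) {  // unsigned loop index, as in the reported kernel
    sum += a[i] * b[i];
  }
  return sum;
}
```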
@thilinarmtb please refer to the issue reported above for clarification.
Is transpiler version 2 the same as regular OCCA? Seems like unsigned
We will continue the discussion there till the issue is resolved.
I am fine with merging this. I can add a few more tests after the PR gets merged. Below are a few minor comments I have.
Probably we should figure out the minimum versions of CMake and gcc required and add those versions to CMakeLists.txt and the documentation. For example, I don't think we need a CMake version as new as the one currently used in this PR.
Also, it is worth mentioning that you have to build clang and the occa-transpiler using the same compiler. I don't know if this is an actual requirement, but I had to do so in order to run it on my testing machine.
Hi @thilinarmtb, I have lowered the required CMake version to match the one OCCA uses. I also added notes about the minimum GCC version and about using the same compiler version to build both clang and the transpiler.
@IuriiKobein : Thank you very much for the changes. I will merge the PR once the tests pass.
Hi @thilinarmtb @kris-rowe Do you have any updates regarding this PR? Thanks
I realized that the occa-transpiler is not tested in GitHub CI. I am trying to test it here. I am running into a bunch of errors (which I think are due to a header file conflict). I don't run into this issue locally.
@thilinarmtb could you please test with the default compiler flags from OCCA's CMake on CI?
-- C flags : -Wall -Wextra -Wunused-function -Wunused-variable -Wwrite-strings -Wfloat-equal -Wcast-align -Wlogical-op -Wshadow -Wno-c++11-long-long -O3 -DNDEBUG
-- CXX flags : -Wall -Wextra -Wunused-function -Wunused-variable -Wwrite-strings -Wfloat-equal -Wcast-align -Wlogical-op -Wshadow -Wno-unused-parameter -fno-strict-aliasing -O3 -DNDEBUG
@IuriiKobein: It worked, but it takes about 18 minutes to build OCCA with the transpiler enabled. One option is to package the transpiler as a conda package, install it on the GitHub CI runners (this will be fast), and then build OCCA by linking against the transpiler library. I can package the transpiler as a conda package if you are interested in taking this route.
I would appreciate it if you could take care of this approach. We plan to speed up the transpiler's compilation time after the initial merge into OCCA.
BTW, on an average machine with 16 logical i7 cores it takes about 3 minutes, so it is a bit of a surprise that it is several times slower on CI.
My build using 16 parallel processes failed in GitHub CI. I tried 8 and 4 processes; both worked but took about 18 minutes. See here.
I will open a few minor issues on the occa-transpiler repo to fix a few things before going ahead with a conda package.
Issues are fixed and PR is updated.
Thanks @IuriiKobein. I will go ahead with the conda package.