Ye Luo
Still need to address #2433. Make the following two cases work with https://github.com/ye-luo/cmake_gpu/tree/master/test_rocm Case 1. ``` cmake -DCMAKE_HIP_COMPILER=/opt/rocm/bin/amdclang++ .. ``` Case 2. AOMP or upstream clang doesn't use the ROCm layout....
ROCm 4.5 and the hip-lang package assume HIP is installed as `/hip` and try to access paths outside `/hip` and inside ``` for example https://github.com/ROCm-Developer-Tools/HIP/blob/cddb52549b4f4fce9165cac8b2ccf25173ba3157/hip-lang-config.cmake.in#L108 reproducer ``` cmake_minimum_required(VERSION 3.21.0) project(test_hip CXX)...
See the figure in #160. Currently, every asynchronous transfer and kernel_dispatch within a target offload is immediately followed by a signal_wait_scacquire (synchronization). It would be nice to optimize them by enqueuing H2D, kernel_dispatch, D2H...
This figure comes from a single offload region. https://github.com/ye-luo/miniqmc/blob/46073436a432fc0472bf793784f9f87b5f8fdfcb/src/QMCWaveFunctions/einspline_spo_omp.cpp#L407 The map to/from arrays are already mapped with "target enter data". Why are there still memory_pool_allocate and agents_allow_access and memory_pool_free...
HIP to OpenMP: calling `hipMalloc` and passing the pointer to an OpenMP target kernel via `is_device_ptr` works. OpenMP to HIP: calling `omp_target_alloc` and passing the pointer to a HIP API/kernel prints an error. For example,...
It seems that the offload plugin, or something further down in HSA, creates an additional thread to communicate with the GPU. This additional thread seems to float around and compete with the regular OpenMP threads. It...
On my APU laptop, when graphics memory is set low, a memory allocation failure caused a deadlock in the device plugin. ``` [/home/estewart/git/aomp11/amd-llvm-project/openmp/libomptarget/plugins/hsa/impl/data.cpp:99] atmi_malloc failed: HSA_STATUS_ERROR_INVALID_ALLOCATION ``` backtrace ``` __lll_lock_wait...
https://github.com/ye-luo/openmp-target/blob/master/hands-on/tests/complex/complex.cpp ``` $ clang++ -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -march=native -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 complex.cpp lld: error: undefined symbol: __mulsc3 >>> referenced by /tmp/complex-gfx906-72c03e-gfx906-c2b83e.o:(__omp_offloading_10304_2920ae4__Z8test_mulIfSt7complexIfES1_EvT0_T1__l59) >>> referenced by /tmp/complex-gfx906-72c03e-gfx906-c2b83e.o:(__omp_offloading_10304_2920ae4__Z8test_mulIfSt7complexIfES1_EvT0_T1__l59) lld: error: undefined symbol: __divsc3 >>> referenced...
The source code I'm using has multiple offload regions in different member functions of a class. If I enable an individual target region and comment out the other target pragma Kernel 1...
Using 0.7-7. The AOMP linker works on the more complicated miniQMC but fails to link the following test case. https://github.com/ye-luo/openmp-target/tree/master/hands-on/tests/link_static_fat_bin ``` /usr/lib/aomp/bin/clang++ -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -c classA.cpp rm -f mylib.a...