hsa_amd_pointer_info fails when called from a function?
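On my machine, the reproducer below prints "hsa_amd_pointer_info failed".
If I change if(0) to if(1), so that the check runs inline in main instead of going through checkLocked, the call succeeds.
Reproducer: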
#include <hsa/hsa.h>
#include <hsa/hsa_ext_amd.h>
#include <stdio.h>

#define N 100293

int checkLocked(void *ptr) {
  hsa_amd_pointer_info_t info;
  hsa_status_t herr = HSA_STATUS_SUCCESS;
  herr = hsa_amd_pointer_info(ptr, &info, NULL, NULL, NULL);
  if (herr != HSA_STATUS_SUCCESS) {
    printf(" hsa_amd_pointer_info failed %d\n", herr);
    return 1;
  }
  if (info.type != HSA_EXT_POINTER_TYPE_LOCKED) {
    printf(" pointer is noooooooooooot locked\n");
    return 1;
  } else
    printf(" pointer is locked func\n");
  return 0;
}

int main() {
  hsa_status_t herr = hsa_init();
  if (herr != HSA_STATUS_SUCCESS) {
    printf("hsa_init failed\n");
    return 1;
  }
  const int n = N;
  int *a = new int[n];
  for (int i = 0; i < n; i++)
    a[i] = 0;
  checkLocked(a);
  int *a_locked = nullptr;
  herr = hsa_amd_memory_lock(a, n * sizeof(int), nullptr, 0, (void **)&a_locked);
  if (herr != HSA_STATUS_SUCCESS) {
    printf("Locking failed\n");
    return 1;
  }
  if(0)
  {
    hsa_amd_pointer_info_t info;
    herr = hsa_amd_pointer_info(a, &info, NULL, NULL, NULL);
    if (herr != HSA_STATUS_SUCCESS) {
      printf(" hsa_amd_pointer_info failed\n");
      return 1;
    }
    if (info.type != HSA_EXT_POINTER_TYPE_LOCKED) {
      printf(" pointer is noooooooooooot locked\n");
      return 1;
    } else
      printf(" pointer is locked main\n");
  }
  else
    checkLocked(a);
  hsa_shut_down();
  return herr;
}
That's weird. I see the void* vs int* type on the pointer and the indirection through a function call, but hsa_amd_pointer_info lives in a shared library (so the ABI is fixed), and I can't imagine why either would matter. I'll see if it reproduces locally.
I don't think it does. I appended 'func' and 'main' to the print statements to distinguish them. Judging by the final line of each run, the first is with if(0) and the second with if(1):
export LLVM=$HOME/llvm-install ; clang++ -O1 pointer_info.cpp -I$LLVM/include/ -Wl,--rpath=$LLVM/lib/ $LLVM/lib/libhsa-runtime64.so.1 && ./a.out
pointer is noooooooooooot locked func
pointer is locked func
export LLVM=$HOME/llvm-install ; clang++ -O1 pointer_info.cpp -I$LLVM/include/ -Wl,--rpath=$LLVM/lib/ $LLVM/lib/libhsa-runtime64.so.1 && ./a.out
pointer is noooooooooooot locked func
pointer is locked main
(My local machine has a gfx906, ROCm 4.5, Linux 5.4, with the driver from ROCm as opposed to upstream.)
I updated the above code a bit:
yeluo@epyc-server:~$ clang++ -fsanitize=address -fopenmp -O0 test_hsa.cpp -I/opt/rocm/include -L/opt/rocm/lib -lhsa-runtime64 && ./a.out
pointer is noooooooooooot locked
pointer is locked func
yeluo@epyc-server:~$ clang++ -O0 -I/opt/rocm-4.5.2/include test_hsa.cpp -L/opt/rocm-4.5.2/lib -lhsa-runtime64 && ./a.out
pointer is noooooooooooot locked
hsa_amd_pointer_info failed 4097
yeluo@epyc-server:~$ clang++ -O1 -I/opt/rocm-4.5.2/include test_hsa.cpp -L/opt/rocm-4.5.2/lib -lhsa-runtime64 && ./a.out
pointer is noooooooooooot locked
pointer is locked func
yeluo@epyc-server:~$ clang++ -O3 -I/opt/rocm-4.5.2/include test_hsa.cpp -L/opt/rocm-4.5.2/lib -lhsa-runtime64 && ./a.out
pointer is noooooooooooot locked
pointer is locked func
So it seems to be a real issue at "-O0". Either I misused the library or this is a bug.
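As a side note on reading the failure: the reproducer prints the status as a plain decimal, so 4097 is easier to interpret in hex. The sketch below is only an illustration of that decoding. The table of names and values is copied by hand from my recollection of hsa/hsa.h, so treat it as an assumption and check it against the installed header; describe_status and format_status are hypothetical helper names, not part of the HSA API.

```cpp
#include <cstdio>
#include <string>

// ASSUMPTION: values mirrored by hand from hsa/hsa.h; verify against the
// header that ships with your ROCm install before relying on them.
std::string describe_status(unsigned code) {
  switch (code) {
    case 0x0:    return "HSA_STATUS_SUCCESS";
    case 0x1:    return "HSA_STATUS_INFO_BREAK";
    case 0x1000: return "HSA_STATUS_ERROR";
    case 0x1001: return "HSA_STATUS_ERROR_INVALID_ARGUMENT";
    default:     return "unknown (look it up in hsa.h)";
  }
}

// Format a status as decimal, hex, and symbolic name in one string.
std::string format_status(unsigned code) {
  char buf[128];
  std::snprintf(buf, sizeof buf, "%u (0x%x) %s", code, code,
                describe_status(code).c_str());
  return buf;
}
```

Under that assumption, format_status(4097) gives "4097 (0x1001) HSA_STATUS_ERROR_INVALID_ARGUMENT", which would mean the runtime is rejecting one of the arguments rather than failing the lookup. In the real reproducer, hsa_status_string from hsa.h should report the same thing without a hand-written table.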
What's clang++ in this context? Different behaviour at O0 and O1 is a bug, and that's relatively unusual on x64.
It doesn't matter whether it is upstream clang or AOMP clang. g++ fails consistently.
yeluo@epyc-server:~$ g++ -O0 -I/opt/rocm-4.5.2/include test_hsa.cpp -L/opt/rocm-4.5.2/lib -lhsa-runtime64 && ./a.out
pointer is noooooooooooot locked
hsa_amd_pointer_info failed 4097
yeluo@epyc-server:~$ g++ -O1 -I/opt/rocm-4.5.2/include test_hsa.cpp -L/opt/rocm-4.5.2/lib -lhsa-runtime64 && ./a.out
pointer is noooooooooooot locked
hsa_amd_pointer_info failed 4097
yeluo@epyc-server:~$ g++ -O3 -I/opt/rocm-4.5.2/include test_hsa.cpp -L/opt/rocm-4.5.2/lib -lhsa-runtime64 && ./a.out
pointer is noooooooooooot locked
hsa_amd_pointer_info failed 4097
What if you change if(0) to if(!a_locked)? That way the compiler would be forced to run hsa_amd_memory_lock before checkLocked(a). Without that, it doesn't know that the checkLocked call depends on hsa_amd_memory_lock.
It doesn't change the above behavior.
hsa_amd_memory_lock and hsa_amd_pointer_info are both calls to unknown functions in a shared library. The C++ compiler has to pessimistically assume they might mutate the same global state, so it can't reorder them. That said, different behaviour at O0 and O1 does support the x64 compiler bug theory. Sadly this still doesn't reproduce for me, and I'm not sure where the config difference lies.
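One more thing that might be worth ruling out, offered as a guess rather than a diagnosis: behaviour that changes with the optimization level or the compiler is also a common symptom of reading an uninitialized variable, and the reproducer passes an uninitialized hsa_amd_pointer_info_t. As far as I recall, that struct starts with a size field that the header asks callers to set before the call; if so, stack garbage in that field could explain an intermittent invalid-argument status. The sketch below models the idea with a mock so it can be compiled without ROCm. MockPointerInfo, mock_pointer_info, and the size check inside it are all hypothetical stand-ins, not the real runtime.

```cpp
#include <cstdint>

// Hypothetical stand-in for hsa_amd_pointer_info_t: only the leading size
// field matters for this illustration.
struct MockPointerInfo {
  uint32_t size;
  int type;
};

constexpr int kMockSuccess = 0;
constexpr int kMockInvalidArgument = 0x1001;  // 4097, as printed in the thread

// Hypothetical stand-in for the runtime call. ASSUMPTION: the real runtime
// rejects a struct whose size field is smaller than the layout it expects.
int mock_pointer_info(const void* ptr, MockPointerInfo* info) {
  if (ptr == nullptr || info == nullptr) return kMockInvalidArgument;
  if (info->size < sizeof(MockPointerInfo)) return kMockInvalidArgument;
  info->type = 1;  // pretend the lookup succeeded
  return kMockSuccess;
}

// Caller pattern that does not depend on whatever was left on the stack.
int query_with_initialized_size(const void* ptr) {
  MockPointerInfo info = {};  // zero every field
  info.size = sizeof(info);   // tell the callee which layout we have
  return mock_pointer_info(ptr, &info);
}

// Deterministic model of the bad case: a size field that happens to be 0.
int query_with_zero_size(const void* ptr) {
  MockPointerInfo info = {};
  return mock_pointer_info(ptr, &info);
}
```

If the assumption holds, the equivalent change in the reproducer would be to write `hsa_amd_pointer_info_t info = {}; info.size = sizeof(info);` in both checkLocked and main before calling hsa_amd_pointer_info, and then see whether the g++ and -O0 failures go away.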