clEnqueueSVMMemcpy SegFault
Running clEnqueueSVMMemcpy(queue, CL_TRUE, dst, src, size, 0, NULL, NULL); with dst allocated with clSVMAlloc and src allocated by the system (e.g. posix_memalign) triggers a segmentation fault:
Thread 4 "Command Queue T" received signal SIGSEGV, Segmentation fault.
Backtrace:
#0 0x00007fffebed61b4 in amd::SharedReference<amd::Context>::operator() (this=0x68) at /space/rocm/ROCclr/platform/object.hpp:166
#1 0x00007fffebed1f1e in amd::Memory::getContext (this=0x0) at /space/rocm/ROCclr/platform/memory.hpp:302
#2 0x00007fffebfeb8e3 in roc::NullDevice::forceFineGrain (this=0x55555568f820, memory=0x0) at /space/rocm/ROCclr/device/rocm/rocdevice.hpp:194
#3 0x00007fffebfe04e0 in roc::VirtualGPU::submitSvmCopyMemory (this=0x7ffed8000b90, cmd=...) at /space/rocm/ROCclr/device/rocm/rocvirtual.cpp:1281
#4 0x00007fffebf00cd0 in amd::SvmCopyMemoryCommand::submit(device::VirtualDevice&) () from /space/rocm/ROCm-OpenCL-Runtime/build/lib/libamdocl64.so
#5 0x00007fffebf8f4e2 in amd::HostQueue::loop (this=0x5555555d1ce0, virtualDevice=0x7ffed8000b90) at /space/rocm/ROCclr/platform/commandqueue.cpp:167
#6 0x00007fffebf9251e in amd::HostQueue::Thread::run (this=0x5555555d1d88, data=0x5555555d1ce0) at /space/rocm/ROCclr/platform/commandqueue.hpp:161
#7 0x00007fffebf4a4bd in amd::Thread::main (this=0x5555555d1d88) at /space/rocm/ROCclr/thread/thread.cpp:93
#8 0x00007fffebf984a4 in amd::Thread::entry (thread=0x5555555d1d88) at /space/rocm/ROCclr/os/os_posix.cpp:318
#9 0x00007ffff6bc2609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#10 0x00007ffff71cb103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The problem is the pair of checks on lines 1281 and 1282, both of which segfault if srcMem or dstMem is a nullptr, that is, if the pointer is not present in the memory map (lookups on lines 1259 and 1260):
https://github.com/ROCm-Developer-Tools/ROCclr/blob/roc-3.5.x/device/rocm/rocvirtual.cpp#L1281-L1282
https://github.com/ROCm-Developer-Tools/ROCclr/blob/roc-3.5.x/device/rocm/rocvirtual.cpp#L1259-L1260
As a result, forceFineGrain is called with a nullptr instead of a valid amd::Memory*:
https://github.com/ROCm-Developer-Tools/ROCclr/blob/roc-3.5.x/device/rocm/rocdevice.hpp#L193
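To illustrate the failure mode, here is a minimal, self-contained sketch of a null-safe version of this kind of check. It is not the ROCclr code: Memory, MemoryMap, findMemory, isFineGrain, CopyPath and chooseCopyPath are hypothetical stand-ins, and falling back to a host copy for unregistered pointers is only an assumption about what a reasonable guard could do, not a description of the actual fix.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for amd::Memory; not the real ROCclr class.
struct Memory {
  bool fineGrain;
};

// Hypothetical registry of runtime-known allocations keyed by base address.
using MemoryMap = std::map<const void*, Memory*>;

// Returns the registered object for ptr, or nullptr when the pointer was
// allocated by the system (e.g. posix_memalign) and is unknown to the runtime.
Memory* findMemory(const MemoryMap& map, const void* ptr) {
  auto it = map.find(ptr);
  return it == map.end() ? nullptr : it->second;
}

// Null-safe check: an unregistered pointer is reported as "not fine-grain"
// instead of being dereferenced.
bool isFineGrain(const Memory* mem) {
  return mem != nullptr && mem->fineGrain;
}

enum class CopyPath { Device, Host };

// Assumed policy for this sketch: if either side is unknown to the runtime,
// or is fine-grain, use a host copy; otherwise use the device copy path.
CopyPath chooseCopyPath(const MemoryMap& map, const void* dst, const void* src) {
  assert(dst != nullptr && src != nullptr);  // precondition of the sketch
  const Memory* dstMem = findMemory(map, dst);
  const Memory* srcMem = findMemory(map, src);
  if (dstMem == nullptr || srcMem == nullptr) return CopyPath::Host;
  if (isFineGrain(dstMem) || isFineGrain(srcMem)) return CopyPath::Host;
  return CopyPath::Device;
}
```

With a map that contains only the SVM destination, chooseCopyPath returns CopyPath::Host for a system-allocated source rather than dereferencing a null pointer, which is the case that crashes in the backtrace above.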
Tested on the latest ROCm 3.5.1 release with the corresponding roc-3.5.x or rocm-3.5.x branches.
This still happens with the latest ROCm 3.9.0 release.
The same bug is triggered by test_svm in the Khronos OpenCL Conformance Tests:
./test_conformance/SVM/test_svm
...
Compute Device Name = gfx1010, Compute Device Vendor = Advanced Micro Devices, Inc., Compute Device Version = OpenCL 2.0 , CL C Version = OpenCL C 2.0
...
svm_enqueue_api...
clEnqueueSVMMemcpy case: src_alloc = host, dst_alloc = host
clEnqueueSVMMemcpy case: src_alloc = host, dst_alloc = svm
Segmentation fault (core dumped)
clEnqueueSVMMemcpy also crashes with ROCm 5.1.3, both in OpenCL-CTS (in the same spot as outlined above) and in a standalone test program that only calls this function.
I believe ROCm 5.2 should have a fix for this issue.
Tested a W5700 with rocm-5.2.1 and a W6600 with rocm-5.2.3 on different machines: the segfault no longer occurs, but the OpenCL-CTS SVM test fails on both machines with:
svm_enqueue_api...
clEnqueueSVMMemcpy case: src_alloc = host, dst_alloc = host
clEnqueueSVMMemcpy case: src_alloc = host, dst_alloc = svm
Invalid data at index 0, dst_ptr 99, src_ptr 53
svm_enqueue_api FAILED