ThreadSanitizer: thread leak from HIP runtime

Open al42and opened this issue 2 years ago • 5 comments

Running any application that uses the HIP API under TSAN produces a "thread leak" warning at exit:

$ hipcc tsan.cpp -g -fsanitize=thread -o tsan && ./tsan
clang-15: warning: ignoring '-fsanitize=thread' option as it is not currently supported for target 'amdgcn-amd-amdhsa' [-Woption-ignored]
Detected 2 devices
==================
WARNING: ThreadSanitizer: thread leak (pid=2176329)
  Thread T2 (tid=2176332, finished) created by main thread at:
    #0 pthread_create /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1022:3 (tsan+0x263213)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State>>, void (*)()) <null> (libstdc++.so.6+0xd70a8) (BuildId: c90e6603c7cdf84713cd445700a575d3ea446d9b)

SUMMARY: ThreadSanitizer: thread leak (/lib/x86_64-linux-gnu/libstdc++.so.6+0xd70a8) (BuildId: c90e6603c7cdf84713cd445700a575d3ea446d9b) in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State>>, void (*)())
==================
ThreadSanitizer: reported 1 warnings

Tested with ROCm 5.4.1 on MI50 and ROCm 5.4.2 on RX 6400.

Code used (anything doing HIP API calls should work):

#include "hip/hip_runtime.h"
#include <iostream>

int main() {
  int n;
  auto err = hipGetDeviceCount(&n);
  std::cout << "Detected " << n << " devices\n";
  return 0;
}
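For context, ThreadSanitizer reports a thread leak when a thread has finished but was never joined or detached by the time the process exits. The sketch below shows the pattern that avoids the report in user code; `run_joined` is a hypothetical helper name, not part of HIP or the reproducer. In this issue the thread is created inside the runtime libraries, so the application cannot apply this fix itself.

```cpp
#include <thread>

// Hypothetical helper: runs a small piece of work on a new thread and
// joins it before returning, so no finished-but-unjoined thread remains
// for TSAN to report at exit.
int run_joined(int input) {
    int result = 0;
    std::thread worker([&result, input] { result = input * 2; });
    worker.join();  // joining is what keeps TSAN from flagging a leak
    return result;
}
```

Calling `run_joined(21)` under `-fsanitize=thread` should finish without a thread-leak report, because the only thread it starts is joined.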

al42and, Mar 13 '23 15:03

Thanks for reporting, will look into it.

jatinx, Mar 14 '23 23:03

@al42and Apologies for the lack of response. Can you please test with the latest ROCm 6.0.2 (HIP 6.0.32831)? If the issue is resolved, please close the ticket. Thanks!

ppanchad-amd, Apr 11 '24 14:04

Don't have 6.0.2 at hand, but the problem still occurs with 6.0.0:

$ hipcc tsan.cpp -g -fsanitize=thread -o tsan && ./tsan
clang: warning: ignoring '-fsanitize=thread' option as it is not currently supported for target 'amdgcn-amd-amdhsa' [-Woption-ignored]
Detected 1 devices
==================
WARNING: ThreadSanitizer: thread leak (pid=3481992)
  Thread T2 (tid=3482001, finished) created by main thread at:
    #0 pthread_create /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1048 (tsan+0x296503)
    #1 <null> <null> (libhsa-runtime64.so.1+0x2972b) (BuildId: fdfae95418d176670b25ac26f0542b05d0aec181)
    #2 hipGetDeviceCount ??:? (libamdhip64.so.6+0xa9c23) (BuildId: c119a12e92604d9b1dd360dcf538793bfab296a4)
    #3 __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58 (libc.so.6+0x29d8f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)

SUMMARY: ThreadSanitizer: thread leak (/opt/rocm-6.0.0/lib/llvm/bin/../../../lib/libhsa-runtime64.so.1+0x2972b) (BuildId: fdfae95418d176670b25ac26f0542b05d0aec181) 
==================
ThreadSanitizer: reported 1 warnings

Note for others trying to reproduce: Since hipcc in ROCm 6.0 is based on Clang 17, it requires a workaround for TSAN on newer kernels: https://github.com/google/sanitizers/issues/1716#issuecomment-2010399341. But this is not directly related to the issue here.

al42and, Apr 11 '24 14:04

Still happens with ROCm 6.1.1:

$ hipcc --version
HIP version: 6.1.40092-038397aaa
AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-6.1.1 24154 f53cd7e03908085f4932f7329464cd446426436a)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-6.1.1/llvm/bin
Configuration file: /opt/rocm-6.1.1/lib/llvm/bin/clang++.cfg

$ hipcc tsan.cpp -g -fsanitize=thread -o tsan && ./tsan
clang: warning: ignoring '-fsanitize=thread' option as it is not currently supported for target 'amdgcn-amd-amdhsa' [-Woption-ignored]
Detected 1 devices
/usr/bin/addr2line: DWARF error: invalid or unhandled FORM value: 0x23
==================
WARNING: ThreadSanitizer: thread leak (pid=64226)
  Thread T2 (tid=64235, finished) created by main thread at:
    #0 pthread_create ??:? (tsan+0x29c90b)
    #1 <null> <null> (libhsa-runtime64.so.1+0x2c0fc) (BuildId: 8575df86329e78c19cac825f819d82b0361816da)
    #2 hipGetCmdName ??:? (libamdhip64.so.6+0xad053) (BuildId: daff87db3cceb0402dea325b66af7507d54d0eb2)
    #3 __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58 (libc.so.6+0x29d8f) (BuildId: 962015aa9d133c6cbcfb31ec300596d7f44d3348)

SUMMARY: ThreadSanitizer: thread leak ??:? in pthread_create
==================
ThreadSanitizer: reported 1 warnings

al42and, May 20 '24 15:05

@al42and We have an internal ticket to investigate this issue. Thanks!

ppanchad-amd, May 29 '24 13:05

Hi @al42and,

I tried to reproduce the issue you are facing but could not find any leaking threads with the latest version of ROCm (6.2.2). I verified this with ThreadSanitizer and gdb.

However, I ran into a separate ThreadSanitizer problem: an error message about an unexpected memory mapping. If you hit the same error, note that a recent kernel update raised vm.mmap_rnd_bits from 28 to 32 on amd64 systems, while ThreadSanitizer was updated to support only up to 30 ASLR bits (see: ThreadSanitizer ASLR Change). To work around it, reduce the ASLR bits from 32 to 30:

sudo sysctl vm.mmap_rnd_bits=30
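Before changing the setting, it can help to check the current value. The following is a minimal sketch, assuming the sysctl is exposed at `/proc/sys/vm/mmap_rnd_bits` (the usual procfs location) and taking the limit of 30 bits from the comment above; the function names are hypothetical.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns the integer stored in `path`,
// or -1 if the file cannot be opened or parsed.
int read_mmap_rnd_bits(const std::string& path = "/proc/sys/vm/mmap_rnd_bits") {
    std::ifstream in(path);
    int bits = -1;
    if (!(in >> bits)) {
        return -1;
    }
    return bits;
}

// True when the configured ASLR entropy is above what ThreadSanitizer
// is said to support (30 bits, per the comment above; an assumption here).
bool exceeds_tsan_aslr_limit(int bits, int limit = 30) {
    return bits > limit;
}
```

If `exceeds_tsan_aslr_limit(read_mmap_rnd_bits())` is true, the `sysctl` command above is the likely fix; a result of -1 from `read_mmap_rnd_bits` only means the value could not be read.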

Please give that a try on the latest version of ROCm and let me know if the issue persists, thanks!

darren-amd, Oct 17 '24 14:10

Hi @darren-amd

> I tried to reproduce the issue you are facing but could not find any threads that were leaking with the latest version of ROCm (6.2.2). I verified with threadSanitizer and gdb.

Thanks. I can confirm that the issue can no longer be reproduced with 6.2.2, but it still occurs on the same machine with 6.1.1.

> However, there was an issue with threadSanitizer where I got an error message with unexpected memory mapping. If you face a similar issue, there was a recent kernel update that bumped vm.mmap_rnd_bits up from 28 to 32 for amd64 systems. There was also an update to support only up to 30 ASLR bits for threadSanitizer: ThreadSanitizer ASLR Change. Therefore, to solve this issue, you would have to reduce ASLR bits from 32 to 30:

Yes, I'm aware of that, see the note in https://github.com/ROCm/HIP/issues/3182#issuecomment-2049857758.

al42and, Oct 17 '24 16:10