occasional crashes calling probe_wdl multithreaded

Open jdart1 opened this issue 9 years ago • 5 comments

I am seeing occasional crashes when calling probe_wdl (supposedly thread-safe) in a multithreaded engine.

Stack trace follows (generated by GCC 6.2 with -fsanitize=address and -fsanitize=bounds) (Note: this is with my pending pull request applied):

==23941==ERROR: AddressSanitizer: SEGV on unknown address 0x7f9b9c815a02 (pc 0x0000004ec1b9 bp 0x7f99f9ac0bf0 sp 0x7f99f9ac0b80 T3)
#0 0x4ec1b8 in decompress_pairs syzygy/tbcore.c:1500
#1 0x4f07f9 in probe_wdl_table syzygy/tbprobe.c:780
#2 0x4f689b in probe_ab syzygy/tbprobe.c:1402
#3 0x4f6b7b in probe_wdl syzygy/tbprobe.c:1420
#4 0x4f8d93 in tb_probe_wdl_impl syzygy/tbprobe.c:1852
#5 0x4dde56 in tb_probe_wdl syzygy/tbprobe.h:223
#6 0x4dec69 in SyzygyTb::probe_wdl(Board const&, int&, bool) /home/jdart/dev/arasan-chess/src/syzygy.cpp:119
#7 0x4a47d0 in Search::search() /home/jdart/dev/arasan-chess/src/search.cpp:2428
#8 0x4afe02 in Search::search(int, int, int, int, int) (/home/jdart/dev/arasan-chess/bin/arasanx-64-popcnt+0x4afe02)
#9 0x4a722a in Search::search() /home/jdart/dev/arasan-chess/src/search.cpp:2848
#10 0x4afe02 in Search::search(int, int, int, int, int) (/home/jdart/dev/arasan-chess/bin/arasanx-64-popcnt+0x4afe02)
#11 0x4a722a in Search::search() /home/jdart/dev/arasan-chess/src/search.cpp:2848
#12 0x4afe02 in Search::search(int, int, int, int, int) (/home/jdart/dev/arasan-chess/bin/arasanx-64-popcnt+0x4afe02)
#13 0x4a722a in Search::search() /home/jdart/dev/arasan-chess/src/search.cpp:2848
#14 0x4afe02 in Search::search(int, int, int, int, int) (/home/jdart/dev/arasan-chess/bin/arasanx-64-popcnt+0x4afe02)
#15 0x4a9dc1 in Search::searchSMP(ThreadInfo*) /home/jdart/dev/arasan-chess/src/search.cpp:3203
#16 0x4d059d in ThreadPool::idle_loop(ThreadInfo*, SplitPoint const*) /home/jdart/dev/arasan-chess/src/threadp.cpp:137
#17 0x4d0714 in parkingLot /home/jdart/dev/arasan-chess/src/threadp.cpp:162
#18 0x7f99fef6e6f9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76f9)
#19 0x7f99fe99bb5c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x106b5c)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV syzygy/tbcore.c:1500 in decompress_pairs
Thread T3 created by T0 here:
#0 0x7f99ff53d558 in __interceptor_pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.3+0x31558)
#1 0x4d0a4e in ThreadInfo::ThreadInfo(ThreadPool*, int) /home/jdart/dev/arasan-chess/src/threadp.cpp:212
#2 0x4d0ccf in ThreadPool::ThreadPool(SearchController*, int) /home/jdart/dev/arasan-chess/src/threadp.cpp:252
#3 0x49244c in SearchController::SearchController() /home/jdart/dev/arasan-chess/src/search.cpp:179
#4 0x41c5be in main /home/jdart/dev/arasan-chess/src/arasanx.cpp:3694
#5 0x7f99fe8b582f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

jdart1 avatar Dec 19 '16 02:12 jdart1

Does it still crash when the sanitizers are disabled?

basil00 avatar Dec 21 '16 00:12 basil00

Yes, but it is infrequent: about once every dozen or so long time-control games. With tablebases disabled I see no crashes. By the way, I notice that Ronald de Man's code at https://github.com/syzygy1/tb/tree/master/src has had some decompress fixes recently, but it is not immediately clear to me how to apply them to Fathom.

jdart1 avatar Dec 21 '16 04:12 jdart1

The crash occurs in tbcore.c, which is pretty much unchanged from Ronald's version. I am also not at all familiar with this code.

> Once every dozen or so long time-control games.

That is quite frequent, so it is unusual that it has not been noticed before.

So I am really not sure, since I can't reproduce the problem. I have tested thousands of games with Fathom and Gull and did not observe any crashes.

basil00 avatar Dec 24 '16 15:12 basil00

This appears to be fixed by making the variable "ready" atomic (but this only works when compiling as C++). See

https://github.com/jdart1/Fathom/commit/64685b54da02f36676e4d6a4a503b95b42fc711c
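To illustrate the kind of race an atomic "ready" flag guards against, here is a minimal C++ sketch of lazy, double-checked initialization. It is not the code from the commit above: `TableEntry`, `probe`, `init_mutex`, and `run_concurrent_probes` are hypothetical names, and the integer `data` field only stands in for the real table setup. The assumption is that the table is initialized on first probe and that other threads test the flag without holding the lock.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a tablebase entry; not Fathom's actual struct.
struct TableEntry {
    std::atomic<bool> ready{false}; // publication flag, read without the lock
    int data = 0;                   // stands in for the mapped table data
};

static std::mutex init_mutex;

// Hypothetical lazy initializer using double-checked locking.
int probe(TableEntry &e) {
    // Acquire load: if we observe ready == true, we also observe the
    // writes to e.data made before the matching release store.
    if (!e.ready.load(std::memory_order_acquire)) {
        std::lock_guard<std::mutex> lock(init_mutex);
        // Re-check under the lock; another thread may have finished first.
        if (!e.ready.load(std::memory_order_relaxed)) {
            e.data = 42; // stands in for mapping and setting up the table
            // Release store: publish the initialized data to other threads.
            e.ready.store(true, std::memory_order_release);
        }
    }
    return e.data;
}

// Probes one shared entry from several threads and reports whether
// every thread saw the initialized value.
bool run_concurrent_probes(int nthreads) {
    TableEntry e;
    std::vector<int> results(nthreads, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back([&e, &results, i] { results[i] = probe(e); });
    for (auto &t : threads)
        t.join();
    for (int r : results)
        if (r != 42)
            return false;
    return true;
}
```

With a plain non-atomic flag, the compiler or CPU may reorder the flag write ahead of the data writes, so a second thread could see the flag set and read a table that is only partly set up. That would be consistent with an infrequent crash inside decompression, though the sketch does not prove that this is the cause here. Building it needs a C++11 compiler and thread support (for example `-pthread` with GCC).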

jdart1 avatar May 28 '17 17:05 jdart1

Perhaps this is worth reporting to Ronald?

basil00 avatar May 30 '17 13:05 basil00