Inconsistent function name in RECURSIVE
Describe the bug
The first issue: DPLASMA_WITH_RECURSIVE and PARSEC_HAVE_RECURSIVE are independent, which means PARSEC_HAVE_RECURSIVE is always OFF by default even when DPLASMA_WITH_RECURSIVE is ON.
The second issue: if PARSEC_HAVE_RECURSIVE is enabled manually, DPLASMA fails to compile with the following error:
/home/qcao3/dplasma/build/src/dpotrf_L.c:3960:32: error: 'hook_of_dpotrf_L_potrf_dgemm_recursive' undeclared here (not in a function); did you mean 'hook_of_dpotrf_L_potrf_dgemm_RECURSIVE'?
3960 | .hook = (parsec_hook_t *) hook_of_dpotrf_L_potrf_dgemm_recursive},
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| hook_of_dpotrf_L_potrf_dgemm_RECURSIVE
I suspect it was introduced by this commit.
To Reproduce
Steps to reproduce the behavior:
- Manually enable PARSEC_HAVE_RECURSIVE
- Compile DPLASMA
Expected behavior
Compilation fails, while it should succeed.
Environment (please complete the following information):
Saturn ICL
Additional context
I made a simple fix for this (though it may not be good enough).
DPLASMA:
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 908d3f0..69f8386 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -92,6 +92,10 @@ option(DPLASMA_SCALAPACK_WRAPPER
option(DPLASMA_WITH_RECURSIVE
"Enable recursive kernels to be called when available" ON)
+if(DPLASMA_WITH_RECURSIVE)
+ add_definitions(-DPARSEC_HAVE_RECURSIVE)
+ message(STATUS "PARSEC_HAVE_RECURSIVE is defined.")
+endif(DPLASMA_WITH_RECURSIVE)
option(DPLASMA_TRACE_KERNELS
"Enable tracing dplasma kernel calls for debugging (slow)" OFF)
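As a side note, the same effect could be had without the legacy `add_definitions` command. This is only a sketch of an alternative (it assumes CMake >= 3.12 and keeps the directory scope of the original fix; a target-scoped `target_compile_definitions` would need the actual DPLASMA target names):

```cmake
# Sketch only: directory-scoped compile definition (CMake >= 3.12).
if(DPLASMA_WITH_RECURSIVE)
  add_compile_definitions(PARSEC_HAVE_RECURSIVE)
  message(STATUS "PARSEC_HAVE_RECURSIVE is defined.")
endif()
```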
PaRSEC:
diff --git a/parsec/interfaces/ptg/ptg-compiler/jdf2c.c b/parsec/interfaces/ptg/ptg-compiler/jdf2c.c
index 1cf9494..651ca68 100644
--- a/parsec/interfaces/ptg/ptg-compiler/jdf2c.c
+++ b/parsec/interfaces/ptg/ptg-compiler/jdf2c.c
@@ -3941,7 +3941,11 @@ jdf_generate_function_incarnation_list( const jdf_t *jdf,
string_arena_add_string(sa, " .dyld = \"%s\",\n", dyld_property->expr->jdf_var);
}
string_arena_add_string(sa, " .evaluate = %s,\n", "NULL");
- string_arena_add_string(sa, " .hook = (parsec_hook_t*)hook_of_%s_%s },\n", base_name, dev_lower);
+ if (!strcmp(dev_upper, "RECURSIVE")) {
+ string_arena_add_string(sa, " .hook = (parsec_hook_t*)hook_of_%s_%s },\n", base_name, dev_upper);
+ } else {
+ string_arena_add_string(sa, " .hook = (parsec_hook_t*)hook_of_%s_%s },\n", base_name, dev_lower);
+ }
string_arena_add_string(sa, "#endif /* defined(PARSEC_HAVE_%s) */\n", dev_upper);
free(dev_upper); free(dev_lower);
}
The compilation issue is fixed. However, there are still issues when recursive is enabled. Here is the backtrace from gdb:
#0 0x00007ffff5aa9387 in raise () from /lib64/libc.so.6
#1 0x00007ffff5aaaa78 in abort () from /lib64/libc.so.6
#2 0x00007ffff5aa21a6 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007ffff5aa2252 in __assert_fail () from /lib64/libc.so.6
#4 0x00007ffff6c78f60 in parsec_taskpool_unregister (tp=0x7fff40005ce0) at /home/qcao3/dplasma/parsec/parsec/parsec.c:2165
#5 0x00007ffff736a1ee in __parsec_dpotrf_U_internal_destructor (__parsec_tp=0x7fff40005ce0) at /home/qcao3/dplasma/build/src/dpotrf_U.c:9797
#6 0x00007ffff6c744c9 in parsec_obj_run_destructors (object=0x7fff40005ce0) at /home/qcao3/dplasma/parsec/parsec/class/parsec_object.h:446
#7 0x00007ffff6c78ff8 in parsec_taskpool_free (tp=0x7fff40005ce0) at /home/qcao3/dplasma/parsec/parsec/parsec.c:2173
#8 0x00007ffff709c770 in dplasma_dpotrf_Destruct (tp=0x7fff40005ce0) at /home/qcao3/dplasma/build/src/dpotrf_wrapper.c:263
#9 0x00007ffff736adf1 in dpotrf_U_update_INFO (_tp=0x7fff40005ce0, data=0x7fff40008d20) at /home/qcao3/dplasma/build/src/dpotrf_U.c:10120
#10 0x00007ffff7356949 in parsec_recursivecall_callback (tp=0x7fff40005ce0, cb_data=0x7fff40008d20) at /home/qcao3/dplasma/parsec/parsec/recursive.h:32
#11 0x00007ffff6c8afe1 in parsec_taskpool_termination_detected (tp=0x7fff40005ce0) at /home/qcao3/dplasma/parsec/parsec/scheduling.c:229
#12 0x00007ffff6cba26f in parsec_termdet_local_termination_detected (tp=0x7fff40005ce0) at /home/qcao3/dplasma/parsec/parsec/mca/termdet/local/termdet_local_module.c:120
#13 0x00007ffff6cba65c in parsec_termdet_local_taskpool_addto_nb_tasks (tp=0x7fff40005ce0, v=-1) at /home/qcao3/dplasma/parsec/parsec/mca/termdet/local/termdet_local_module.c:206
#14 0x00007ffff6c960e1 in parsec_release_task_to_mempool_update_nbtasks (es=0x7fff54000950, this_task=0x7fff5400aa20) at /home/qcao3/dplasma/parsec/parsec/interfaces/interface.c:37
#15 0x00007ffff7369067 in release_task_of_dpotrf_U_potrf_dpotrf (es=0x7fff54000950, this_task=0x7fff5400aa20) at /home/qcao3/dplasma/build/src/dpotrf_U.c:9421
#16 0x00007ffff6c8b68e in __parsec_complete_execution (es=0x7fff54000950, task=0x7fff5400aa20) at /home/qcao3/dplasma/parsec/parsec/scheduling.c:470
#17 0x00007ffff6c8b7b1 in __parsec_task_progress (es=0x7fff54000950, task=0x7fff5400aa20, distance=1) at /home/qcao3/dplasma/parsec/parsec/scheduling.c:497
#18 0x00007ffff6c8c25d in __parsec_context_wait (es=0x7fff54000950) at /home/qcao3/dplasma/parsec/parsec/scheduling.c:776
#19 0x00007ffff6c750c0 in __parsec_thread_init (startup=0xaec450) at /home/qcao3/dplasma/parsec/parsec/parsec.c:337
#20 0x00007fffaf454ea5 in start_thread () from /lib64/libpthread.so.0
#21 0x00007ffff5b719fd in clone () from /lib64/libc.so.6
>>> f 4
#4 0x00007ffff6c78f60 in parsec_taskpool_unregister (tp=0x7fff40005ce0) at /home/qcao3/dplasma/parsec/parsec/parsec.c:2165
2165 assert( PARSEC_TERM_TP_TERMINATED == tp->tdm.module->taskpool_state(tp) );
>>> p tp->tdm.module->taskpool_state(tp)
$1 = PARSEC_TERM_TP_BUSY
Testing command in DPLASMA:
./testing_dpotrf -N 1000 -t 100 -z 50
@therault suggests this patch:
diff --git a/parsec/mca/termdet/local/termdet_local_module.c b/parsec/mca/termdet/local/termdet_local_module.c
index d599cd2ea..5e37910bc 100644
--- a/parsec/mca/termdet/local/termdet_local_module.c
+++ b/parsec/mca/termdet/local/termdet_local_module.c
@@ -88,7 +88,7 @@ static void parsec_termdet_local_monitor_taskpool(parsec_taskpool_t *tp,
static void parsec_termdet_local_unmonitor_taskpool(parsec_taskpool_t *tp)
{
assert(&parsec_termdet_local_module.module == tp->tdm.module);
- assert(tp->tdm.monitor == PARSEC_TERMDET_LOCAL_TERMINATED);
+ assert(tp->tdm.monitor == PARSEC_TERMDET_LOCAL_TERMINATED || tp->tdm.monitor == PARSEC_TERMDET_LOCAL_TERMINATING);
tp->tdm.module = NULL;
tp->tdm.callback = NULL;
}
@@ -98,9 +98,9 @@ static parsec_termdet_taskpool_state_t parsec_termdet_local_taskpool_state(parse
if( tp->tdm.module == NULL )
return PARSEC_TERM_TP_NOT_MONITORED;
assert(tp->tdm.module == &parsec_termdet_local_module.module);
- if( tp->tdm.monitor == PARSEC_TERMDET_LOCAL_TERMINATED )
+ if( tp->tdm.monitor == PARSEC_TERMDET_LOCAL_TERMINATED || tp->tdm.monitor == PARSEC_TERMDET_LOCAL_TERMINATING)
return PARSEC_TERM_TP_TERMINATED;
- if( tp->tdm.monitor == PARSEC_TERMDET_LOCAL_BUSY || tp->tdm.monitor == PARSEC_TERMDET_LOCAL_TERMINATING )
+ if( tp->tdm.monitor == PARSEC_TERMDET_LOCAL_BUSY )
return PARSEC_TERM_TP_BUSY;
if( tp->tdm.monitor == PARSEC_TERMDET_LOCAL_NOT_READY )
return PARSEC_TERM_TP_NOT_READY;
After this patch, there is still an issue:
#0 0x00007ffff6c8a7f7 in parsec_atomic_fetch_add_int32 (l=0xb, v=-1) at /home/qcao3/dplasma/parsec/parsec/include/parsec/sys/atomic-c11.h:163
#1 0x00007ffff6c8a893 in parsec_atomic_fetch_dec_int32 (l=0xb) at /home/qcao3/dplasma/parsec/parsec/include/parsec/sys/atomic.h:160
#2 0x00007ffff6c8aff5 in parsec_taskpool_termination_detected (tp=0x7fff30004740) at /home/qcao3/dplasma/parsec/parsec/scheduling.c:231
#3 0x00007ffff6cba27d in parsec_termdet_local_termination_detected (tp=0x7fff30004740) at /home/qcao3/dplasma/parsec/parsec/mca/termdet/local/termdet_local_module.c:120
#4 0x00007ffff6cba71c in parsec_termdet_local_taskpool_addto_runtime_actions (tp=0x7fff30004740, v=-1) at /home/qcao3/dplasma/parsec/parsec/mca/termdet/local/termdet_local_module.c:223
#5 0x00007ffff6c8af83 in parsec_taskpool_update_runtime_nbtask (tp=0x7fff30004740, nb_tasks=-1) at /home/qcao3/dplasma/parsec/parsec/scheduling.c:217
#6 0x00007ffff6c8545f in remote_dep_dec_flying_messages (handle=0x7fff30004740) at /home/qcao3/dplasma/parsec/parsec/remote_dep.h:364
#7 0x00007ffff6c89c89 in remote_dep_mpi_new_taskpool (es=0xaced70, dep_cmd_item=0x7fff30024ab0) at /home/qcao3/dplasma/parsec/parsec/remote_dep_mpi.c:1934
#8 0x00007ffff6c881ef in remote_dep_dequeue_nothread_progress (es=0xaced70, cycles=0) at /home/qcao3/dplasma/parsec/parsec/remote_dep_mpi.c:1170
#9 0x00007ffff6c807ee in parsec_remote_dep_progress (es=0xaced70) at /home/qcao3/dplasma/parsec/parsec/remote_dep.c:327
#10 0x00007ffff6c8c1e9 in __parsec_context_wait (es=0xaced70) at /home/qcao3/dplasma/parsec/parsec/scheduling.c:760
#11 0x00007ffff6c8c836 in parsec_context_wait (context=0xab26c0) at /home/qcao3/dplasma/parsec/parsec/scheduling.c:977
#12 0x0000000000402ae8 in main (argc=7, argv=0x7fffffff6c18) at /home/qcao3/dplasma/build/tests/testing_dpotrf.c:62
The new issue appears to have been introduced by the PR "Introduce parsec_taskpool_wait and parsec_taskpool_test" by @devreal.
The root of the problem is that in the recursive case, we try to parsec_taskpool_destroy(tp) in the callback of the termination detector for tp. To allow this behavior, we should terminate (and mark as terminated) the taskpool before calling the callback.
However, when we use the termination detector for DTD (or for other DSLs like TTG), marking the taskpool terminated before calling the callback creates a race condition: another thread polling the status of the taskpool can see it as terminated before the callback has run, and move on to the next phase, which re-initializes the taskpool to be progressed again. That leads to other problems when the callback is finally called (this is why we introduced the TERMINATING status for a taskpool: everyone agrees termination has been reached, but the taskpool still exists).
The same callback cannot be used for both behaviors.
One possible solution: instead of calling parsec_taskpool_destroy(tp) in the callback, the termination detector would PARSEC_OBJ_RETAIN(tp) when it starts monitoring the taskpool, the recursive code would call PARSEC_OBJ_RELEASE(tp), and the termination detector would also PARSEC_OBJ_RELEASE(tp) after the callback is called, triggering the destruction. @bosilca do you see a better way?
We were careful not to touch the taskpool on the way out of the callback, but now we need to go back into the termdet, which might still have a pointer to it. Possible, but tedious.
If we mark the taskpool before calling the callback but remember that we did so, then the caller (either the callback itself or the main thread) can handle the taskpool as they want/need. Of course they could do bad things, such as the main thread freeing the taskpool while we are still calling the callback, but that is of little concern and easily fixed with correct programming.
We might need to add a dual mechanism like the one we use in OMPI for handling requests: they can be internally completed or externally completed, giving us a window between the two to handle object-related things (like calling the completion callback).