Cleanup #1982: Use C++ STL data structures in conv-conds
Original date: 2019-04-26 04:00:57 Original PR: https://charm.cs.illinois.edu/gerrit/4932
Change-Id: Ie533057d1dba8db277388affce245dc8d2cb50e6
Performance seems to be equivalent to the original C implementation in microbenchmarks, but this PR is failing tests/charm/megatest with:
"------------- Processor 0 Exiting: Called CmiAbort ------------ Reason: Internal QD Error. Contact Developers.!"
I confirmed that this failure was happening before the timer optimizations commit.
Rebased, and it still fails in charm megatest, but now with a slightly better error message:
./charmrun +p1 ./megatest ++local
Charmrun> scalable start enabled.
Charmrun> started all node programs in 0.163 seconds.
Charm++> Running in non-SMP mode: 1 processes (PEs)
Converse/Charm++ Commit ID: v7.1.0-devel-239-gae34c6343
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.000 seconds.
CharmLB> Load balancing instrumentation for communication is off.
Megatest is running on 1 nodes 1 processors.
test 0: initiated [groupring (milind)]
test 0: completed (0.00 sec)
test 1: initiated [nodering (milind)]
test 1: completed (0.00 sec)
test 2: initiated [varsizetest (mjlang)]
varsize: requires at least 2 processors
test 2: completed (0.00 sec)
test 3: initiated [varsizetest2 (phil)]
test 3: completed (0.00 sec)
test 4: initiated [varraystest (milind)]
varraystest: requires at least 2 processors
test 4: completed (0.00 sec)
test 5: initiated [groupcast (mjlang)]
test 5: completed (0.00 sec)
test 6: initiated [groupmulti (gengbin)]
test 6: completed (0.00 sec)
test 7: initiated [groupsectiontest (ebohm)]
groupsectiontest: requires at least 2 processors
test 7: completed (0.00 sec)
test 8: initiated [multisectiontest (ebohm)]
multisectiontest: requires at least 2 processors
test 8: completed (0.00 sec)
test 9: initiated [nodecast (milind)]
test 9: completed (0.00 sec)
test 10: initiated [synctest (mjlang)]
test 10: completed (0.00 sec)
test 11: initiated [fib (jackie)]
test 11: completed (0.00 sec)
test 12: initiated [arrayring (fang)]
test 12: completed (0.00 sec)
test 13: initiated [packtest (fang)]
test 13: completed (0.00 sec)
test 14: initiated [queens (jackie)]
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: [0] Assertion "state->getStage()==2" failed in file /Users/samwhite/Desktop/Sam/Illinois17-22/PPL/charm/src/ck-core/qd.C line 172.
[0] Stack Traceback:
[0:0] megatest 0x100232cf1 CmiAbort
[0:1] megatest 0x1002080b4 __CmiEnforceHelper
[0:2] megatest 0x1001f84f2 _callWhenIdle(QdMsg*)
[0:3] megatest 0x1001ff070 CcdRaiseCondition
[0:4] megatest 0x1002060ff CsdScheduleForever
[0:5] megatest 0x100205ef5 CsdScheduler
[0:6] megatest 0x10023494c ConverseInit
[0:7] megatest 0x1001f55de charm_main
[0:8] megatest 0x100001034 start
[0:9] 0x1
Fixed, it passes all tests now.
Fine-grained performance, measured with a Charm++ 1D array pingpong on 1 PE, is unchanged (within the noise) between this PR and master.
There is a strange segfault in the Spack build. Unfortunately, the log doesn't show the most important part of the backtrace:
[1:15] bcast_subset_async_delayed 0x5555557717fd CsdScheduleForever
[1:16] bcast_subset_async_delayed 0x555555771a05 CsdScheduler
[1:17] bcast_subset_async_delayed 0x5555557bbdda
[1:18] bcast_subset_async_delayed 0x5555557bbe42
[1:19] libpthread.so.0 0x7ffff7f9d609
[1:20] libc.so.6 0x7ffff7b52133 clone
Fatal error on PE 1> unknown signal

real 0m0.230s
user 0m0.004s
sys 0m0.000s
make[5]: *** [Makefile:48: smptest] Error 1
make[5]: Leaving directory '/tmp/runner/spack-stage/spack-stage-charmpp-main-gqo6lhvz3f3e4gvegepesfz2szln437h/spack-src/tests/charm++/zerocopy/zc_post_async/bcast_subset_async'
Looking at that ZC Post API bcast test, it doesn't register any Ccd callbacks of its own (only the internal callbacks used by the runtime), so the failure might be spurious.
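For context, user code hooks into conv-conds by registering condition callbacks or timed callbacks through the Ccd API. The sketch below shows roughly what that looks like; the callback typedef and exact signatures are from memory and may differ across Charm++ versions, so treat them as assumptions rather than a reference:

    #include "converse.h"

    // Assumed CcdVoidFn shape: user argument plus current wall time.
    static void myPeriodicWork(void *userParam, double curWallTime) {
      // ... user work driven by the scheduler ...
    }

    static void registerCcdCallbacks(void *arg) {
      // Run myPeriodicWork whenever this PE goes idle.
      CcdCallOnCondition(CcdPROCESSOR_STILL_IDLE, myPeriodicWork, arg);
      // Run myPeriodicWork once, about 10 ms from now.
      CcdCallFnAfter(myPeriodicWork, arg, 10.0 /* msecs */);
    }

Since the bcast test registers nothing like this itself, any Ccd activity in that backtrace would have come from the runtime's internal callbacks.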
The issue was actually a bug in my reimplementation: std::priority_queue is a max-heap by default (its top() is the greatest element), whereas the timer queue needs a min-heap keyed on the time field so that the earliest deadline is popped first. I fixed that in the main commit, and also pushed some other minor cleanup I made while debugging.
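To illustrate the fix, here is a minimal sketch of a min-heap timer queue; the struct and field names are hypothetical stand-ins, not the actual conv-conds code:

    #include <queue>
    #include <vector>

    // Hypothetical timed-callback record; the real conv-conds entry differs.
    struct TimedEntry {
      double time;  // wall time at which the callback should fire
      int handle;   // hypothetical identifier for the registered callback
    };

    // std::priority_queue uses std::less by default, making it a max-heap:
    // top() returns the largest element. To pop the earliest deadline first,
    // the comparator must order an entry "after" another when its time is larger.
    struct LaterTime {
      bool operator()(const TimedEntry &a, const TimedEntry &b) const {
        return a.time > b.time;  // yields a min-heap on `time`
      }
    };

    using TimerHeap =
        std::priority_queue<TimedEntry, std::vector<TimedEntry>, LaterTime>;

With the default comparator, top() would have returned the furthest deadline instead of the nearest one, which is consistent with timed callbacks firing out of order.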
This needs one more review.