charm icon indicating copy to clipboard operation
charm copied to clipboard

Cleanup #1982: Use C++ STL data structures in conv-conds

Open evan-charmworks opened this issue 6 years ago • 1 comments

Original date: 2019-04-26 04:00:57 Original PR: https://charm.cs.illinois.edu/gerrit/4932


Change-Id: Ie533057d1dba8db277388affce245dc8d2cb50e6

evan-charmworks avatar May 08 '19 19:05 evan-charmworks

Performance seems to be equivalent to the original C implementation in microbenchmarks, but this PR is failing tests/charm/megatest with:

"------------- Processor 0 Exiting: Called CmiAbort ------------ Reason: Internal QD Error. Contact Developers.!"

I confirmed that this failure was happening before the timer optimizations commit.

stwhite91 avatar Dec 12 '21 03:12 stwhite91

Rebased, and it still fails in charm megatest, but now with a slightly better error message:

./charmrun +p1 ./megatest ++local
Charmrun> scalable start enabled. 
Charmrun> started all node programs in 0.163 seconds.
Charm++> Running in non-SMP mode: 1 processes (PEs)
Converse/Charm++ Commit ID: v7.1.0-devel-239-gae34c6343
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 hosts (1 sockets x 4 cores x 2 PUs = 8-way SMP)
Charm++> cpu topology info is gathered in 0.000 seconds.
CharmLB> Load balancing instrumentation for communication is off.
Megatest is running on 1 nodes 1 processors. 
test 0: initiated [groupring (milind)]
test 0: completed (0.00 sec)
test 1: initiated [nodering (milind)]
test 1: completed (0.00 sec)
test 2: initiated [varsizetest (mjlang)]
varsize: requires at least 2 processors
test 2: completed (0.00 sec)
test 3: initiated [varsizetest2 (phil)]
test 3: completed (0.00 sec)
test 4: initiated [varraystest (milind)]
varraystest: requires at least 2 processors
test 4: completed (0.00 sec)
test 5: initiated [groupcast (mjlang)]
test 5: completed (0.00 sec)
test 6: initiated [groupmulti (gengbin)]
test 6: completed (0.00 sec)
test 7: initiated [groupsectiontest (ebohm)]
groupsectiontest: requires at least 2 processors
test 7: completed (0.00 sec)
test 8: initiated [multisectiontest (ebohm)]
multisectiontest: requires at least 2 processors
test 8: completed (0.00 sec)
test 9: initiated [nodecast (milind)]
test 9: completed (0.00 sec)
test 10: initiated [synctest (mjlang)]
test 10: completed (0.00 sec)
test 11: initiated [fib (jackie)]
test 11: completed (0.00 sec)
test 12: initiated [arrayring (fang)]
test 12: completed (0.00 sec)
test 13: initiated [packtest (fang)]
test 13: completed (0.00 sec)
test 14: initiated [queens (jackie)]
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: [0] Assertion "state->getStage()==2" failed in file /Users/samwhite/Desktop/Sam/Illinois17-22/PPL/charm/src/ck-core/qd.C line 172.

[0] Stack Traceback:
  [0:0] megatest 0x100232cf1 CmiAbort
  [0:1] megatest 0x1002080b4 __CmiEnforceHelper
  [0:2] megatest 0x1001f84f2 _callWhenIdle(QdMsg*)
  [0:3] megatest 0x1001ff070 CcdRaiseCondition
  [0:4] megatest 0x1002060ff CsdScheduleForever
  [0:5] megatest 0x100205ef5 CsdScheduler
  [0:6] megatest 0x10023494c ConverseInit
  [0:7] megatest 0x1001f55de charm_main
  [0:8] megatest 0x100001034 start
  [0:9] 0x1

stwhite91 avatar Apr 25 '23 19:04 stwhite91

Fixed, it passes all tests now.

stwhite91 avatar Apr 25 '23 20:04 stwhite91

Fine-grained performance measured using Charm 1D array pingpong on 1 PE is unchanged (within the noise) with this PR from master.

stwhite91 avatar May 02 '23 03:05 stwhite91

Strange segfault in the Spack build. Unfortunately it doesn't show the most important part of the backtrace.

     22403      [1:15] bcast_subset_async_delayed 0x5555557717fd CsdScheduleForever
     22404      [1:16] bcast_subset_async_delayed 0x555555771a05 CsdScheduler
     22405      [1:17] bcast_subset_async_delayed 0x5555557bbdda
     22406      [1:18] bcast_subset_async_delayed 0x5555557bbe42
     22407      [1:19] libpthread.so.0 0x7ffff7f9d609
     22408      [1:20] libc.so.6 0x7ffff7b52133 clone
  >> 22409    Fatal error on PE 1> unknown signal
     22410    
     22411    real	0m0.230s
     22412    user	0m0.004s
     22413    sys	0m0.000s
  >> 22414    make[5]: *** [Makefile:48: smptest] Error 1
     22415    make[5]: Leaving directory '/tmp/runner/spack-stage/spack-stage-c
              harmpp-main-gqo6lhvz3f3e4gvegepesfz2szln437h/spack-src/tests/char
              m++/zerocopy/zc_post_async/bcast_subset_async'

evan-charmworks avatar May 04 '23 04:05 evan-charmworks

Looking at that ZC Post API Bcast test, it doesn't use any CcdCallbacks on its own (only what internal callbacks are used by the runtime). So it might be spurious.

stwhite91 avatar May 04 '23 18:05 stwhite91

The issue was actually a bug in my reimplementation. By default, std::priority_queue sorts in descending order, whereas we want to sort the time field in ascending order. I fixed that in the main commit. I also pushed other minor cleanup I made while debugging.

evan-charmworks avatar Jun 09 '23 05:06 evan-charmworks

This needs one more review

stwhite91 avatar Jun 12 '23 13:06 stwhite91