How to Reproduce Scaling Results
I am working on reproducing some scaling results on t8code for a JOSS review: https://github.com/openjournals/joss-reviews/issues/6887.
I noticed the benchmarks directory, including the ExtremeScaling directory with the bunny example. My first question is this: is that a good "out of the box" problem for me to test the scaling of t8code with? If so, I have some questions below on how to run it. If not, what examples would be good candidates to verify the scaling?
For the bunny example, I am running into issues reading in a tetgen file. I have never worked with them, so my issue could simply be lack of experience. I have the bunny executable built. In order to run it, I need a tetgen file of the bunny mesh. I got the Stanford bunny mesh data from here. I copied the file into the benchmarks/ExtremeScaling directory and ran the t8_bunny example but to no avail.
Steps to reproduce
I am running this on a MacBook Pro (2021). Once I have it working, I'll repeat the process on the cluster I have access to. Each node of the cluster I will be running on has a 32 core AMD CPU and 4 NVIDIA GPUs. I will only be using the CPUs for this test.
- Download the Stanford bunny zipped file from here and unzip it into ${downloads}.
- Copy bun_zipper.ply into the ExtremeScaling directory of the t8code source (and rename it to bunny):
cd ${t8code-source}/benchmarks/ExtremeScaling
cp ${downloads}/bunny/reconstruction/bun_zipper.ply ./
mv bun_zipper.ply bunny.ply
- Run the bunny benchmark:
mpirun -n 8 ./t8_bunny bunny
Output from above
>>> ./t8_bunny bunny
[libsc] This is libsc 2.8.5.406-2b20
[libsc] CPP
[libsc] CPPFLAGS
[libsc] CC mpicc
[libsc] CFLAGS -g -O2
[libsc] LDFLAGS
[libsc] LIBS -lz
[p4est] This is p4est 2.8.6.23-7896
[p4est] CPP
[p4est] CPPFLAGS
[p4est] CC mpicc
[p4est] CFLAGS -g -O2
[p4est] LDFLAGS
[p4est] LIBS -lz
[t8] This is t8 2.0.0.396-758c
[t8] CPP
[t8] CPPFLAGS
[t8] CC mpicc
[t8] CFLAGS -g -O2
[t8] LDFLAGS
[t8] LIBS -lz -lstdc++
[p4est 0] Failed to open bunny.node
[p4est 0] Failed to read nodes for bunny
[libsc 0] Abort: Failed to read tetgen bunny
[libsc 0] Abort: <unknown>:0
[libsc 0] Abort: Obtained 7 stack frames
[libsc 0] Stack 0: 0 libsc.2.dylib 0x0000000102766278 sc_abort_handler + 96
[libsc 0] Stack 1: 1 libsc.2.dylib 0x0000000102766384 sc_abort + 20
[libsc 0] Stack 2: 2 libsc.2.dylib 0x0000000102765d88 sc_int_compare + 0
[libsc 0] Stack 3: 3 libsc.2.dylib 0x00000001027663dc sc_abort_collective + 0
[libsc 0] Stack 4: 4 libsc.2.dylib 0x00000001027671a4 SC_GEN_LOGF + 0
[libsc 0] Stack 5: 5 t8_bunny 0x00000001022cf81c main + 256
[libsc 0] Stack 6: 6 dyld 0x0000000189265058 start + 2224
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
Proc: [[46390,0],0]
Errorcode: 1
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
The issue is the bunny.ply file not being the actual file I need. I appear to need a bunny.node file. How can I either generate that file from the Stanford bunny data or get that file from elsewhere?
Hi @DamynChipman, sorry for the late reply. It is holiday season and most of us are currently away.
To be honest, the bunny example is 8+ years old and relies on tetgen, which we no longer support. So after seeing your issue we decided to remove it from the code base.
I understand that you want to have some benchmark program that you can run in order to see some scaling results for our core functionality?
If so, I suggest the program ./t8_time_forest_partition from the benchmarks folder.
We use it in several papers, and it is described particularly in https://arxiv.org/pdf/1910.10641 Section 5.2.
In this example we build a mesh geometry, uniformly refine it to a given level l, and then adaptively refine a band of elements. We move this band through the mesh over several time steps.
This example measures Adapt, Partition, Ghost, and Balance, that is, all of our critical core algorithms.
I recommend using a call like:
mpirun -np N ./t8_time_forest_partition -g -b -C 0.8 -x -0.4 -X -0.3 -l4 -r3 -O -o -T0.05
This will build a 1 million element mesh. You can increase the -l value if that is too small and runs too quickly (on my machine it takes about 5 seconds on 1 MPI rank).
Here is an overview of the options:
| Option | Description | Recommendation |
|---|---|---|
| -g | Build Ghost layer | don't change |
| -b | 2:1 balance the mesh | don't change |
| -C | CFL number/how fast the mesh moves | don't change |
| -x | Where the band of fine elements starts | decrease for more fine elements |
| -X | Where the band of fine elements stops | increase for more fine elements |
| -l | Uniform refinement level | every step multiplies number of elements by 8 |
| -r | How many refinement levels from the uniform level | change at will |
| -O | Use cylindrical geometry | don't change |
| -o | Do not produce VTK output | Keep it while measuring runtime. For debugging/checking the mesh, leave it out. |
| -T | Simulation end time | Divide by two for each additional uniform level. |
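The two scaling rules in the table (each uniform level multiplies the element count by 8, and -T should be halved for each additional uniform level) can be captured in a small helper for generating consistent runs. This is only a sketch and not part of t8code: the function names are hypothetical, the baseline values (-l 4, -T 0.05) are taken from the recommended call above, and the number of coarse mesh trees is left as a parameter because it depends on the geometry.

```python
BASE_LEVEL = 4        # -l value in the recommended call above
BASE_END_TIME = 0.05  # -T value in the recommended call above


def uniform_elements(num_trees, level):
    """Element count after uniform refinement, assuming 8 children per element."""
    return num_trees * 8 ** level


def end_time_for_level(level):
    """Halve the end time for each uniform level added beyond the baseline."""
    return BASE_END_TIME / 2 ** (level - BASE_LEVEL)


def benchmark_command(ranks, level, refine=3, geometry_flag="-O"):
    """Assemble the benchmark call as a string (hypothetical helper)."""
    args = [
        "mpirun", "-np", str(ranks), "./t8_time_forest_partition",
        "-g", "-b", "-C", "0.8", "-x", "-0.4", "-X", "-0.3",
        "-l", str(level), "-r", str(refine), geometry_flag, "-o",
        "-T", f"{end_time_for_level(level):g}",
    ]
    return " ".join(args)


if __name__ == "__main__":
    # num_trees=4 is an assumption for illustration; check the
    # "Committed cmesh with ... global trees" line of an actual run.
    for level in (4, 5, 6):
        print(uniform_elements(4, level), benchmark_command(8, level))
```

Only the options listed in the table are used. The adaptively refined band adds further elements on top of the uniform count, so the uniform count is a lower bound on the mesh size.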
Here is a low dimensional example ("-l 2 -r 3") for the mesh that is created:
Does this help?
Please let me know if you have further questions.
Yeah, this is super helpful, thanks! I will work through this and share results/issues along the way.
Just to confirm, this example requires OpenCascade, correct? After compiling, I ran
mpirun -np N ./t8_time_forest_partition -g -b -C 0.8 -x -0.4 -X -0.3 -l4 -r3 -O -o -T0.05
but it stopped and said that example requires OpenCascade. I have been trying to install OpenCascade on my cluster but keep running into system issues not related to t8code.
I suggest using -L instead of -O, even though the table above recommends not changing that option. -L uses a cylinder with linear geometry (linear elements) and does not require t8code to be linked against OpenCascade.
I want to double check to make sure it is running properly locally before moving to a cluster. The benchmark runs to completion and I can mess around with the parameters to increase/decrease the number of elements and view the mesh. However, I don't know if the results are being reported properly. I have used sc and p4est in the past, including the timing/stats functions. It appears that timing for adapt, ghost, partition, and balance are not being accumulated.
I am working on the August 1st commit (https://github.com/DLR-AMR/t8code/commit/62128c743a136a34bd05784597f2065d93a638e4) to avoid a compilation issue reported here (https://github.com/DLR-AMR/t8code/issues/1240). There are no functional differences between main and this commit for benchmarks/time_forest_partition.cxx.
Here's the output of the following:
>>> mpirun -np 8 ./t8_time_forest_partition -g -b -C 0.8 -x -0.4 -X -0.3 -l 4 -r 4 -L -o -T 0.025
[libsc] This is libsc 2.8.5.999
[p4est] This is p4est 2.8.6.999
[t8] This is t8 2.0.0
[t8] CXX /opt/homebrew/bin/mpicxx
[t8] CXXFLAGS -O3 -DNDEBUG
[t8] CC /opt/homebrew/bin/mpicc
[t8] CFLAGS -O3 -DNDEBUG
[t8] LDFLAGS
[t8] LIBS P4EST::P4EST SC::SC MPI::MPI_C
[t8] Using delta_t = 0.032000
[t8] Committed cmesh with 4 global trees.
[t8] Start adadpt 0.002101 -0.002101
[t8] Into t8_forest_adapt from 16384 total elements
[t8] Done t8_forest_adapt with 4558320 total elements
[t8] End adadpt 0.199893 0.197792
[t8] Enter forest partition.
[t8] Start partition 0.199979 0.199979
[t8] End partition 0.208866 0.008887
[t8] Done forest partition.
[t8] Into t8_forest_balance with 4558320 global elements.
[t8] Computed maximum occurring level: 8
[t8] Into t8_forest_ghost with 569790 local elements.
[t8] Start ghost at 0.226103 -0.226103
[t8] End ghost at 0.292601 0.066498
[t8] Done t8_forest_ghost with 569790 local elements and 15485 ghost elements.
[t8] Profiling: 1
[t8] Start adadpt 0.292621 -0.292621
[t8] Into t8_forest_adapt from 4558320 total elements
[t8] Done t8_forest_adapt with 4573216 total elements
[t8] End adadpt 0.354983 0.062362
[t8] Enter forest partition.
[t8] Start partition 0.355051 0.355051
[t8] End partition 0.361946 0.006895
[t8] Done forest partition.
[t8] Into t8_forest_ghost with 571652 local elements.
[t8] Start ghost at 0.364684 -0.364684
[t8] End ghost at 0.416118 0.051433
[t8] Done t8_forest_ghost with 571652 local elements and 15544 ghost elements.
[t8] Profiling: 1
[t8] Start adadpt 0.416141 -0.416141
[t8] Into t8_forest_adapt from 4573216 total elements
[t8] Done t8_forest_adapt with 4601440 total elements
[t8] End adadpt 0.498088 0.081947
[t8] Enter forest partition.
[t8] Start partition 0.498677 0.498676
[t8] End partition 0.502745 0.004069
[t8] Done forest partition.
[t8] Into t8_forest_ghost with 575180 local elements.
[t8] Start ghost at 0.505659 -0.505659
[t8] End ghost at 0.557156 0.051497
[t8] Done t8_forest_ghost with 575180 local elements and 15873 ghost elements.
[t8] Profiling: 1
[t8] Start adadpt 0.557175 -0.557175
[t8] Into t8_forest_adapt from 4601440 total elements
[t8] Done t8_forest_adapt with 4648256 total elements
[t8] End adadpt 0.675904 0.118729
[t8] Enter forest partition.
[t8] Start partition 0.676512 0.676512
[t8] End partition 0.686310 0.009798
[t8] Done forest partition.
[t8] Into t8_forest_ghost with 581032 local elements.
[t8] Start ghost at 0.688951 -0.688951
[t8] End ghost at 0.744801 0.055850
[t8] Done t8_forest_ghost with 581032 local elements and 16422 ghost elements.
[t8] Profiling: 1
[t8] Start adadpt 0.744819 -0.744819
[t8] Into t8_forest_adapt from 4648256 total elements
[t8] Done t8_forest_adapt with 4649376 total elements
[t8] End adadpt 0.863747 0.118927
[t8] Enter forest partition.
[t8] Start partition 0.864383 0.864383
[t8] End partition 0.869593 0.005210
[t8] Done forest partition.
[t8] Into t8_forest_ghost with 581172 local elements.
[t8] Start ghost at 0.871266 -0.871266
[t8] End ghost at 0.923508 0.052242
[t8] Done t8_forest_ghost with 581172 local elements and 16446 ghost elements.
[t8] Profiling: 1
[t8] Start adadpt 0.923524 -0.923524
[t8] Into t8_forest_adapt from 4649376 total elements
[t8] Done t8_forest_adapt with 4649376 total elements
[t8] End adadpt 1.051267 0.127743
[t8] Done t8_forest_balance with 4649376 global elements.
[t8] Statistics for forest balance: Adapt time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.0646732 (0.003 = 4.64%)
[t8] Minimum attained at rank 3: 0.062135
[t8] Maximum attained at rank 1: 0.068688
[t8] Statistics for forest balance: Adapt time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.0821344 (0.000428 = 0.521%)
[t8] Minimum attained at rank 3: 0.081806
[t8] Maximum attained at rank 6: 0.083232
[t8] Statistics for forest balance: Adapt time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.118677 (0.00042 = 0.354%)
[t8] Minimum attained at rank 2: 0.118279
[t8] Maximum attained at rank 6: 0.119743
[t8] Statistics for forest balance: Adapt time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.119126 (0.00111 = 0.931%)
[t8] Minimum attained at rank 2: 0.118523
[t8] Maximum attained at rank 6: 0.122047
[t8] Statistics for forest balance: Adapt time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.127708 (0.000561 = 0.439%)
[t8] Minimum attained at rank 2: 0.127265
[t8] Maximum attained at rank 6: 0.129152
[t8] Statistics for forest balance: Total adapt time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.512319 (0.00478 = 0.933%)
[t8] Minimum attained at rank 2: 0.508419
[t8] Maximum attained at rank 6: 0.522614
[t8] Summary = [ 0.0646732 0.0821344 0.118677 0.119126 0.127708 0.512319 ];
[t8] Maximum = [ 0.068688 0.083232 0.119743 0.122047 0.129152 0.522614 ];
[t8] Statistics for forest balance: Ghost time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.0512449 (0.000435 = 0.848%)
[t8] Minimum attained at rank 6: 0.050128
[t8] Maximum attained at rank 3: 0.051583
[t8] Statistics for forest balance: Ghost time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.0515504 (0.000421 = 0.816%)
[t8] Minimum attained at rank 6: 0.050481
[t8] Maximum attained at rank 2: 0.051946
[t8] Statistics for forest balance: Ghost time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.0556621 (0.00111 = 2%)
[t8] Minimum attained at rank 6: 0.052731
[t8] Maximum attained at rank 2: 0.056257
[t8] Statistics for forest balance: Ghost time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.0522777 (0.000561 = 1.07%)
[t8] Minimum attained at rank 6: 0.050832
[t8] Maximum attained at rank 2: 0.05272
[t8] Statistics for forest balance: Total ghost time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.210735 (0.00251 = 1.19%)
[t8] Minimum attained at rank 6: 0.204172
[t8] Maximum attained at rank 2: 0.212393
[t8] Summary = [ 0.0512449 0.0515504 0.0556621 0.0522777 0.210735 ];
[t8] Maximum = [ 0.051583 0.051946 0.056257 0.05272 0.212393 ];
[t8] Statistics for forest balance: Partition time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.00686963 (0.00138 = 20%)
[t8] Minimum attained at rank 3: 0.005146
[t8] Maximum attained at rank 1: 0.009283
[t8] Statistics for forest balance: Partition time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.00521925 (0.000772 = 14.8%)
[t8] Minimum attained at rank 0: 0.004069
[t8] Maximum attained at rank 2: 0.00628
[t8] Statistics for forest balance: Partition time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.00729638 (0.00281 = 38.5%)
[t8] Minimum attained at rank 7: 0.004249
[t8] Maximum attained at rank 2: 0.011448
[t8] Statistics for forest balance: Partition time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.00519713 (0.000547 = 10.5%)
[t8] Minimum attained at rank 5: 0.004715
[t8] Maximum attained at rank 2: 0.006559
[t8] Statistics for forest balance: Total partition time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.0245824 (0.00407 = 16.6%)
[t8] Minimum attained at rank 5: 0.0211
[t8] Maximum attained at rank 1: 0.031333
[t8] Summary = [ 0.00686963 0.00521925 0.00729638 0.00519713 0.0245824 ];
[t8] Maximum = [ 0.009283 0.00628 0.011448 0.006559 0.031333 ];
[t8] Into t8_forest_ghost with 581172 local elements.
[t8] Start ghost at 1.055855 -1.055855
[t8] End ghost at 1.108176 0.052320
[t8] Done t8_forest_ghost with 581172 local elements and 16446 ghost elements.
[t8] Printing stats for cmesh.
[t8] Statistics for cmesh: Number of trees sent.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0 (0)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 0: 0
[t8] Statistics for cmesh: Number of ghosts sent.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0 (0)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 0: 0
[t8] Statistics for cmesh: Number of trees received.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0 (0)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 0: 0
[t8] Statistics for cmesh: Number of ghosts received.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0 (0)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 0: 0
[t8] Statistics for cmesh: Number of bytes sent.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0 (0)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 0: 0
[t8] Statistics for cmesh: Number of processes sent to.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0 (0)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 0: 0
[t8] Statistics for cmesh: First tree is shared.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): -8 (0 = 0%)
[t8] Minimum attained at rank 0: -8
[t8] Maximum attained at rank 0: -8
[t8] Statistics for cmesh: Partition runtime.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0 (0)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 0: 0
[t8] Statistics for cmesh: Commit runtime.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 3.1e-05 (2.47e-05 = 79.7%)
[t8] Minimum attained at rank 6: 4e-06
[t8] Maximum attained at rank 0: 8e-05
[t8] Statistics for cmesh: Number of geometry evaluations.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 651125 (7.44e+05 = 114%)
[t8] Minimum attained at rank 0: 2304
[t8] Maximum attained at rank 4: 1.81606e+06
[t8] Statistics for cmesh: Accumulated geometry evaluation runtime.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.0189933 (0.0216 = 114%)
[t8] Minimum attained at rank 6: 8e-05
[t8] Maximum attained at rank 5: 0.052864
[t8] Summary = [ 0 0 0 0 0 0 -8 0 3.1e-05 651125 0.0189933 ];
[t8] Maximum = [ 0 0 0 0 0 0 -8 0 8e-05 1.81606e+06 0.052864 ];
[t8] Printing stats for forest.
[t8] Statistics for forest: Number of elements sent.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 441347 (4.62e+05 = 105%)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 5: 1.13548e+06
[t8] Statistics for forest: Number of elements received.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 441347 (2.23e+05 = 50.6%)
[t8] Minimum attained at rank 4: 0
[t8] Maximum attained at rank 1: 569790
[t8] Statistics for forest: Number of bytes sent.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 1.05924e+07 (1.11e+07 = 105%)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 5: 2.72517e+07
[t8] Statistics for forest: Number of processes sent to.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 1.375 (0.992 = 72.2%)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 4: 3
[t8] Statistics for forest: Number of ghost elements sent.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 33532.8 (6.59e+03 = 19.6%)
[t8] Minimum attained at rank 0: 18758
[t8] Maximum attained at rank 6: 39696
[t8] Statistics for forest: Number of ghost elements received.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 33532.8 (7.09e+03 = 21.1%)
[t8] Minimum attained at rank 0: 16446
[t8] Maximum attained at rank 4: 39682
[t8] Statistics for forest: Number of processes we sent ghosts to/received from.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0 (0)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 0: 0
[t8] Statistics for forest: Adapt runtime.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0 (0)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 0: 0
[t8] Statistics for forest: Partition runtime.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.0146836 (0.00525 = 35.8%)
[t8] Minimum attained at rank 1: 0.008687
[t8] Maximum attained at rank 3: 0.023082
[t8] Statistics for forest: Commit runtime.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.855875 (1.79e-06 = 0.000209%)
[t8] Minimum attained at rank 3: 0.855872
[t8] Maximum attained at rank 4: 0.855877
[t8] Statistics for forest: Ghost runtime.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.0523034 (0.00076 = 1.45%)
[t8] Minimum attained at rank 6: 0.050317
[t8] Maximum attained at rank 2: 0.052796
[t8] Statistics for forest: Ghost waittime.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0 (0)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 0: 0
[t8] Statistics for forest: Balance runtime.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.831332 (0.000132 = 0.0159%)
[t8] Minimum attained at rank 1: 0.831124
[t8] Maximum attained at rank 2: 0.831623
[t8] Statistics for forest: Balance rounds.
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 5 (0 = 0%)
[t8] Minimum attained at rank 0: 5
[t8] Maximum attained at rank 0: 5
[t8] Summary = [ 441347 441347 1.05924e+07 1.375 33532.8 33532.8 0 0 0.0146836 0.855875 0.0523034 0 0.831332 5 ];
[t8] Maximum = [ 1.13548e+06 569790 2.72517e+07 3 39696 39682 0 0 0.023082 0.855877 0.052796 0 0.831623 5 ];
[t8] Statistics for new
[t8] Global number of values: 16
[t8] Mean value (std. dev.): 0.00203125 (3.23e-05 = 1.59%)
[t8] Minimum attained at rank 0: 0.002
[t8] Maximum attained at rank 5: 0.002106
[t8] Statistics for adapt
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0 (0)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 0: 0
[t8] Statistics for ghost
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0 (0)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 0: 0
[t8] Statistics for partition
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0 (0)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 0: 0
[t8] Statistics for balance
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0 (0)
[t8] Minimum attained at rank 0: 0
[t8] Maximum attained at rank 0: 0
[t8] Statistics for total
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 1.10902 (0.000273 = 0.0246%)
[t8] Minimum attained at rank 1: 1.10871
[t8] Maximum attained at rank 5: 1.10947
[t8] Summary = [ 0.00203125 0 0 0 0 1.10902 ];
[t8] Maximum = [ 0.002106 0 0 0 0 1.10947 ];
Am I right in seeing that the stats are not being accumulated and reported? Or are they being reported in the forest balance: <Adapt, Ghost, Partition> time sections? What needs to change in order to accumulate/report the timing for these stages?
Just confirming that PR https://github.com/DLR-AMR/t8code/pull/1242 fixed the compilation issue. I have rerun the same commands as above on main, with the same results. I want to make sure I get the right timing results for this benchmark to correctly represent and reproduce the scalability of t8code.
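For collecting numbers across many runs, the mean values can be pulled out of the log text instead of being copied by hand. The sketch below is a hypothetical helper, not a t8code tool. It assumes the log layout shown above, where each "[t8] Statistics for ..." line is followed by a "Mean value (std. dev.):" line, and it does not decide which section is the right one to report.

```python
import re

STAT_RE = re.compile(r"^\[t8\] Statistics for (?P<name>.+?)\s*$")
MEAN_RE = re.compile(r"^\[t8\] Mean value \(std\. dev\.\):\s+(?P<mean>\S+)")


def parse_mean_stats(log_text):
    """Map each statistics section name to the list of mean values found.

    Section names can repeat (for example one "Adapt time" entry per
    time step), so every name maps to a list in order of appearance.
    """
    stats = {}
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        header = STAT_RE.match(line)
        if header:
            current = header.group("name")
            continue
        mean = MEAN_RE.match(line)
        if mean and current is not None:
            stats.setdefault(current, []).append(float(mean.group("mean")))
            current = None
    return stats


# Excerpt copied from the output above.
sample = """\
[t8] Statistics for forest balance: Total adapt time
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0.512319 (0.00478 = 0.933%)
[t8] Statistics for adapt
[t8] Global number of values: 8
[t8] Mean value (std. dev.): 0 (0)
"""

if __name__ == "__main__":
    for name, means in parse_mean_stats(sample).items():
        print(name, means)
```

Keeping every occurrence in a list makes it easy to compare the per-step entries with the "Total ..." entries of the same run.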
Small Scale Results
Device: 2021 MacBook Pro (CPU: M1 Pro, RAM: 32GB)
Command:
mpirun -np $n ./t8_time_forest_partition -g -b -C 0.8 -x -0.4 -X -0.3 -l 4 -r 5 -L -o -T 0.025
| Section | 1 | 2 | 4 | 8 |
|---|---|---|---|---|
| Adapt | 12.4669 | 7.41635 | 4.03027 | 3.37253 |
| Ghost | 0 | 2.26506 | 1.77662 | 1.53293 |
| Partition | 1.40002 | 0.753114 | 0.370952 | 0.283714 |
| Total | 18.8554 | 14.8982 | 10.3473 | 7.95105 |
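To read the table as a strong-scaling result, the timings can be converted to speedup and parallel efficiency relative to the 1-rank run. The sketch below uses the numbers from the table; the function name is hypothetical, and the Ghost row is left out because its 1-rank time is reported as 0 and cannot serve as a baseline.

```python
RANKS = [1, 2, 4, 8]
TIMES = {
    "Adapt": [12.4669, 7.41635, 4.03027, 3.37253],
    "Partition": [1.40002, 0.753114, 0.370952, 0.283714],
    "Total": [18.8554, 14.8982, 10.3473, 7.95105],
}


def strong_scaling(ranks, times):
    """Return (ranks, speedup, efficiency) tuples relative to the first entry.

    speedup    = T(n0) / T(n)
    efficiency = speedup * n0 / n
    """
    n0, t0 = ranks[0], times[0]
    rows = []
    for n, t in zip(ranks, times):
        speedup = t0 / t
        rows.append((n, speedup, speedup * n0 / n))
    return rows


if __name__ == "__main__":
    for section, times in TIMES.items():
        print(section)
        for n, speedup, efficiency in strong_scaling(RANKS, times):
            print(f"  n={n:<2d} speedup={speedup:5.2f} efficiency={efficiency:5.2f}")
```

An efficiency close to 1 means the runtime drops nearly in proportion to the number of ranks. On a single laptop, shared memory bandwidth and a mix of core types can lower this figure independently of the library.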
Comments
I am running this on my laptop to confirm the input parameters before submitting a larger batch job on the cluster.
Note that I got the timing results reported above from the [t8] Statistics for forest balance: Total <adapt, ghost, partition> time sections as indicated in the comments above. If this is not the proper time to be reporting, please let me know ASAP.
Should I be running into memory issues with this benchmark? This runs just fine up to 32 MPI ranks but beyond that the benchmark terminates with the following:
>>> cat output-n64.txt
[libsc] This is libsc 2.8.5.999
[p4est] This is p4est 2.8.6.999
[t8] This is t8 2.0.0
[t8] CXX /opt/cray/pe/mpich/8.1.28/ofi/nvidia/23.3/bin/mpicxx
[t8] CXXFLAGS -fast -O3 -DNDEBUG
[t8] CC /opt/cray/pe/mpich/8.1.28/ofi/nvidia/23.3/bin/mpicc
[t8] CFLAGS -fast -O3 -DNDEBUG
[t8] LDFLAGS
[t8] LIBS P4EST::P4EST SC::SC MPI::MPI_C
[t8] Using delta_t = 0.032000
[t8] Committed cmesh with 4 global trees.
[t8] Start adadpt 1282.875929 -1282.875929
[t8] Into t8_forest_adapt from 16384 total elements
[t8] Done t8_forest_adapt with 36264176 total elements
[t8] End adadpt 1283.927255 1.051326
[t8] Enter forest partition.
[t8] Start partition 1283.932301 1283.932301
[libsc 8] Caught signal SEGV
[libsc 12] Caught signal SEGV
[libsc 20] Caught signal SEGV
[libsc 9] Caught signal SEGV
[libsc 22] Caught signal SEGV
[libsc 13] Caught signal SEGV
[libsc 23] Caught signal SEGV
[libsc 40] Caught signal SEGV
[libsc 31] Abort: Returned NULL from malloc
[libsc 31] Abort: /home/dchipman1/packages/t8code/sc/src/sc.c:398
[libsc 50] Caught signal SEGV
[libsc 43] Caught signal SEGV
[libsc 52] Caught signal SEGV
[libsc 49] Caught signal SEGV
[libsc 54] Caught signal SEGV
[libsc 53] Caught signal SEGV
[libsc 55] Caught signal SEGV
[libsc 8] Abort: Obtained 11 stack frames
[libsc 8] Stack 0: libsc.so.2.0.0(+0xda15) [0x154d9a2a1a15]
[libsc 8] Stack 1: libsc.so.2.0.0(+0xb69c) [0x154d9a29f69c]
[libsc 8] Stack 2: libc.so.6(+0x4adc0) [0x154d95a53dc0]
[libsc 8] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x154d9a48ec83]
[libsc 8] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x154d9a491863]
[libsc 8] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x154d9a491129]
[libsc 8] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x154d9a49e5fb]
[libsc 8] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x154d9a49e624]
[libsc 8] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 8] Stack 9: libc.so.6(__libc_start_main+0xef) [0x154d95a3e24d]
[libsc 8] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 51] Caught signal SEGV
[libsc 12] Abort: Obtained 11 stack frames
[libsc 12] Stack 0: libsc.so.2.0.0(+0xda15) [0x1511a2570a15]
[libsc 12] Stack 1: libsc.so.2.0.0(+0xb69c) [0x1511a256e69c]
[libsc 12] Stack 2: libc.so.6(+0x4adc0) [0x15119de53dc0]
[libsc 12] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x1511a275dc83]
[libsc 12] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x1511a2760863]
[libsc 12] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x1511a2760129]
[libsc 12] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x1511a276d5fb]
[libsc 12] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x1511a276d624]
[libsc 12] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 12] Stack 9: libc.so.6(__libc_start_main+0xef) [0x15119de3e24d]
[libsc 12] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 20] Abort: Obtained 11 stack frames
[libsc 20] Stack 0: libsc.so.2.0.0(+0xda15) [0x1550de5f3a15]
[libsc 20] Stack 1: libsc.so.2.0.0(+0xb69c) [0x1550de5f169c]
[libsc 20] Stack 2: libc.so.6(+0x4adc0) [0x1550d9e53dc0]
[libsc 20] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x1550de7e0c83]
[libsc 20] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x1550de7e3863]
[libsc 20] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x1550de7e3129]
[libsc 20] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x1550de7f05fb]
[libsc 20] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x1550de7f0624]
[libsc 20] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 20] Stack 9: libc.so.6(__libc_start_main+0xef) [0x1550d9e3e24d]
[libsc 20] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 22] Abort: Obtained 11 stack frames
[libsc 22] Stack 0: libsc.so.2.0.0(+0xda15) [0x14e968acfa15]
[libsc 22] Stack 1: libsc.so.2.0.0(+0xb69c) [0x14e968acd69c]
[libsc 22] Stack 2: libc.so.6(+0x4adc0) [0x14e964253dc0]
[libsc 22] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x14e968cbcc83]
[libsc 22] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x14e968cbf863]
[libsc 22] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x14e968cbf129]
[libsc 22] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x14e968ccc5fb]
[libsc 22] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x14e968ccc624]
[libsc 22] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 22] Stack 9: libc.so.6(__libc_start_main+0xef) [0x14e96423e24d]
[libsc 22] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 40] Abort: Obtained 11 stack frames
[libsc 40] Stack 0: libsc.so.2.0.0(+0xda15) [0x147214034a15]
[libsc 40] Stack 1: libsc.so.2.0.0(+0xb69c) [0x14721403269c]
[libsc 40] Stack 2: libc.so.6(+0x4adc0) [0x14720f853dc0]
[libsc 40] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x147214221c83]
[libsc 40] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x147214224863]
[libsc 40] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x147214224129]
[libsc 40] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x1472142315fb]
[libsc 40] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x147214231624]
[libsc 40] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 40] Stack 9: libc.so.6(__libc_start_main+0xef) [0x14720f83e24d]
[libsc 40] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 50] Abort: Obtained 11 stack frames
[libsc 50] Stack 0: libsc.so.2.0.0(+0xda15) [0x14d7bdcd4a15]
[libsc 50] Stack 1: libsc.so.2.0.0(+0xb69c) [0x14d7bdcd269c]
[libsc 50] Stack 2: libc.so.6(+0x4adc0) [0x14d7b9453dc0]
[libsc 50] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x14d7bdec1c83]
[libsc 50] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x14d7bdec4863]
[libsc 50] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x14d7bdec4129]
[libsc 50] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x14d7bded15fb]
[libsc 50] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x14d7bded1624]
[libsc 50] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 50] Stack 9: libc.so.6(__libc_start_main+0xef) [0x14d7b943e24d]
[libsc 50] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 52] Abort: Obtained 11 stack frames
[libsc 52] Stack 0: libsc.so.2.0.0(+0xda15) [0x14e11f0e7a15]
[libsc 52] Stack 1: libsc.so.2.0.0(+0xb69c) [0x14e11f0e569c]
[libsc 52] Stack 2: libc.so.6(+0x4adc0) [0x14e11a853dc0]
[libsc 52] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x14e11f2d4c83]
[libsc 52] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x14e11f2d7863]
[libsc 52] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x14e11f2d7129]
[libsc 52] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x14e11f2e45fb]
[libsc 52] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x14e11f2e4624]
[libsc 52] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 52] Stack 9: libc.so.6(__libc_start_main+0xef) [0x14e11a83e24d]
[libsc 52] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 54] Abort: Obtained 11 stack frames
[libsc 54] Stack 0: libsc.so.2.0.0(+0xda15) [0x15090ce9da15]
[libsc 54] Stack 1: libsc.so.2.0.0(+0xb69c) [0x15090ce9b69c]
[libsc 54] Stack 2: libc.so.6(+0x4adc0) [0x150908653dc0]
[libsc 54] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x15090d08ac83]
[libsc 54] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x15090d08d863]
[libsc 54] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x15090d08d129]
[libsc 54] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x15090d09a5fb]
[libsc 54] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x15090d09a624]
[libsc 54] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 54] Stack 9: libc.so.6(__libc_start_main+0xef) [0x15090863e24d]
[libsc 54] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 13] Abort: Obtained 11 stack frames
[libsc 13] Stack 0: libsc.so.2.0.0(+0xda15) [0x14699e0dea15]
[libsc 13] Stack 1: libsc.so.2.0.0(+0xb69c) [0x14699e0dc69c]
[libsc 13] Stack 2: libc.so.6(+0x4adc0) [0x146999853dc0]
[libsc 13] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x14699e2cbc83]
[libsc 13] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x14699e2ce863]
[libsc 13] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x14699e2ce129]
[libsc 13] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x14699e2db5fb]
[libsc 13] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x14699e2db624]
[libsc 13] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 13] Stack 9: libc.so.6(__libc_start_main+0xef) [0x14699983e24d]
[libsc 13] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 49] Abort: Obtained 11 stack frames
[libsc 49] Stack 0: libsc.so.2.0.0(+0xda15) [0x1456674e7a15]
[libsc 49] Stack 1: libsc.so.2.0.0(+0xb69c) [0x1456674e569c]
[libsc 49] Stack 2: libc.so.6(+0x4adc0) [0x145662c53dc0]
[libsc 49] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x1456676d4c83]
[libsc 49] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x1456676d7863]
[libsc 49] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x1456676d7129]
[libsc 49] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x1456676e45fb]
[libsc 49] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x1456676e4624]
[libsc 49] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 49] Stack 9: libc.so.6(__libc_start_main+0xef) [0x145662c3e24d]
[libsc 49] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 51] Abort: Obtained 11 stack frames
[libsc 51] Stack 0: libsc.so.2.0.0(+0xda15) [0x1477798b8a15]
[libsc 51] Stack 1: libsc.so.2.0.0(+0xb69c) [0x1477798b669c]
[libsc 51] Stack 2: libc.so.6(+0x4adc0) [0x147775053dc0]
[libsc 51] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x147779aa5c83]
[libsc 51] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x147779aa8863]
[libsc 51] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x147779aa8129]
[libsc 51] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x147779ab55fb]
[libsc 51] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x147779ab5624]
[libsc 51] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 51] Stack 9: libc.so.6(__libc_start_main+0xef) [0x14777503e24d]
[libsc 51] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 9] Abort: Obtained 11 stack frames
[libsc 9] Stack 0: libsc.so.2.0.0(+0xda15) [0x14fdae2e9a15]
[libsc 9] Stack 1: libsc.so.2.0.0(+0xb69c) [0x14fdae2e769c]
[libsc 9] Stack 2: libc.so.6(+0x4adc0) [0x14fda9a53dc0]
[libsc 9] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x14fdae4d6c83]
[libsc 9] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x14fdae4d9863]
[libsc 9] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x14fdae4d9129]
[libsc 9] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x14fdae4e65fb]
[libsc 9] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x14fdae4e6624]
[libsc 9] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 9] Stack 9: libc.so.6(__libc_start_main+0xef) [0x14fda9a3e24d]
[libsc 9] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 23] Abort: Obtained 11 stack frames
[libsc 23] Stack 0: libsc.so.2.0.0(+0xda15) [0x1468a180ba15]
[libsc 23] Stack 1: libsc.so.2.0.0(+0xb69c) [0x1468a180969c]
[libsc 23] Stack 2: libc.so.6(+0x4adc0) [0x14689d053dc0]
[libsc 23] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x1468a19f8c83]
[libsc 23] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x1468a19fb863]
[libsc 23] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x1468a19fb129]
[libsc 23] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x1468a1a085fb]
[libsc 23] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x1468a1a08624]
[libsc 23] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 23] Stack 9: libc.so.6(__libc_start_main+0xef) [0x14689d03e24d]
[libsc 23] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 31] Abort: Obtained 9 stack frames
[libsc 31] Stack 0: libsc.so.2.0.0(+0xda15) [0x1455fdf41a15]
[libsc 31] Stack 1: libsc.so.2.0.0(sc_calloc+0x180) [0x1455fdf3fbc0]
[libsc 31] Stack 2: libt8.so.2.0.0-982-gce8365c89(+0x625c0) [0x1455fe1315c0]
[libsc 31] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x1455fe131129]
[libsc 31] Stack 4: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x1455fe13e5fb]
[libsc 31] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x1455fe13e624]
[libsc 31] Stack 6: t8_time_forest_partition() [0x402a86]
[libsc 31] Stack 7: libc.so.6(__libc_start_main+0xef) [0x1455f983e24d]
[libsc 31] Stack 8: t8_time_forest_partition() [0x401dba]
[libsc 43] Abort: Obtained 11 stack frames
[libsc 43] Stack 0: libsc.so.2.0.0(+0xda15) [0x1495bf257a15]
[libsc 43] Stack 1: libsc.so.2.0.0(+0xb69c) [0x1495bf25569c]
[libsc 43] Stack 2: libc.so.6(+0x4adc0) [0x1495baa53dc0]
[libsc 43] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x1495bf444c83]
[libsc 43] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x1495bf447863]
[libsc 43] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x1495bf447129]
[libsc 43] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x1495bf4545fb]
[libsc 43] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x1495bf454624]
[libsc 43] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 43] Stack 9: libc.so.6(__libc_start_main+0xef) [0x1495baa3e24d]
[libsc 43] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 53] Abort: Obtained 11 stack frames
[libsc 53] Stack 0: libsc.so.2.0.0(+0xda15) [0x14f93c60ea15]
[libsc 53] Stack 1: libsc.so.2.0.0(+0xb69c) [0x14f93c60c69c]
[libsc 53] Stack 2: libc.so.6(+0x4adc0) [0x14f937e53dc0]
[libsc 53] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x14f93c7fbc83]
[libsc 53] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x14f93c7fe863]
[libsc 53] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x14f93c7fe129]
[libsc 53] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x14f93c80b5fb]
[libsc 53] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x14f93c80b624]
[libsc 53] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 53] Stack 9: libc.so.6(__libc_start_main+0xef) [0x14f937e3e24d]
[libsc 53] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 55] Abort: Obtained 11 stack frames
[libsc 55] Stack 0: libsc.so.2.0.0(+0xda15) [0x149470c74a15]
[libsc 55] Stack 1: libsc.so.2.0.0(+0xb69c) [0x149470c7269c]
[libsc 55] Stack 2: libc.so.6(+0x4adc0) [0x14946c453dc0]
[libsc 55] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x149470e61c83]
[libsc 55] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x149470e64863]
[libsc 55] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x149470e64129]
[libsc 55] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x149470e715fb]
[libsc 55] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x149470e71624]
[libsc 55] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 55] Stack 9: libc.so.6(__libc_start_main+0xef) [0x14946c43e24d]
[libsc 55] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 18] Caught signal SEGV
[libsc 18] Abort: Obtained 11 stack frames
[libsc 18] Stack 0: libsc.so.2.0.0(+0xda15) [0x148e60e83a15]
[libsc 18] Stack 1: libsc.so.2.0.0(+0xb69c) [0x148e60e8169c]
[libsc 18] Stack 2: libc.so.6(+0x4adc0) [0x148e5c653dc0]
[libsc 18] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x148e61070c83]
[libsc 18] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x148e61073863]
[libsc 18] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x148e61073129]
[libsc 18] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x148e610805fb]
[libsc 18] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x148e61080624]
[libsc 18] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 18] Stack 9: libc.so.6(__libc_start_main+0xef) [0x148e5c63e24d]
[libsc 18] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 19] Caught signal SEGV
[libsc 19] Abort: Obtained 11 stack frames
[libsc 19] Stack 0: libsc.so.2.0.0(+0xda15) [0x146707ad3a15]
[libsc 19] Stack 1: libsc.so.2.0.0(+0xb69c) [0x146707ad169c]
[libsc 19] Stack 2: libc.so.6(+0x4adc0) [0x146703253dc0]
[libsc 19] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x146707cc0c83]
[libsc 19] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x146707cc3863]
[libsc 19] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x146707cc3129]
[libsc 19] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x146707cd05fb]
[libsc 19] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x146707cd0624]
[libsc 19] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 19] Stack 9: libc.so.6(__libc_start_main+0xef) [0x14670323e24d]
[libsc 19] Stack 10: t8_time_forest_partition() [0x401dba]
[libsc 39] Caught signal SEGV
[libsc 39] Abort: Obtained 11 stack frames
[libsc 39] Stack 0: libsc.so.2.0.0(+0xda15) [0x149ef404ea15]
[libsc 39] Stack 1: libsc.so.2.0.0(+0xb69c) [0x149ef404c69c]
[libsc 39] Stack 2: libc.so.6(+0x4adc0) [0x149eef853dc0]
[libsc 39] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x149ef423bc83]
[libsc 39] Stack 4: libt8.so.2.0.0-982-gce8365c89(+0x62863) [0x149ef423e863]
[libsc 39] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x149ef423e129]
[libsc 39] Stack 6: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0x9fb) [0x149ef424b5fb]
[libsc 39] Stack 7: libt8.so.2.0.0-982-gce8365c89(t8_forest_commit+0xa24) [0x149ef424b624]
[libsc 39] Stack 8: t8_time_forest_partition() [0x402a86]
[libsc 39] Stack 9: libc.so.6(__libc_start_main+0xef) [0x149eef83e24d]
[libsc 39] Stack 10: t8_time_forest_partition() [0x401dba]
Hey @DamynChipman, can you provide us with the input parameters for this run so that we can reproduce it? In general, you shouldn't have any problems running this example on a cluster; we have tested it with many more than 32 processes. That was a couple of years ago, though, so we will double-check, especially with your input parameters.
One part of your output might indicate that you just ran out of memory:
[libsc 31] Abort: Returned NULL from malloc
[libsc 31] Abort: /home/dchipman1/packages/t8code/sc/src/sc.c:398
Maybe your parameter combination produced a mesh that was too large for the machine to handle?
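To see quickly whether all ranks died in the same place, the backtraces can be grouped by their sequence of symbol names. The following is a minimal sketch, not part of t8code or libsc: the regular expression is an assumption based only on the `[libsc N] Stack i: module(symbol+offset) [address]` lines in the log above, and `group_ranks_by_trace` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the log above, e.g.:
# [libsc 31] Stack 1: libsc.so.2.0.0(sc_calloc+0x180) [0x1455fdf3fbc0]
FRAME = re.compile(r"\[libsc (\d+)\] Stack \d+: \S+?\(([^)]*)\)")


def group_ranks_by_trace(log_text):
    """Map each distinct sequence of symbol names to the ranks that produced it."""
    frames = defaultdict(list)
    for line in log_text.splitlines():
        match = FRAME.match(line.strip())
        if match:
            rank, symbol = int(match.group(1)), match.group(2)
            # Drop the "+0x..." offset; frames without a symbol name are skipped.
            name = symbol.split("+")[0]
            if name:
                frames[rank].append(name)
    groups = defaultdict(list)
    for rank, names in sorted(frames.items()):
        groups[tuple(names)].append(rank)
    return dict(groups)


# Four lines copied from the log above (two from rank 54, two from rank 31).
sample = """\
[libsc 54] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_element_array_get_size+0x3) [0x15090d08ac83]
[libsc 54] Stack 5: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x15090d08d129]
[libsc 31] Stack 1: libsc.so.2.0.0(sc_calloc+0x180) [0x1455fdf3fbc0]
[libsc 31] Stack 3: libt8.so.2.0.0-982-gce8365c89(t8_forest_partition+0x369) [0x1455fe131129]
"""

for trace, ranks in group_ranks_by_trace(sample).items():
    print(ranks, " <- ".join(trace))
```

Run on the full log, a grouping like this would separate rank 31, whose trace passes through `sc_calloc`, from the ranks that fail in `t8_element_array_get_size`. That is consistent with one rank failing to allocate memory first.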
@DamynChipman I ran this benchmark example on my machine with 64 ranks, and it runs just fine:
mpirun -n 64 ~/install/t8code/main/bin/t8_time_forest_partition -g -b -C 0.8 -x -0.4 -X -0.3 -l 4 -r 5 -L -o -T 0.025
I use MPICH 4.0.2 and gcc 12.1.0 on Ubuntu 22.04.1 with Linux kernel 6.8.0-45-generic.
Large (ish) Scale Results
Machine: Falcon (dual Intel Xeon 18-core nodes)
Command:
mpirun -n $N ./t8_time_forest_partition -g -b -C 0.8 -x -0.2 -X 0.2 -l 4 -r 4 -L -o -T 0.025 >> output_n$N.txt
Results:
| MPI ranks | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 |
|---|---|---|---|---|---|---|---|---|---|---|
| Adapt [sec] | 22.198 | 15.3656 | 4.88433 | 3.51135 | 2.74351 | 1.75711 | 1.25502 | 0.832045 | 0.963431 | 2.83504 |
| Ghost [sec] | 0 | 4.92692 | 2.57878 | 2.4864 | 1.56907 | 0.939954 | 0.627435 | 0.385915 | 0.289718 | 0.207101 |
| Partition [sec] | 3.50545 | 1.78216 | 0.728757 | 0.433819 | 0.198721 | 0.152427 | 0.0674145 | 0.0380946 | 0.0197075 | 0.00966559 |
| Total [sec] | 45.7854 | 35.9366 | 12.8784 | 10.5972 | 6.90435 | 4.21385 | 3.01467 | 2.16236 | 3.01856 | 4.97634 |
| Total Elements | 34723392 | 34723392 | 34723392 | 34723392 | 34723392 | 34723392 | 34723392 | 34723392 | 34723392 | 34723392 |
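Because the element count is the same in every column, this is a strong-scaling test, and the table can be summarized as speedup and parallel efficiency relative to the single-rank run. The sketch below is only an illustration: the times are copied from the Total row above, and `strong_scaling` is a hypothetical helper name, not part of t8code.

```python
# Total wall-clock times [sec] copied from the "Total" row of the table above,
# keyed by the number of MPI ranks.
total_sec = {
    1: 45.7854, 2: 35.9366, 4: 12.8784, 8: 10.5972, 16: 6.90435,
    32: 4.21385, 64: 3.01467, 128: 2.16236, 256: 3.01856, 512: 4.97634,
}


def strong_scaling(times):
    """Return {ranks: (speedup, efficiency)} relative to the smallest rank count."""
    base_ranks = min(times)
    base_time = times[base_ranks]
    result = {}
    for ranks in sorted(times):
        speedup = base_time / times[ranks]
        # Ideal speedup is ranks / base_ranks, so efficiency is the ratio to that.
        efficiency = speedup * base_ranks / ranks
        result[ranks] = (speedup, efficiency)
    return result


for ranks, (speedup, eff) in strong_scaling(total_sec).items():
    print(f"{ranks:4d} ranks: speedup {speedup:6.2f}, efficiency {eff:6.1%}")
```

With these numbers the total time is lowest at 128 ranks. Beyond that, the Adapt time in the table rises again, which is what pulls the total back up at 256 and 512 ranks.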
Comments
Turns out the issue was just the machine I was running on (it's a rather picky cluster, a decommissioned national lab computer that is currently being worked on). I was able to run up to 512 MPI ranks; runs beyond that failed due to the machine configuration. For the purposes of reproducing this benchmark, I am satisfied, and I am impressed with the speed and memory footprint for over 30M elements!
Thank you for your guidance on reproducing this result!
Thank you for the update and the praise ;) We are happy that we could resolve your questions.