
creating large matched meshes from connectivity and coordinates

Open cwsmith opened this issue 7 years ago • 76 comments

A large structured hex (eventually mixed) mesh is needed and there are issues getting a mesh generator to easily produce this.

The matchedReader branch contains a driver named matchedNodeElmReader that calls the apf::Construct API to create an mds/pumi mesh from files containing connectivity and coordinates.

There are no new/special compilation flags needed to build this branch of core and the driver.

Run the driver with the following command:

mpirun -np <numProcs> ./matchedNodeElmReader box_small.cnn box_small.crd NULL outModel.dmg outMesh.smb

NULL is the placeholder argument for the file containing the matching info.

The box_small files are attached in a tarball: box_small.tar.gz

The format for the .cnn, .crd, and .match files is:

$ head -5 box_small.cnn
1 1 2 53 52 2602 2603 2654 2653
2 2 3 54 53 2603 2604 2655 2654
3 3 4 55 54 2604 2605 2656 2655
4 4 5 56 55 2605 2606 2657 2656
5 5 6 57 56 2606 2607 2658 2657
….
el# global_node1:8

$ head -5 box_small.crd
1 0.000000 0.000000 0.000000
2 0.000000 0.000000 0.020000
3 0.000000 0.000000 0.040000
4 0.000000 0.000000 0.060000
5 0.000000 0.000000 0.080000
nd# x y z

$ head -5 box_small.match
1 51
2 -1
3 -1
4 -1
5 -1
node# matching node if not -1
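
For concreteness, here is a minimal standalone sketch of parsing these three formats, assuming the whitespace-delimited ASCII layout shown above (leading ID column included); the struct and function names are hypothetical, not the driver's actual internals:

#include <cstdio>
#include <vector>

// Hypothetical parser for the .cnn/.crd/.match layouts shown above
// (a sketch for illustration, not matchedNodeElmReader's actual code).
struct MeshInput {
  std::vector<long> conn;     // 8 global vertex ids per hex element
  std::vector<double> coords; // x, y, z per vertex
  std::vector<long> match;    // matched vertex id, or -1 if unmatched
};

bool readAsciiMesh(const char* cnn, const char* crd, const char* mtch,
                   MeshInput& m) {
  long id, gv[8], partner;
  double xyz[3];
  FILE* f = fopen(cnn, "r");
  if (!f) return false;
  while (fscanf(f, "%ld %ld %ld %ld %ld %ld %ld %ld %ld", &id,
                gv, gv+1, gv+2, gv+3, gv+4, gv+5, gv+6, gv+7) == 9)
    m.conn.insert(m.conn.end(), gv, gv+8); // drop the leading element id
  fclose(f);
  f = fopen(crd, "r");
  if (!f) return false;
  while (fscanf(f, "%ld %lf %lf %lf", &id, xyz, xyz+1, xyz+2) == 4)
    m.coords.insert(m.coords.end(), xyz, xyz+3); // drop the leading node id
  fclose(f);
  f = fopen(mtch, "r");
  if (!f) return false;
  while (fscanf(f, "%ld %ld", &id, &partner) == 2)
    m.match.push_back(partner); // -1 means unmatched
  fclose(f);
  return true;
}

int main(int argc, char** argv) {
  if (argc != 4) return 1;
  MeshInput m;
  if (!readAsciiMesh(argv[1], argv[2], argv[3], m)) return 1;
  printf("%zu elements, %zu vertices\n", m.conn.size()/8, m.coords.size()/3);
  return 0;
}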

@rickybalin

cwsmith avatar Jan 04 '19 21:01 cwsmith

Pasting Ken's email below

Cameron and I met today to discuss the plan for this effort. He is close to having something to share.

Remove the counter from the first column of all files and assume that the line number is the global entity number (with obvious adjustment for multiple-topology connectivity):

File 1) Coordinates (forgot what we named it):
numVerts 3
x1 x2 x3
(repeat the coordinate line for the remaining numVerts - 1 vertices)

File 2) Connectivity (again forgot what we called it):
numElmThisTop nElmNodes
gn1 gn2 ... gn_nElmNodes
(repeat the connectivity line for the remaining numElmThisTop - 1 elements)

Repeat the header and rectangular array for each topology; an example is sketched below. Could put them into different files initially if easier and then cat the files to create File 2.
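
For example (hypothetical element counts), a File 2 holding 100 hexes followed by 40 wedges would be laid out as:

100 8
gn1 gn2 gn3 gn4 gn5 gn6 gn7 gn8
... (99 more hex rows of 8 global node numbers)
40 6
gn1 gn2 gn3 gn4 gn5 gn6
... (39 more wedge rows of 6 global node numbers)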

File 3) Matching:
numVerts 1
-1 if unmatched, otherwise the node number matched with

File 4) Classification:
numVerts 1
0 = fully interior of the volume; 1-6 = classified on a model face (not edge or vertex); 11-22 = classified on a model edge (not the end points, which are model vertices); 31-38 = classified on a model vertex

I suppose on this last one we should agree on a master numbering. Cameron, I know SCOREC used to have a canonical numbering for hex elements, and if you can find that we should stick with it (it probably also exists in the Simmetrix documentation)?

File 5) Boundary conditions: we should probably recycle our existing flat-text smd or spj file format, with the only change being that the model tag is replaced by the classification tag described in File 4.

Riccardo mentioned in Slack that he has a 1B element case ready for testing and called it his largest, so I suspect he has some in-between sizes, which would be good.

Current plan:

  1. Cameron is going to push a version of his code that ignores matching to github as soon as he confirms it works on the already provided 175k element case on 1, 2, and 4 target mds parts
  2. Riccardo and I will retest 1)
  3. Riccardo and I will begin to test progressively larger cases
  4. Cameron will move on to add features in the following order: a) matching capability, b) classification ingestion, c) boundary condition ingestion, and connect a), b), and c) such that Chef has what it would have gotten from the smd and mesh previously.
  5. Riccardo and I test features as they get added and obviously run PHASTA to verify that all the information is coming through

We are leaving initial conditions out but, Riccardo can correct me if I am wrong, we already have in the codes that he is using a way to overwrite good, spatially varying initial and boundary condition values. The latter may or may not require correct BC codes, but I think the above will provide that.

Let me know if I have forgotten anything or gotten any of it differently than we discussed today.

Best regards,

Ken

cwsmith avatar Jan 04 '19 21:01 cwsmith

The parallel reader now works.

cwsmith avatar Jan 05 '19 03:01 cwsmith

Am I correct in assuming that this is a separate repo from SCOREC/core, i.e., a personal repo? Can you provide some instructions for how best to check this out as a branch "under" our standard repo? I guess I am not git-savvy enough to do that, since for PHASTA we keep everything under either the public or the next repo, which are ultimately both "linked" on github.

KennethEJansen avatar Jan 05 '19 16:01 KennethEJansen

cd /path/to/existing/core/repo
git remote add cws git@github.com:cwsmith/core.git
git fetch cws
git checkout matchedReader

On my system this looks like:

~/develop/core (develop)$ git remote -v
origin	git@github.com:SCOREC/core.git (fetch)
origin	git@github.com:SCOREC/core.git (push)
~/develop/core (develop)$ git remote add cws git@github.com:cwsmith/core.git
~/develop/core (develop)$ git fetch cws
remote: Enumerating objects: 75, done.
remote: Counting objects: 100% (75/75), done.
remote: Compressing objects: 100% (29/29), done.
remote: Total 75 (delta 53), reused 68 (delta 46), pack-reused 0
Unpacking objects: 100% (75/75), done.
From github.com:cwsmith/core
 * [new branch]        apf_sim_config          -> cws/apf_sim_config
 * [new branch]        chef                    -> cws/chef
 * [new branch]        cws/control_print_stmts -> cws/cws/control_print_stmts
 * [new branch]        dcVtx                   -> cws/dcVtx
 * [new branch]        develop                 -> cws/develop
 * [new branch]        fusion_dev              -> cws/fusion_dev
 * [new branch]        generateAdvanced        -> cws/generateAdvanced
 * [new branch]        liipbmod                -> cws/liipbmod
 * [new branch]        master                  -> cws/master
 * [new branch]        matchedReader           -> cws/matchedReader
 * [new branch]        mis-work                -> cws/mis-work
 * [new branch]        phastaChef_mesh_measure -> cws/phastaChef_mesh_measure
 * [new branch]        sim_getDgCopies         -> cws/sim_getDgCopies
 * [new branch]        ugridhacks              -> cws/ugridhacks
~/develop/core (develop)$ git checkout matchedReader
Branch 'matchedReader' set up to track remote branch 'matchedReader' from 'cws'.
Switched to a new branch 'matchedReader'
~/develop/core (matchedReader)$ 

cwsmith avatar Jan 05 '19 17:01 cwsmith

Thanks. I have successfully built the code on the viz nodes and repeated the 175k test. I further tested the following larger cases:

| Parts | Mesh size | Time to Create (s) | Memory Peak (GB) | Time to Write smb (s) |
| ----- | --------- | ------------------ | ---------------- | --------------------- |
| 4     | 8M        | 65                 | 4x1.7            | 26                    |
| 4     | 125M      | 999                | 4x27             | 350                   |
| 8     | 125M      | 1012               | 8x13.5           |                       |

(Note: it is called 150M but it is actually 125M according to the code.) As expected, this is linear in time and memory for 4 parts, so all is looking good at this point. The mesh creation does not seem to benefit from more cores/parts, but maybe that is expected?

Thanks for adding the render function and output. I did verify the 175k and 8M cases in ParaView.

I will have to move to Cooley to test the 1B element case since it is projected to need > 800GB.

KennethEJansen avatar Jan 06 '19 01:01 KennethEJansen

kjansen@cc014: /projects/TRAssembly_2/kjansen/Models/BoeingBump/matchedReaderTest/1B $ mpirun -np 32 --hostfile $COBALT_NODEFILE /projects/TRAssembly_2/SCOREC-core/build-matchedReader/test/matchedNodeElmReader box_WMLES_1B.cnn box_WMLES_1B.crd NULL mod mesh/
numElms 1000000000
numVerts 1003003001
isMatched 0
seconds to create mesh 3589.092
mesh verified in 158.253273 seconds
mesh mesh/ written in 22.195060 seconds
writeVtuFile into buffers: 129.821842 seconds
writeVtuFile buffers to disk: 1.203647 seconds
vtk files rendered written in 136.573262 seconds

Looking in top on Cooley for the above case, I had 8 nodes, so 4 processes per node. One process seems far ahead of the other 3: it was running at 100% while the other three were at 60% with a status of D; presumably they were struggling to read from a single file. At this point all are in the compute phase:

Tasks: 364 total, 5 running, 359 sleeping, 0 stopped, 0 zombie
%Cpu(s): 28.2 us, 5.5 sy, 0.0 ni, 66.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 39597302+total, 30272150+free, 90494048 used, 2757468 buff/cache
KiB Swap: 5242876 total, 4956576 free, 286300 used. 30331353+avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
49976 kjansen 20 0 23.0g 18.8g 10296 R 100.0 5.0 37:28.15 matchedNodeElmR
49977 kjansen 20 0 19.1g 16.7g 10300 R 100.0 4.4 36:23.76 matchedNodeElmR
49974 kjansen 20 0 31.2g 24.1g 10316 R 96.2 6.4 54:27.47 matchedNodeElmR
49975 kjansen 20 0 23.0g 18.9g 10300 R 96.2 5.0 34:57.68 matchedNodeElmR
7344 kjansen 20 0 177836 3324 2180 R 3.8 0.0 0:18.54 top

But note the ~18-minute gap in TIME+. Earlier, the processes now at ~35 minutes were hovering at 1.6G; I guess they are now done reading and are processing elements.

Here are the results from 64 processes:

kjansen@cc014: /projects/TRAssembly_2/kjansen/Models/BoeingBump/matchedReaderTest/1B $ mpirun -np 64 --hostfile $COBALT_NODEFILE /projects/TRAssembly_2/SCOREC-core/build-matchedReader/test/matchedNodeElmReader box_WMLES_1B.cnn box_WMLES_1B.crd NULL mod mesh/
numElms 1000000000
numVerts 1003003001
isMatched 0
seconds to create mesh 3433.222
mesh verified in 87.809694 seconds
mesh mesh/ written in 13.419787 seconds
writeVtuFile into buffers: 72.740630 seconds
writeVtuFile buffers to disk: 0.592511 seconds
vtk files rendered written in 76.710356 seconds

KennethEJansen avatar Jan 06 '19 03:01 KennethEJansen

Note all of the above is with NULL passed for matching. Please let me know when you think the code is ready to test matching. Also, it looks like you have not made the jump to the new file formats yet (e.g., your code is still not expecting a header and is reading the ID as the first column of .crd and .cnn, at least).

KennethEJansen avatar Jan 06 '19 06:01 KennethEJansen

I think we are facing a tradeoff between IO bottlenecks at high process counts per node and the scalability of the mds element creation. To test the extreme of this I tried one process per node, but as you can see below, this fails on an INT_MAX limit:

jansen@cc014: /projects/TRAssembly_2/kjansen/Models/BoeingBump/matchedReaderTest/1B $ mpirun -np 8 --hostfile $COBALT_NODEFILE /projects/TRAssembly_2/SCOREC-core/build-matchedReader/test/matchedNodeElmReader box_WMLES_1B.cnn box_WMLES_1B.crd NULL mod mesh/
numElms 1000000000
numVerts 1003003001
isMatched 0
ERROR PCU message size exceeds INT_MAX... exiting
ERROR PCU message size exceeds INT_MAX... exiting
[cc015:mpi_rank_2][error_sighandler] Caught error: Aborted (signal 6)
[cc009:mpi_rank_5][error_sighandler] Caught error: Aborted (signal 6)
ERROR PCU message size exceeds INT_MAX... exiting
[cc003:mpi_rank_6][error_sighandler] Caught error: Aborted (signal 6)
ERROR PCU message size exceeds INT_MAX... exiting
[cc010:mpi_rank_3][error_sighandler] Caught error: Aborted (signal 6)
ERROR PCU message size exceeds INT_MAX... exiting
[cc004:mpi_rank_4][error_sighandler] Caught error: Aborted (signal 6)
ERROR PCU message size exceeds INT_MAX... exiting
[cc005:mpi_rank_1][error_sighandler] Caught error: Aborted (signal 6)
ERROR PCU message size exceeds INT_MAX... exiting
[cc014:mpi_rank_0][error_sighandler] Caught error: Aborted (signal 6)

KennethEJansen avatar Jan 06 '19 06:01 KennethEJansen

Thanks for testing. Glad to see the tool worked for 1B elements. I've added that result to the table.

| Parts | Mesh size | Time to Create (s) | Memory Peak (GB) | Time to Write smb (s) |
| ----- | --------- | ------------------ | ---------------- | --------------------- |
| 4     | 8M        | 65                 | 4x1.7            | 26                    |
| 4     | 125M      | 999                | 4x27             | 350                   |
| 8     | 125M      | 1012               | 8x13.5           |                       |
| 32    | 1B        | 3589               | 32x31?           | 22                    |
| 64    | 1B        | 3433               | ?                | 13                    |
| 8     | 1B        | Fail               |                  |                       |
| 16x1  | 1B        | 2807               | ?                | 39                    |
| 16x2  | 1B        | 3472               | ?                | 18.7                  |
| 16x4  | 1B        | 2906               | ?                | 14                    |

I think we'll have to move to MPI_IO and binary files to get the read times down. This would append another bullet to item 4 from here. Matlab supports binary file writing: https://www.mathworks.com/help/matlab/ref/fwrite.html
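
As a rough sketch of the reader side of that (assuming a headerless flat binary file of 64-bit connectivity with no ID column; this is just an illustration of the collective-read approach, not the eventual format or API):

#include <mpi.h>
#include <cstdio>
#include <vector>

// Sketch: collective MPI-IO read of a flat binary connectivity array.
// Assumes the file is numElms*8 int64 values with no header or ID column.
int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  const long long numElms = 1000000000LL, nodesPerElm = 8;
  // contiguous block decomposition of elements over ranks
  long long chunk = numElms / size, first = rank * chunk;
  if (rank == size - 1) chunk = numElms - first;
  std::vector<long long> conn(chunk * nodesPerElm);
  MPI_File fh;
  MPI_File_open(MPI_COMM_WORLD, "box.cnn.bin", MPI_MODE_RDONLY,
                MPI_INFO_NULL, &fh);
  MPI_Offset off = first * nodesPerElm * sizeof(long long);
  // one collective read per rank replaces per-line fscanf parsing
  // (note the int count argument, the same INT_MAX class of limit seen above)
  MPI_File_read_at_all(fh, off, conn.data(), (int)conn.size(),
                       MPI_LONG_LONG, MPI_STATUS_IGNORE);
  MPI_File_close(&fh);
  if (!rank) printf("rank 0 read %lld elements\n", chunk);
  MPI_Finalize();
  return 0;
}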

Matching is not supported yet. I'll post an update here when it is ready for testing.

cwsmith avatar Jan 06 '19 12:01 cwsmith

Since this approach appears promising, I am going to take some time here to plan the remaining steps to putting this into a production run. I will put tentative assignments via initials, but of course we can redistribute as needed:

  1. (DONE) Preliminary testing for meshes up to 1B. This is nearly complete, as noted above by KEJ.
  2. (DONE) Adding matching capability (CWS). See #211.
  3. Switch over to the header plus removal of the ID column (CWS).
  4. Adding classification capability (CWS).
  5. Creation of "real" meshes (the current ones are isotropic boxes) with spatially varying 2D boundary layers and vertical filling to the top boundary (RB). Retesting should be easy since, other than bugs, it is just coordinate changes. After the coords and connectivity are checked, the "mesh generation" code needs to add the capability to write the model classification file described above. It should also prepare the BC/IC flat text file.
  6. (DONE; CWS confirms flat text will work) Adding the load of a flat text BC and IC description a la spj or smd (CWS, or others if needed, since I think there is a prospect that what we have will just work as long as 4 is done in a compatible way).
  7. With 6 complete, we should be able to start testing in PHASTA (RB).
  8. Push this up to our larger target meshes (RB).
  9. Adding multi-topology (CWS).
  10. Test 8 and 9 in PHASTA.
  11. Adding MPI_IO and binary (CWS). This is a low priority for now. Of course it would be nice, but I think we can get to > 10B without it. The 1B element case on 16 nodes with one process per node takes about half an hour to read the mesh. KEJ has modified the code to measure this directly and will get data at 2 and 4 processes per node as time allows, but since we were able to get 1B through on 8 nodes, I suspect we can get 10B through on 80 (or probably much fewer), which we have on Cooley. I further suspect that it should be reasonably straightforward to do the preliminary element partitioning in the same code that creates the coordinates and connectivity, such that the elements can be written to already-parted files. It might even be possible to write the coordinates (and other vertex-based data) to separate files. This would likely speed up the data ingestion. That said, I am keeping this as the lowest priority because we are not likely to do this enough times to justify a huge effort to make this stage as fast as it could be, and I don't want us to spend time (on both sides) on that fine tuning until we get the first 10 tasks verified.

KennethEJansen avatar Jan 06 '19 14:01 KennethEJansen

What is the notation in the parts column? Is it x ? No. It is nodes x processes_per_node.

The list of todo items looks good to me.

Is 10B the upper limit for mesh size? Yes, near term.

cwsmith avatar Jan 07 '19 00:01 cwsmith

Yes, 10B should be the upper limit.
rickybalin avatar Jan 07 '19 08:01 rickybalin

To update on number 5 on the list of steps above: I am able to generate meshes of the desired geometry and with the desired features (it is missing a spatially varying BL height, but that is easy to add).

Here is a picture of the mesh I created. Note this is only a 35k element mesh, but it should be good enough to test that this workflow is successful with a simple RANS simulation.

[image: mesh]

rickybalin avatar Jan 08 '19 18:01 rickybalin

This looks good. I am going to add to task 5 what you probably already know goes with it: modifying your code to write the model classification and boundary condition files, which are needed by task 6.

KennethEJansen avatar Jan 08 '19 18:01 KennethEJansen

I started working on that, but could not find a rule of thumb for the numbering order of the geometric entities (faces, edges, and vertices). I am making one up, but will be able to modify the code to use another order easily.

rickybalin avatar Jan 08 '19 18:01 rickybalin

Unless Cameron says otherwise, I suggest you follow the order of the connectivity array. Model vertices start with v0 at the "origin", which we can define as (minx, miny, minz), and then there are three other model vertices which cycle in an order such that their curl points into the domain. These 4 vertices define the first four edges as well (e0 = v1-v0, e1 = v2-v1, etc.). These first 4 edges define face 0 (f0). The other 4 model vertices follow the same order but shifted into the third dimension (e.g., v4 is a third-dimension displacement of v0, and the others shift this relationship by 1, so that v7 is the extension of v3). The next 4 edges are numbered such that their origin is v{0..3} and their endpoint is v{4..7}, i.e., they are directed into the third dimension (e.g., e4 = v4-v0). The last 4 edges are "parallel" to the first 4 and form the loop for the face opposite face 0 (e.g., e8 = v5-v4). The next 4 faces are formed such that their "base" is on the first 4 edges (e.g., f1 has a base with e0, and in general f{1..4} has a base with e{0..3}). Finally, face 5 closes off the box opposite f0.
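
If it helps pin this down, here is one way to encode that numbering as adjacency tables; this is a sketch of the prose above and should be checked against the SCOREC/Simmetrix canonical hex numbering before anyone codes to it:

#include <cstdio>

// One reading of the box-model numbering described above (to be checked
// against the SCOREC/Simmetrix canonical ordering; not authoritative).
// v0 is at (minx, miny, minz); v1..v3 cycle so their curl points into the
// domain; v4..v7 are the third-dimension copies of v0..v3.
const int edgeVerts[12][2] = {
  {0,1}, {1,2}, {2,3}, {3,0},  // e0..e3: loop bounding f0
  {0,4}, {1,5}, {2,6}, {3,7},  // e4..e7: directed into the third dimension
  {4,5}, {5,6}, {6,7}, {7,4}   // e8..e11: loop bounding the face opposite f0
};
const int faceEdges[6][4] = {
  {0,1,2,3},   // f0: base face
  {0,5,8,4},   // f1: base on e0
  {1,6,9,5},   // f2: base on e1
  {2,7,10,6},  // f3: base on e2
  {3,4,11,7},  // f4: base on e3
  {8,9,10,11}  // f5: opposite f0
};

int main() {
  for (int e = 0; e < 12; ++e)
    printf("e%d = v%d -> v%d\n", e, edgeVerts[e][0], edgeVerts[e][1]);
  for (int f = 0; f < 6; ++f)
    printf("f%d = {e%d, e%d, e%d, e%d}\n", f, faceEdges[f][0],
           faceEdges[f][1], faceEdges[f][2], faceEdges[f][3]);
  return 0;
}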

KennethEJansen avatar Jan 08 '19 18:01 KennethEJansen

Code for the classification file is also done, and I gave the file a ".class" extension. This file has the mesh node global ID in the first column and the classification tag in the second column, and no title, since it seems like the title is not wanted in the other files.
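
For illustration, a hypothetical head -5 of such a file for box_small (assuming, per the File 4 scheme, that the corner node at x = y = z = 0 gets a model-vertex tag like 31 and the nodes above it along x = y = 0 get a model-edge tag like 11):

$ head -5 box_small.class
1 31
2 11
3 11
4 11
5 11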

For the BC and IC, I simply copied the .spj file that had the desired boundary and initial conditions for a .smd model with the same geometry, and modified the face and model tags to match what was done in the classification file. The tag I used for what Simmetrix and SimModeler call the region is 0 because we said to give a classification of 0 to the fully interior nodes, but I can change that if 0 is not an appropriate value.

This should complete the set of files required, which for the hill geometry can be found on the viz nodes under the name "2DGaussHill_test" at: /users/rbalin/NSF_DataDrivenTurbMod/PrelimTests/matchedNodeElmReader/2Dhill/test

rickybalin avatar Jan 08 '19 21:01 rickybalin

@rickybalin Looking back at the box_small case, there appears to be a problem with the matching file. Specifically, if vtx 1 is matched with vtx 51, as indicated by line one,

$ head -n 1 box_small.match 
1 51

I'd expect to see that vtx 51 was matched with vtx 1 on line 51. Instead it looks like vtx 51 is not matched.

$ sed '51q;d' box_small.match 
51 -1

cwsmith avatar Jan 09 '19 22:01 cwsmith

That was intentional. I made that script so that the nodes on the z_min face would be matched to the "master" nodes on the z_max face, but not in the other direction.

I will change it so the nodes on both the periodic faces point to the corresponding node on the other face.

The file box_small.match in /users/rbalin/NSF_DataDrivenTurbMod/PrelimTests/matchedNodeElmReader/channel/small has been changed accordingly.

rickybalin avatar Jan 09 '19 22:01 rickybalin

Cameron, if it makes your matching work easier, it would be easy for us to restrict ourselves to cases where there is only on-part matching.

This is as easy as being sure to choose numVerts/numParts = k * Nz, where k is strictly an integer, Nz is the number of points in the matched direction (z in this case), numVerts is the total number of vertices, and numParts is the number of parts we will stream this into.

Since numVerts = Nx * Ny * Nz for tensor-product grids, this is equivalent to Nx * Ny = numParts * k,

which also says there are k Nz-length lines of nodes on each part.

This is a super easy constraint for us to live with if it makes your life easier. Even if we drop tensor-product grids when we go to multi-topology, it would not be hard to order those vertices such that they always end up on the same part after the streaming.
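
As a worked check (assuming, since numVerts = 1003003001 = 1001^3, that the 1B case is 1001 x 1001 x 1001 vertices; the snippet is just an illustration, not part of any of our codes):

#include <cstdio>

// Sketch: check the on-part matching constraint Nx*Ny == numParts*k.
// Dims are assumed from numVerts = 1003003001 = 1001^3 for the 1B case.
int main() {
  const long Nx = 1001, Ny = 1001;
  const long zLines = Nx * Ny; // number of Nz-length lines of matched nodes
  const long parts[] = {7, 11, 13, 77, 91, 143, 32, 64};
  for (long numParts : parts) {
    if (zLines % numParts == 0)
      printf("numParts=%ld ok: k=%ld z-lines per part\n",
             numParts, zLines / numParts);
    else
      printf("numParts=%ld violates the constraint\n", numParts);
  }
  return 0;
}

With those assumed dims, Nx * Ny = 1002001 = 7^2 * 11^2 * 13^2, so power-of-two part counts like 32 or 64 do not satisfy the constraint, while divisors such as 7, 77, 91, or 143 do.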

KennethEJansen avatar Jan 09 '19 23:01 KennethEJansen

We should also plan for the stage at which we will switch the file formats (e.g., add the header with the array shape and ditch the ID column).

KennethEJansen avatar Jan 10 '19 14:01 KennethEJansen

See #211 for progress on matching and other test cases.

Moving @KennethEJansen's posts on classification here:

vertex: whatever it is classified on

edge classification: an edge in the mesh will have 2 vertices. From their model classification, the following pairs are possible, and each pair maps to an edge classification as follows:

v v -> e
v e -> e
v f -> f
v r -> r
e e -> e
e f -> f
e r -> r
f f -> f
f r -> r
r r -> r

face classification: a mesh face will have 3 or 4 mesh vertices with model classification. If any of the vertices are classified on a model region -> region; else -> model face. It is impossible to have more than one model face among the 3 or 4 vertices' classifications.

region -> region

cwsmith avatar Jan 17 '19 20:01 cwsmith

Since, in the above, there is a general rule that classification is always set by the classification of the closure node with the highest-dimension classification, I think you can encapsulate all of the above by looping over the nodes of the entity's closure and assigning the entity the minimum of the nodes' classification tags, i.e., entClass = min(closureNodeClassTags).

Because we have numbered lower-dimensional model entities with higher numbers, this works. Initially I thought it would be good to change our model entity numbering to 10^d + entNumberD, where d is the dimension of the entity and entNumberD counts the entities of dimension d (which would let a max work directly), but since we went to higher numbers as we numbered the lower-dimensional entities, what we already have is sufficient with a min instead of a max.

The only thing this does not cover is the v v -> e case, which, as we already discussed, is not relevant to this code.
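
A minimal sketch of that rule, assuming the File 4 tag scheme (0 = region interior, 1-6 = model face, 11-22 = model edge, 31-38 = model vertex); the function name is hypothetical, not anything in the driver:

#include <algorithm>
#include <cassert>
#include <vector>

// Sketch: an entity inherits the classification of the closure node with the
// highest-dimension classification, which under the File 4 tag scheme is
// simply the minimum tag. Hypothetical helper, not the driver's actual API.
int entityClassification(const std::vector<int>& closureNodeTags) {
  assert(!closureNodeTags.empty());
  return *std::min_element(closureNodeTags.begin(), closureNodeTags.end());
}

int main() {
  // a mesh face with one interior node is classified on the model region (0)
  assert(entityClassification({0, 3, 3, 3}) == 0);
  // a mesh edge between a model-vertex node (31) and a model-edge node (11)
  // is classified on the model edge (11)
  assert(entityClassification({31, 11}) == 11);
  // the v v -> e case is the one exception; as noted above it is not
  // relevant to these meshes
  return 0;
}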

KennethEJansen avatar Jan 23 '19 00:01 KennethEJansen

Task list for Classification:

  1. Set up model adjacency (which model faces close the model region, which model edges close each model face, ...). Do we have to do this? We might, if we are going to use inheritance of boundary conditions.
  2. For each entity in the mesh, set the model classification: a) every mesh region classified on a model region; b) every mesh face classified on a model face or model region; c) ...

Ken and I made some progress in /users/rbalin/matchedReader/core/test/matchedNodeElmReader.cc. We have a version that compiles but crashes during the call to apf::setCoords(mesh, m.coords, m.localNumVerts, outMap), before any call to classification-related code. It does not seem to be something we broke when adding our modifications for classification, but I am still in the process of understanding why it happens.

Ken, I had to modify our initial work because of how setModelEntity() is defined, but it should do the same thing.

rickybalin avatar Jan 30 '19 01:01 rickybalin

Can you confirm that commit https://github.com/cwsmith/core/commit/75db2593d0d495b96ca0b3148c7d98aa01c75295 builds and runs? The code I added for reading classification etc. may be the cause of the setCoords crash you see.

cwsmith avatar Jan 30 '19 03:01 cwsmith

It builds, but does not run. It stops at the same point in apf::setCoords, line 224.

rickybalin avatar Jan 30 '19 16:01 rickybalin

OK. Thank you for checking. I should be able to debug this for about an hour or so tomorrow.

cwsmith avatar Jan 30 '19 17:01 cwsmith

Thank you, please let us know if there is more we can do to help.

rickybalin avatar Jan 30 '19 17:01 rickybalin

What broke that version was that, when you were testing 2D, you added the reading of three lines at the top of the .cnn file: 1 1 nsd numel nvtxPerElement.

Adding these three lines to our .cnn file gets https://github.com/cwsmith/core/commit/75db2593d0d495b96ca0b3148c7d98aa01c75295 to run through (we have not checked the result).

KennethEJansen avatar Jan 30 '19 17:01 KennethEJansen

Ahh yes. Thank you. I added that to deal with 2D meshes and multiple topologies.

cwsmith avatar Jan 30 '19 17:01 cwsmith