Add nompi stub
Add a stub implementation of MPI in pcu/pcu_pnompi.c.
Remove direct calls to MPI functions.
This adds a SCOREC_NO_MPI build option for users on single-process, single-machine setups who do not want to compile with MPI. Message passing is simulated in memory.
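To illustrate the idea, a minimal sketch (hypothetical code, not the actual contents of pcu/pcu_pnompi.c): with a single process, a "send" just stores the buffer in memory and the matching "receive" hands it back, so the PCU message-passing API keeps working without MPI.

    #include <stdlib.h>
    #include <string.h>

    /* one in-flight self-message (sketch only) */
    static void* pending_buf = NULL;
    static size_t pending_size = 0;

    static void stub_send(const void* data, size_t size) {
      pending_buf = realloc(pending_buf, size);
      memcpy(pending_buf, data, size);     /* "send" = copy into memory */
      pending_size = size;
    }

    static size_t stub_receive(void* out, size_t capacity) {
      size_t n = pending_size < capacity ? pending_size : capacity;
      if (n) memcpy(out, pending_buf, n);  /* "receive" = copy back out */
      pending_size = 0;
      return n;
    }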
@cwsmith, the build is failing for this PR. I think we should add PCU_Init and PCU_Finalize facades that wrap the MPI calls and call PCU_Comm_Init() and PCU_Comm_Free(), to remove the direct dependencies. Many executables would need to be changed, so this would become a large PR. Alternatively, we could define replacement MPI_Init and MPI_Finalize functions in pcu_pnompi and link against those. Or we could take a hybrid approach and encourage people to use PCU_Init going forward.
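For the facade idea, roughly this (a sketch; the PCU_Init/PCU_Finalize signatures are assumptions, while PCU_Comm_Init() and PCU_Comm_Free() are the existing calls):

    #include <PCU.h>
    #ifndef SCOREC_NO_MPI
    #include <mpi.h>
    #endif

    void PCU_Init(int* argc, char*** argv) {
    #ifndef SCOREC_NO_MPI
      MPI_Init(argc, argv);      /* real MPI build */
    #else
      (void)argc; (void)argv;    /* nothing to start in the stubbed build */
    #endif
      PCU_Comm_Init();
    }

    void PCU_Finalize(void) {
      PCU_Comm_Free();
    #ifndef SCOREC_NO_MPI
      MPI_Finalize();
    #endif
    }

Executables would then call PCU_Init(&argc, &argv) and PCU_Finalize() without any #ifdefs of their own.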
@flagdanger and I are working on a very large update of PCU that should make this all easier by getting rid of the singleton. Pull request for that coming within the next week.
@bobpaw I'm sorting out a few things to make the pull request ready to merge, but #388 is the draft pull request for replacing the singleton in PCU.
Although we replaced the interface used in the test cases, this pull request is designed to work with no modification to existing codes: when the old-style interface is used via PCU_Comm_Init it falls back to a global PCU state, and otherwise it can use local state.
@jacobmerson, we will plan to use your PR for the future of core, but the sponsor wants this functionality soon, so we will use this branch until yours is finished.
@bobpaw Does PCU_Comm_Init() check if MPI was initialized?
No, at the moment it only checks if PCU has been initialized using a global state variable. We could check with MPI_Initialized.
@cwsmith Note Aiden is only doing this when SCOREC_NO_MPI is defined to be true, so there is no actual MPI (e.g., see pcu/pcu_pnompi.c). For now, he is not looking into PCU_Init; that will be done after the other PR, #388.
@bobpaw OK. I'm wondering if it will make sense to do the MPI check in that call and, if it fails, call MPI_Init. But handling MPI_Finalize could be trickier; other libraries may still be running and using MPI beyond the lifetime of PCU. I'd prefer symmetry in the interface, so your original suggestion may be best.
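Something like this check-and-own pattern (a sketch with hypothetical helper names, not current PCU code): only the side that actually called MPI_Init takes responsibility for MPI_Finalize.

    #include <mpi.h>

    static int pcu_owns_mpi = 0;

    void pcu_maybe_init_mpi(int* argc, char*** argv) {
      int already = 0;
      MPI_Initialized(&already);
      if (!already) {
        MPI_Init(argc, argv);
        pcu_owns_mpi = 1;   /* PCU started MPI, so PCU must finalize it */
      }
    }

    void pcu_maybe_finalize_mpi(void) {
      if (pcu_owns_mpi) {
        MPI_Finalize();     /* if someone else initialized MPI, leave it alone */
        pcu_owns_mpi = 0;
      }
    }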
@cwsmith, after seeing the new PCU object introduced by #433, we would still like to take this build-time stubbing approach. This PR results in compilation of the entire repository without MPI dependence. Since I merged develop a few times, should we squash this PR?
Before I make my example project, I think I will need to add a generated config file, called something like mpi_rpc.h (@cwsmith, I would appreciate input on the name), to make SCOREC_NO_MPI persistent. Then I can bundle the #ifdef SCOREC_NO_MPI logic in PCU.h and the contents of pcu_pnompi_types.h into one file.
We should probably just make a PUMI_config.h file (like this), and the first entry in it can be the SCOREC_NO_MPI flag. Over time, we can move away from using add_definitions(...) toward the config file approach.
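As a sketch, the generated header for a no-MPI build could look something like this (file name and contents are assumptions; in an MPI build the SCOREC_NO_MPI line would simply be absent):

    /* PUMI_config.h (generated by the build system; sketch only) */
    #ifndef PUMI_CONFIG_H
    #define PUMI_CONFIG_H

    /* recorded here instead of being passed via add_definitions(...) */
    #define SCOREC_NO_MPI 1

    #endif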
I compared the meshes produced by the CTest suite in pumi-meshes and build/test/, and they are identical except for build/test/mis_test. However, MIS uses random numbers, so I expect that difference comes from the seed and the meshes are otherwise equivalent.
@onkarsahni said:
When srand is used, one should provide an option to use a fixed seed (for debugging, testing, regression, etc.) and a different seed (e.g., based on the time function); see https://github.com/SCOREC/core/blob/develop/apf/apfMIS.cc#L11.
The other more minor concern is that we should get rid of the need for #ifdefs in the user code by providing our own init and finalize routines. That will substantially clean up the executables.
I was considering adding a specialized PCU::PCU(int *argc, char ***argv) which would do MPI_Init and save a flag to run MPI_Finalize later. That specialization would allow users to avoid introducing a scope just for the PCU object because MPI_Finalize will be guaranteed to be called after any pcu_finalize, etc.
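Roughly what I have in mind (a sketch, not the actual pcu::PCU class): the argc/argv constructor calls MPI_Init and records that this instance is responsible for MPI_Finalize in its destructor.

    #include <mpi.h>

    class PCU {
     public:
      PCU(int* argc, char*** argv) : owns_mpi_(true) {
        MPI_Init(argc, argv);   // this instance started MPI
      }
      ~PCU() {
        // internal PCU cleanup would happen here, before finalizing
        if (owns_mpi_)
          MPI_Finalize();
      }
     private:
      bool owns_mpi_;
    };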
Outside of something like that, what is so bad about having #ifdefs? When we discussed this, we thought the #ifdefs would make it clear to people reading the tests as examples what is actually going on.
Are the #ifdef SCOREC_NO_MPI conditionals only used in the examples/drivers?
@cwsmith, the places with #ifdef SCOREC_NO_MPI are: tests (including "utils"), some exes in phasta/, pcu/pcu_defines.h, and pcu/PCU.h (for pcu::Time and the default constructor).
The thing about ifdefs is that they can make the code much harder to read, and the user has to know whether the build uses MPI under the hood. If we wrap those ifdefs in our own initialize function, we take care of that subtlety for them.
I was considering adding a specialized PCU::PCU(int *argc, char ***argv) which would do MPI_Init and save a flag to run MPI_Finalize later. That specialization would allow users to avoid introducing a scope just for the PCU object because MPI_Finalize will be guaranteed to be called after any pcu_finalize, etc.
I don't like this solution. The reason we have the PCU object is that we may have many of them lying around with different communicators under the hood, but ultimately still using the same MPI library. I have use cases where this is needed. That constructor means the PCU object may or may not own the MPI initialization. What happens when the owning PCU goes out of scope? All other PCUs that are still alive would break as they would be doing MPI operations after MPI_finalize.
So, there needs to be some library/MPI initialization at a higher level than the PCU object.
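Something along these lines (pcu::Init and pcu::Finalize are placeholder names for illustration): MPI's lifetime is owned by free functions, and every PCU object only borrows a communicator, so no object going out of scope can tear down MPI.

    #include <mpi.h>

    namespace pcu {
    inline void Init(int* argc, char*** argv) { MPI_Init(argc, argv); }
    inline void Finalize() { MPI_Finalize(); }
    }

    int main(int argc, char** argv) {
      pcu::Init(&argc, &argv);
      {
        // PCU objects (e.g. one on MPI_COMM_WORLD, one on MPI_COMM_SELF)
        // are created and destroyed strictly inside this window.
      }
      pcu::Finalize();
      return 0;
    }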
What happens when the owning PCU goes out of scope? All other PCUs that are still alive would break as they would be doing MPI operations after MPI_finalize.
Shouldn't the world-initializing PCU only go out of scope at the end of the main function? Shouldn't all other objects have been destructed by then?
I am not touching apfCGNS code yet because pcgnslib.h has #include "mpi.h" and therefore CGNS is incompatible with SCOREC_NO_MPI. But I will need to add CMake logic related to the incompatibility and update apfCGNS to not use GetMPIComm.
We determined that instead of PCU::OwnsComm, we can remove pcu_mpi_t::original_comm entirely and also remove PCU::GetMPIComm and PCU::SwitchMPIComm since those functions only existed because we had to shuffle the MPI_Comm around in the singleton PCU. With the new pcu::PCU C++ object, we do not need such functions.
SCOREC_NO_MPI is also likely exclusive with ENABLE_ZOLTAN since zoltan.h has #include <mpi.h>.
Hmmm. Zoltan can be used locally on each process (i.e., with MPI_COMM_SELF) to do what we call 'local part splitting', but that use case then relies on migrating elements to other processes via... MPI. So, yes, ENABLE_ZOLTAN requires MPI. Good catch.
For when I update apfZoltanCallbacks.cc,
https://github.com/sandialabs/Zoltan/blob/f6361719dd66cac62db8dbed120704e436a5ee81/src/zz/zz_struct.c#L114-L139
Zoltan duplicates the communicator it is given, so I should free my own reference after calling Zoltan_Create.
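The ownership pattern I am planning (a sketch, assuming Zoltan_Create duplicates whatever communicator it is handed, per the linked zz_struct.c):

    #include <mpi.h>
    #include <zoltan.h>

    void make_splitter(MPI_Comm comm) {
      MPI_Comm dup;
      MPI_Comm_dup(comm, &dup);                       /* my reference */
      struct Zoltan_Struct* zz = Zoltan_Create(dup);  /* Zoltan dups internally */
      MPI_Comm_free(&dup);                            /* safe: Zoltan holds its own copy */
      /* ... set parameters, partition, etc. ... */
      Zoltan_Destroy(&zz);
    }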
@cwsmith, I'm running into a subtle bug that I'm hoping you can help with. The test case pumi3d-1p is failing intermittently (1/4 or 1/5 of the time) in this new branch. I believe the culprit is a change made in pumi/pumi_mesh.cc, but I cannot figure out how my changes caused this issue. The test is only activated given ENABLE_ZOLTAN, and since this code uses apf::createZoltanSplitter, it is doing the MPI_COMM_SELF splitting like you described. I can additionally share logs of the failing test.
New code:
https://github.com/SCOREC/core/blob/006947468405b1e31a3a22407d10f38fa09c8236/pumi/pumi_mesh.cc#L213-L241
Old code:
https://github.com/SCOREC/core/blob/a900c4761cdc33592be0d5b0c1521cf3996a043d/pumi/pumi_mesh.cc#L234-L260
@cwsmith I have made the changes you and Jacob requested. Can you review when you get the chance?
@bobpaw Here is a comment that discusses running the delta wing regression test I mentioned earlier: https://github.com/SCOREC/core/pull/433#issuecomment-2516069759
Do I need to worry about timing? Or should I just compare the quality file?
We should check timing and quality.
@cwsmith, adapt completed in 5761.497560 seconds. How should I compare the resulting length_proc.txt and quality_proc.txt files? I see there is a histogram utility, but it wants min, max, and nbins.
@bobpaw Hmmm. I'm guessing that was a serial run? If so, I only have prior results for a 32-process run (see https://github.com/SCOREC/core/pull/433#issuecomment-2516069759). Regarding quality, if the final entity counts are similar we should be OK. We could also render the adapted mesh as a sanity check.
No, this was a 32-process run on wilbur.