Clarify roles/responsibilities of PRRTE vs PMIx layers

Open · rhc54 opened this issue 4 years ago · 0 comments

PRRTE is looping across PMIx multiple times, which creates the conflict described here.

FWIW: here is what should be happening. PRRTE should call "setup_application" with a directive to harvest only envars, along with the programming model ("ompi" or whatever). This call is made by the user-facing launch tool (e.g., prun). The tool then takes the results (which will contain an array of PMIX_SET_ENVAR infos) and creates an env for the application.

We then process that env against the command line, altering the env array as required. This is the env that is included in the pmix_app_t. This is why I have concerns about pushing things into the environment: we don't want OMPI params polluting the environment of the PRRTE tools, just as we don't want PRRTE params polluting the environment of OMPI apps. There is no need for OMPI params to be in the process environment; they only need to be in the env array of the pmix_app_t.
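A minimal sketch of that flow, in Python purely as a model of the logic (the real code is C against the PMIx API). `SetEnvar` and `build_app_env` are hypothetical names standing in for a PMIX_SET_ENVAR result and the launch tool's env handling, and the parameter names in the usage note are only illustrative. The point is that the harvested values and the command-line changes end up in the app's env array and the tool's own process environment is never modified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetEnvar:
    """Hypothetical stand-in for one harvested PMIX_SET_ENVAR result."""
    name: str
    value: str


def build_app_env(harvested, cli_overrides, cli_unsets=()):
    """Build the env array for the app without touching os.environ.

    harvested     -- SetEnvar items returned by the harvest-only setup call
    cli_overrides -- name -> value pairs given on the command line
    cli_unsets    -- names the command line asks to remove
    """
    env = {}
    # Start from the harvested values.
    for item in harvested:
        env[item.name] = item.value
    # The command line wins over harvested values.
    for name, value in cli_overrides.items():
        env[name] = value
    # Drop anything the command line asked to unset.
    for name in cli_unsets:
        env.pop(name, None)
    # Return "NAME=value" strings, the shape an app env array carries.
    return [f"{name}={value}" for name, value in env.items()]
```

For example, with harvested entries `OMPI_MCA_btl=tcp,self` and `OMPI_MCA_pml=ob1` and a command-line override of `OMPI_MCA_btl` to `self`, the result is `["OMPI_MCA_btl=self", "OMPI_MCA_pml=ob1"]`, and nothing is exported into the launch tool's environment.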

When PRRTE goes to launch, it again calls "setup_application", but this time it passes in the job map and directives to assign fabric values like endpoints. This call is made by the DVM master, and it specifically does not ask to harvest envars, as the DVM master is not guaranteed to be local to the user. The resulting information is added to the launch message sent to the backend daemons.
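The two calls can be modeled as below, again in Python as a sketch of the intended split and not the PMIx API. `SetupRequest`, the role strings, the field names, and the dict shape of the job map are all assumptions made for illustration. `calls_are_consistent` encodes one reading of the description: exactly one harvesting call from the launch tool and exactly one fabric-assignment call from the DVM master, regardless of whether both roles live in the same process.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SetupRequest:
    """Hypothetical model of the directives passed to one setup call."""
    caller: str                          # "launch-tool" or "dvm-master"
    harvest_envars: bool                 # harvest envars from the local environment?
    programming_model: Optional[str] = None
    job_map: Optional[dict] = None       # assumed shape: node name -> list of ranks
    assign_fabric: bool = False          # assign fabric values such as endpoints?


def launch_tool_request(model):
    # First call: harvest envars only, for the named programming model.
    return SetupRequest("launch-tool", harvest_envars=True, programming_model=model)


def dvm_master_request(job_map):
    # Second call: job map plus fabric assignment, and no envar harvesting,
    # because the DVM master may not be local to the user.
    return SetupRequest("dvm-master", harvest_envars=False,
                        job_map=job_map, assign_fabric=True)


def plan_setup_calls(model, job_map):
    """One call per role, even when one process plays both roles."""
    return [launch_tool_request(model), dvm_master_request(job_map)]


def calls_are_consistent(calls):
    """True if there is exactly one harvest call and one fabric call."""
    harvesting = [c for c in calls if c.harvest_envars]
    fabric = [c for c in calls if c.assign_fabric]
    return (len(harvesting) == 1 and harvesting[0].caller == "launch-tool"
            and len(fabric) == 1 and fabric[0].caller == "dvm-master"
            and not fabric[0].harvest_envars)
```

Under this model, an extra harvesting call appended to the planned list makes `calls_are_consistent` return `False`.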

We need to review the current logic to ensure this gets done correctly when the launch tool and DVM master are one and the same (i.e., prterun). It sounds like that isn't happening and there is an extra call being made that screws things up. We also need to check the split between PMIx and PRRTE to ensure we don't have overlap in their roles, as that would likewise cause problems.

rhc54 · Apr 24 '22 00:04