Non-blocking team destroy
Something for the WG to discuss. Currently, UCC team creation is defined as non-blocking: team_create_post + test. The corresponding ucc_team_destroy is a blocking call.
However, in some cases ucc_team_destroy actually involves communication among the ranks of the team being destroyed. Examples: a TL UCP team might have connected UCP EPs that must be disconnected during destroy; this involves the UCP ep_close protocol, which is non-local. Another example is mcast group destruction, which requires a synchronizing flush over the participating ranks.
I ran into this issue when adding team_destroy to the gtest. There we simulate multiple ranks from a single process, so it is impossible to destroy the team with a blocking API (the very first rank in gtest will hang, waiting for ranks that never get a chance to run). I've implemented non-blocking team destruction internally (in the CL/TL and Base interfaces) and use it in gtest. For now the UCC API is unchanged: team destruction is blocking (implemented as while (UCC_OK != ucc_team_destroy_nb(team))).
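To illustrate the gtest situation, here is a minimal sketch in plain C. It is not UCC code: every `sim_*` name, the status enum, and the "all ranks must arrive" completion rule are hypothetical stand-ins for a non-local destroy step. It shows why a single thread can only finish destruction by polling the non-blocking call round-robin over all simulated ranks.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins, NOT the real UCC types or status codes. */
typedef enum { SIM_OK = 0, SIM_INPROGRESS = 1 } sim_status_t;

#define N_RANKS 4

/* State shared by all simulated ranks of one team. */
typedef struct {
    int arrived; /* number of ranks that have entered destroy */
} sim_shared_t;

/* Per-rank team handle. */
typedef struct {
    sim_shared_t *shared;
    int           entered; /* nonzero once this rank has announced itself */
} sim_team_t;

/* Non-blocking destroy: completes only after every rank has entered,
 * mimicking a destroy step that needs all peers to participate. */
static sim_status_t sim_team_destroy_nb(sim_team_t *team)
{
    if (!team->entered) {
        team->entered = 1;
        team->shared->arrived++;
    }
    return (team->shared->arrived == N_RANKS) ? SIM_OK : SIM_INPROGRESS;
}

/* Destroy all per-rank handles from one thread by polling round-robin.
 * Returns the number of passes over the ranks that were needed.
 * Spinning on teams[0] alone, as a blocking wrapper would, never
 * terminates here because the other ranks never get to enter. */
static int sim_destroy_all(sim_team_t *teams, int n)
{
    int done[N_RANKS] = {0};
    int remaining = n;
    int passes = 0;

    assert(n <= N_RANKS);
    while (remaining > 0) {
        passes++;
        for (int i = 0; i < n; i++) {
            if (!done[i] && sim_team_destroy_nb(&teams[i]) == SIM_OK) {
                done[i] = 1;
                remaining--;
            }
        }
    }
    return passes;
}
```

In this model the first pass lets every rank enter, and only the last rank sees completion; the remaining ranks complete on the second pass.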
Question: do we want to define ucc_team_destroy as a non-blocking call in the UCC API? There is probably not much of a use case for it, but it would make the API more consistent.
Good point. I agree that it would be nice to make team_destroy non-blocking and consistent with team_post. IIRC, we discussed this and anticipated several race conditions if team_destroy becomes a non-blocking call.
Decided: team_destroy will be non-blocking, with the following constraints:
- team_destroy and team_create should not overlap
- only one team is destroyed at a time
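As a rough sketch of what the two constraints mean for a caller, the following plain C bookkeeping rejects a destroy while any create or another destroy is in flight, and rejects a create while a destroy is in flight. The `lc_*` names are hypothetical and not part of UCC. The sketch only tracks local state, and it assumes that concurrent creates are allowed to overlap with each other, which the decision above does not address.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical caller-side guard, NOT part of the UCC API. */
typedef struct {
    int  creates_in_flight; /* team creations posted but not yet complete */
    bool destroy_in_flight; /* true while a team destruction is pending */
} lc_guard_t;

/* A create may start only if no destroy is pending. */
static bool lc_begin_create(lc_guard_t *g)
{
    if (g->destroy_in_flight) {
        return false;
    }
    g->creates_in_flight++;
    return true;
}

/* Call once the posted create has completed. */
static void lc_end_create(lc_guard_t *g)
{
    assert(g->creates_in_flight > 0);
    g->creates_in_flight--;
}

/* A destroy may start only if no create and no other destroy is pending. */
static bool lc_begin_destroy(lc_guard_t *g)
{
    if (g->destroy_in_flight || g->creates_in_flight > 0) {
        return false;
    }
    g->destroy_in_flight = true;
    return true;
}

/* Call once the non-blocking destroy has reported completion. */
static void lc_end_destroy(lc_guard_t *g)
{
    assert(g->destroy_in_flight);
    g->destroy_in_flight = false;
}
```

A caller would wrap each posted create and each non-blocking destroy in the matching begin/end pair and defer any operation whose begin call returns false.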