Expand support for completing non-blocking requests

Open · gunney1 opened this issue 3 years ago · 8 comments

I'd like to have more flexible ways to complete conduit::relay::mpi::Requests from non-blocking communications. Specifically,

  1. support the way applications store Requests and
  2. provide a choice to use MPI_Waitsome and MPI_Testsome to complete the MPI side of the requests.

Conduit "wait" functions requires a C array of Requests. Building such an array requires reallocating memory when we run out of space. However, reallocating memory would invalidate the pointers given to MPI. We use a list<Request> in Axom so we don't invalidate any pointers to add more Requests. If the Conduit "wait" functions allowed Request *requests[] instead of just Request requests[] we could use them. We're currently writing our own "wait" code, but since we're using conduit::relay::mpi::Request, it would make sense to let Conduit complete the request.

The second part is for Conduit to support waitsome and testsome for checking requests. The first always completes at least one outstanding request (if there are any), and the second is non-blocking. We currently do it this way:

void check_send_requests(std::list<conduit::relay::mpi::Request>& isendRequests,
                         bool atLeastOne) const
{
  if(isendRequests.empty()) return; // Waitsome/Testsome report MPI_UNDEFINED for an empty list.

  std::vector<MPI_Request> reqs; // Intermediate request container for MPI.
  for(auto& isr : isendRequests) reqs.push_back(isr.m_request);

  int inCount = static_cast<int>(reqs.size());
  int outCount = 0;
  std::vector<int> indices(reqs.size(), -1);
  if(atLeastOne)
    MPI_Waitsome(inCount,
                 reqs.data(),
                 &outCount,
                 indices.data(),
                 MPI_STATUSES_IGNORE);
  else
    MPI_Testsome(inCount,
                 reqs.data(),
                 &outCount,
                 indices.data(),
                 MPI_STATUSES_IGNORE);
  indices.resize(outCount);
  // Remove the completed requests.  (Since they are sends, nothing else is
  // required.)  This assumes MPI returns the completed indices in ascending order.
  auto reqIter = isendRequests.begin();
  int prevIdx = 0;
  for(const int idx : indices)
  {
    for(; prevIdx < idx; ++prevIdx) ++reqIter;
    reqIter = isendRequests.erase(reqIter);
    ++prevIdx;
  }
}

Obviously, if Conduit supported lists of Requests, we wouldn't need to build the intermediate container to pass to the wait functions. But there's probably no way to do this without templates.
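
A minimal sketch of what I mean by a templated interface, purely for illustration (it accepts any container whose elements expose an m_request handle, the way conduit::relay::mpi::Request does; the contiguous buffer still exists, it just lives inside the helper):

#include <mpi.h>
#include <vector>

// Illustrative only: works for std::list, std::vector, or any other container
// of Request-like objects.  The intermediate MPI_Request array is hidden here.
// (Handles are not written back; for completed sends that is enough.)
template <typename RequestContainer>
int wait_all_container(RequestContainer &requests)
{
  std::vector<MPI_Request> mpi_reqs;
  mpi_reqs.reserve(requests.size());
  for(auto &r : requests) mpi_reqs.push_back(r.m_request);

  return MPI_Waitall(static_cast<int>(mpi_reqs.size()),
                     mpi_reqs.data(),
                     MPI_STATUSES_IGNORE);
}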

gunney1 · Jul 28 '22

Instead of an all-in-one method, how about adding wait_some and test_some methods?

Would it be possible to also support this for recv, if we use std::list as the primary interface for passing the data?

cyrush · Jul 30 '22

Sorry for the late response.

Yes, I think if you support wait_some and test_some AND std::list as the primary interface for passing data, it would be what we need.

gunney1 · Aug 15 '22

@gunney1

Great, I started working on this. A question:

Your method checks which requests are fulfilled and removes the fulfilled ones from the list. (We can detect the send vs. recv case and do the proper Node cleanup.)

For the general case - do we want to know which ones completed?

Do we want:

int wait_some(std::list<Request> &requests)

For rounds of communication.

or something like:

int wait_some(std::list<Request> &requests,
              std::vector<int> &out_indices)

If we want to take some action on the Nodes between wait_some calls. It seems like the second form would be necessary for that. But if we shrink the list, the out_indices won't correspond to the adjusted list.

cyrush · Aug 16 '22

It may be possible that I'd want to keep the completed requests around for some reason, although I probably don't care about the out_indices. What do you think of moving the completed requests out of the input list and placing them into another list to return? Something like

std::list<Request> check_recv_requests(std::list<Request>& inRequests)
{
  std::list<Request> completed;
  // Iterator to a request that was found to be complete:
  std::list<Request>::iterator aCompletedRequest = ...;
  // Move it into the output list without copying or invalidating pointers.
  completed.splice(completed.end(), inRequests, aCompletedRequest);
  ...
  return completed;
}

Would that work? Of course, you can also have both lists in the argument and return void.
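
To make it concrete, a slightly fuller sketch of that splice-based approach with both lists as arguments (illustrative only, not Conduit's API; it calls MPI_Waitsome directly and assumes the completed indices come back in ascending order):

#include <mpi.h>
#include <iterator>
#include <list>
#include <vector>

struct Request { MPI_Request m_request; };  // stand-in for conduit::relay::mpi::Request

// Move completed requests from pending into completed; return how many completed.
int wait_some(std::list<Request> &pending, std::list<Request> &completed)
{
  if(pending.empty()) return 0;

  // Contiguous handle array for MPI, built from the (address-stable) list entries.
  std::vector<MPI_Request> reqs;
  reqs.reserve(pending.size());
  for(auto &r : pending) reqs.push_back(r.m_request);

  int outCount = 0;
  std::vector<int> indices(reqs.size(), -1);
  MPI_Waitsome(static_cast<int>(reqs.size()), reqs.data(),
               &outCount, indices.data(), MPI_STATUSES_IGNORE);
  if(outCount == MPI_UNDEFINED) return 0;

  // Splice the completed entries into the output list; nothing is copied and
  // no pointers are invalidated, so the caller can still inspect them.
  auto it = pending.begin();
  int pos = 0;
  for(int i = 0; i < outCount; ++i)   // assumes indices[] is ascending
  {
    for(; pos < indices[i]; ++pos) ++it;
    auto next = std::next(it);
    it->m_request = MPI_REQUEST_NULL; // MPI completed and released this handle
    completed.splice(completed.end(), pending, it);
    it = next;
    ++pos;
  }
  return outCount;
}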

gunney1 · Aug 18 '22

Yes, that's a great idea! Style-wise, I would like to pass a ref to the list for the output as well.

cyrush · Aug 18 '22

@gunney1 Q: Did you look at https://github.com/LLNL/conduit/blob/1255e71b78314979c2484c0d553361ffdebbe849/src/libs/relay/conduit_relay_mpi.hpp#L281

I think it's a different comm pattern than the one you are using -- but it provides a very nice general interface for isends and irecvs.

cyrush · Feb 17 '23

@cyrush Very sorry for taking so long. After calling execute, how would I determine which non-blocking operation completed?

gunney1 · Mar 02 '23

@cyrush Just getting back to this. I'm still interested in this feature. I'd like to be able to specify whether I'm willing to wait for at least one request to finish or just want to check whether any have actually finished -- basically, whether to use MPI_Waitsome or MPI_Testsome. Testsome is for when I'm not willing to wait but will take whatever has completed, possibly nothing. Waitsome is for when I'm willing to wait the minimum time needed to get at least one completion. I don't necessarily want to wait for all to finish, because I want to interleave work and waiting. In my implementation, I have a parameter, bool atLeastOne, that says whether I'm willing to wait for at least one to complete. The two options would let me maximize the amount of local computation I can do while waiting.
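
For what it's worth, the interleaving pattern I have in mind looks roughly like this. It's a sketch only, written against raw MPI_Request handles rather than the proposed Conduit interface; with conduit-level test_some/wait_some the loop body would operate on the Request list instead:

#include <mpi.h>
#include <algorithm>
#include <deque>
#include <functional>
#include <vector>

// Interleave local work with completion checks: poll with MPI_Testsome while
// work remains (never block), switch to MPI_Waitsome once the work runs out.
void overlap_work_and_completion(std::vector<MPI_Request> &reqs,
                                 std::deque<std::function<void()>> &work)
{
  std::vector<int> indices(reqs.size());
  while(!work.empty() || !reqs.empty())
  {
    int outCount = 0;
    if(!work.empty())
    {
      work.front()();   // one unit of local computation
      work.pop_front();
      if(!reqs.empty()) // cheap, non-blocking check (atLeastOne == false)
        MPI_Testsome(static_cast<int>(reqs.size()), reqs.data(),
                     &outCount, indices.data(), MPI_STATUSES_IGNORE);
    }
    else
    {
      // No local work left: block until at least one completes (atLeastOne == true).
      MPI_Waitsome(static_cast<int>(reqs.size()), reqs.data(),
                   &outCount, indices.data(), MPI_STATUSES_IGNORE);
    }
    // MPI sets completed handles to MPI_REQUEST_NULL; drop them so the loop ends.
    reqs.erase(std::remove(reqs.begin(), reqs.end(), MPI_REQUEST_NULL), reqs.end());
  }
}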

gunney1 · Apr 26 '24