
Add `relay::mpi::isend_using_schema`

Open kennyweiss opened this issue 4 years ago • 3 comments

I'm developing an MPI application where ranks need to send conduit nodes to each other in a point-to-point manner. E.g., imagine a ring where each rank sends a node to its left neighbor and receives one from its right neighbor.

This was causing the application to deadlock when using send_using_schema and recv_using_schema. I was able to resolve it by adapting send_using_schema into an isend_using_schema (i.e. using MPI_Isend instead of MPI_Send).

Should I add this to conduit::relay::mpi?
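For context, the pattern I'm after looks roughly like this (pseudocode sketch; `isend_using_schema` is the proposed function modeled on the existing `send_using_schema`, not an existing conduit API):

```
// Ring exchange: non-blocking send to the left neighbor avoids the
// deadlock that a blocking send_using_schema causes when every rank
// posts its send first.
conduit::Node send_node, recv_node;
// ... populate send_node ...

const int left  = (rank + size - 1) % size;
const int right = (rank + 1) % size;

MPI_Request request;
isend_using_schema(send_node, left, tag, comm, &request);   // proposed
conduit::relay::mpi::recv_using_schema(recv_node, right, tag, comm);
MPI_Wait(&request, MPI_STATUS_IGNORE);
```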

kennyweiss avatar Apr 15 '22 02:04 kennyweiss

Yes, knowing how you approached this would be great!

It might be helpful to review:

https://github.com/LLNL/conduit/pull/433

For isend/irecv - the a priori constraints are hard to tackle for the general case.

For the sync case, every pair has to post their send/recv in the same order, or else they deadlock. For your case - isend, irecv is the right approach for sure.

General solutions for isend/irecv that I have seen (outside of conduit) use fixed-size buffers and manage chunking arbitrarily sized payloads into those buffers. They then use a continuous wait/polling pattern until everything arrives.

It would be great to get a general strategy into conduit -- even if it ends up being a bit more complex than simple transactions of isend, irecv + wait.

This issue is also related: https://github.com/LLNL/conduit/issues/170

(we concluded we don't want any-to-any, we want the general isend/irecv solution or strategy to share)

cyrush avatar Apr 15 '22 15:04 cyrush

Thanks for the pointers @cyrush

I'm currently using a non-blocking send (MPI_Isend) with a blocking receive (MPI_Recv).

I think I still need to tweak my solution a bit since some runs are getting invalid conduit nodes on the receive side (see below).


Here's what I currently have: https://github.com/LLNL/axom/blob/cf11f5210d9751e785be4bd05e1b0638897cbeb4/src/axom/quest/DistributedClosestPoint.hpp#L78-L145

The only difference to send_using_schema is here: https://github.com/LLNL/axom/blob/cf11f5210d9751e785be4bd05e1b0638897cbeb4/src/axom/quest/DistributedClosestPoint.hpp#L121-L130

And here's how I'm using it: https://github.com/LLNL/axom/blob/cf11f5210d9751e785be4bd05e1b0638897cbeb4/src/axom/quest/DistributedClosestPoint.hpp#L326-L337


I think this has to change so that the MPI_Request is a parameter to isend_using_schema and is freed after the recv_using_schema completes, e.g. by sending a [blocking] "acknowledge" message from the receiver back to the sender.
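Roughly this shape (pseudocode; the member names on Request are illustrative, not a final API):

```
// Caller owns the request; the sender's buffers inside it must stay
// alive until the matching receive (or an "ack") confirms delivery.
conduit::relay::mpi::Request request;
isend_using_schema(node, dest, tag, comm, &request);   // proposed
// ... later, once the receiver has acknowledged ...
MPI_Wait(&request.m_request, MPI_STATUS_IGNORE);       // now safe to free
```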

I agree with your comment on #170 that these calls need to be synchronized and a tutorial on how to use them would be really helpful.

kennyweiss avatar Apr 15 '22 17:04 kennyweiss

I updated my solution based on send_using_schema and isend, along with its associated conduit::relay::mpi::Request struct, to ensure the temporary nodes/schemas survive the send/receive loop.

Here's the update:

  • Combined isend/recv: https://github.com/LLNL/axom/blob/7d5e7a99644a93dabc166c494897fc61f6df44ef/src/axom/quest/DistributedClosestPoint.hpp#L158-L175
  • Updated isend_using_schema with bugfixes adapted from conduit::relay::mpi::isend https://github.com/LLNL/axom/blob/7d5e7a99644a93dabc166c494897fc61f6df44ef/src/axom/quest/DistributedClosestPoint.hpp#L82-L156
  • Usage: https://github.com/LLNL/axom/blob/7d5e7a99644a93dabc166c494897fc61f6df44ef/src/axom/quest/DistributedClosestPoint.hpp#L358-L365

At this point, I'm reasonably confident that my code is working, but I'm no longer sure if my solution is general enough to push to conduit ...

kennyweiss avatar Apr 15 '22 20:04 kennyweiss