CkArray: Better synchronization for begin/doneInserting calls

Open epmikida opened this issue 4 years ago • 0 comments

Currently, doneInserting is meant to be called after all insert calls have been made on a dynamic array. However, the execution of doneInserting assumes not only that all the inserts have been called, but also that all the associated elements have been created on their destination PEs. There is currently no synchronization to ensure that this is actually the case, nor is there any way other than quiescence detection (QD) for application users to ensure that all elements have actually been created. This can cause bugs in AtSync and in the reduction managers when their element counts are incorrect.

There are two options that need to be explored: full synchronization via completion detection (CD), and partial synchronization via insertion count. Full synchronization would use a completion detection scheme to detect when all sent insert messages have been received. Partial synchronization would just track the number of insertions done on each PE, and do a reduction so that the root knows the total number of elements inserted.
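To make the difference between the two options concrete, here is a minimal sketch in plain C++ that models the counters involved. It is not Charm++ code: `PeInsertState`, `issueInsert`, `deliverInsert`, `reduceTotalInserted`, and `allInsertsDelivered` are all hypothetical names, and the "reduction" is just a loop over a vector standing in for the PEs.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical per-PE bookkeeping; illustration only, not Charm++ API.
struct PeInsertState {
    std::size_t insertsIssued = 0;    // insert calls made from this PE
    std::size_t elementsCreated = 0;  // elements actually constructed on this PE
};

// An insert call is made on PE `src` (the element may live elsewhere).
void issueInsert(std::vector<PeInsertState>& pes, std::size_t src) {
    assert(src < pes.size());
    ++pes[src].insertsIssued;
}

// The insert message arrives on PE `dest` and the element is created there.
void deliverInsert(std::vector<PeInsertState>& pes, std::size_t dest) {
    assert(dest < pes.size());
    ++pes[dest].elementsCreated;
}

// Partial sync: a sum reduction over the per-PE insertion counts.
// The root learns how many elements will exist, not whether they exist yet.
std::size_t reduceTotalInserted(const std::vector<PeInsertState>& pes) {
    return std::accumulate(pes.begin(), pes.end(), std::size_t{0},
        [](std::size_t sum, const PeInsertState& pe) {
            return sum + pe.insertsIssued;
        });
}

// Number of elements that have actually been created so far.
std::size_t totalCreated(const std::vector<PeInsertState>& pes) {
    return std::accumulate(pes.begin(), pes.end(), std::size_t{0},
        [](std::size_t sum, const PeInsertState& pe) {
            return sum + pe.elementsCreated;
        });
}

// Full sync condition (what completion detection would establish):
// every insert that was issued has been received and processed.
bool allInsertsDelivered(const std::vector<PeInsertState>& pes) {
    return totalCreated(pes) == reduceTotalInserted(pes);
}
```

In this model, `reduceTotalInserted` can return the final element count while `allInsertsDelivered` is still false, which is exactly the gap between partial and full synchronization.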

Partial sync is less expensive, and should suffice for the reduction manager: if the root of the reduction knows the total number of elements, it can wait for contributions from stragglers before completing.
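A rough sketch of that root-side behavior, again in plain C++ rather than actual reduction manager code. `ReductionRoot` and its methods are hypothetical, and a sum is assumed as the reduction operation purely for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical root-side state for a single sum reduction; illustration only.
class ReductionRoot {
public:
    // Called once the insertion-count reduction reports the array size.
    void setExpectedElements(std::size_t total) {
        expected_ = total;
        expectedKnown_ = true;
    }

    // Called once per contributing element; early contributions are buffered
    // into the running sum even if the expected count is not known yet.
    void contribute(long value) {
        // More contributions than elements would indicate a counting bug.
        assert(!expectedKnown_ || received_ < expected_);
        sum_ += value;
        ++received_;
    }

    // The reduction finishes only when the total is known and every
    // element, including stragglers created late, has contributed.
    bool isComplete() const {
        return expectedKnown_ && received_ == expected_;
    }

    // Writes the result to `out` and returns true only if complete.
    bool tryResult(long& out) const {
        if (!isComplete()) return false;
        out = sum_;
        return true;
    }

private:
    bool expectedKnown_ = false;
    std::size_t expected_ = 0;
    std::size_t received_ = 0;
    long sum_ = 0;
};
```

The point of the sketch is that the root never needs to know whether elements exist yet, only how many contributions to expect, which is why the cheaper insertion-count reduction should be enough here.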

Whether partial sync is sufficient for AtSync is still unknown; AtSync may require full synchronization to be guaranteed correct.

epmikida · Jul 09 '21 18:07