ucc CORE: Add flags to enable various optimizations

CORE: Add two flags to enable optimizations based on overlap and latency needs

Signed-off-by: Geoffroy Vallee [email protected]

What

The PR adds new flags so we can provide run-time hints about the types of optimizations users require.

Why ?

There is a need to let users specify the types of optimizations that may benefit their applications, e.g. when libraries and applications want to benefit from DPU offloading. An example is a library that uses non-blocking collectives with big messages, for which a specific algorithm has shown concrete benefits, alongside application code that uses the blocking version of the collectives with small messages. While the current selection mechanism has proven very flexible, it cannot select different algorithms that would result in optimized implementations for the blocking and non-blocking collectives, as in the example above. The proposed flags would, we think, enable the specification of new selection policies that match our current requirements while being flexible enough to cover the needs of the foreseeable future.

How ?

N/A

Sep 06 '22 19:09 gvallee

We now have an agreement that this approach addresses our needs. Could we move the PR out of the WIP state?

Oct 12 '22 16:10 gvallee

@vspetrov is this waiting on you?

Feb 01 '23 18:02 manjugv

@manjugv to be honest, only now do I see the comment from @gvallee from Oct 12. I remember from our talk with Geoffroy that Rich had a second opinion on this approach and it didn't quite work. Now, it looks like this approach is good enough.

@gvallee, sorry for such a horrible flow. Mid-October were my last days at NV, so I must have missed this comment during that crazy time. Next time (hopefully it will not happen again) plz don't wait that long and ping folks (Sergey, Manju) directly.

@manjugv I suggest we discuss this. The proposal is reasonable. Maybe some "wording" can be changed. The idea is that the user can give a hint to UCC that, for example, they would like to get the best overlap with the CPU for a particular collective (maybe even at the cost of larger wall time). This will be used internally for algorithm selection, because currently we select the algs that have the best latency/bw, and they are quite often the most CPU/GPU intensive.
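To make the idea concrete, here is a minimal sketch of hint-driven selection. It is an illustration only: the hint values, the `alg_info_t` cost fields, the algorithm names, and the numbers in `demo_algs` are all hypothetical and are not part of the UCC API or taken from this PR.

```c
#include <stddef.h>

/* Hypothetical hint values, for illustration only (not UCC identifiers). */
typedef enum {
    HINT_NONE = 0,          /* no preference: keep the default policy */
    HINT_OPTIMIZE_LATENCY,  /* prefer the lowest estimated wall time  */
    HINT_OPTIMIZE_OVERLAP   /* prefer the lowest estimated CPU usage  */
} coll_hint_t;

/* Hypothetical per-algorithm cost estimates. */
typedef struct {
    const char *name;
    double      est_time_us;   /* estimated wall time in microseconds   */
    double      est_cpu_busy;  /* fraction of that time the CPU is busy */
} alg_info_t;

/* Made-up candidates: a fast CPU-driven algorithm and a slower offloaded one. */
static const alg_info_t demo_algs[] = {
    { "cpu_driven", 10.0, 0.9 },
    { "offloaded",  14.0, 0.1 },
};

/* Return the index of the preferred algorithm, or -1 if the list is empty.
 * With HINT_OPTIMIZE_OVERLAP the CPU-busy fraction is minimized; otherwise
 * the estimated wall time is minimized, which mirrors the default policy
 * described in the comment above. */
static int select_alg(const alg_info_t *algs, size_t n, coll_hint_t hint)
{
    size_t best = 0;

    if (n == 0) {
        return -1;
    }
    for (size_t i = 1; i < n; i++) {
        if (hint == HINT_OPTIMIZE_OVERLAP) {
            if (algs[i].est_cpu_busy < algs[best].est_cpu_busy) {
                best = i;
            }
        } else if (algs[i].est_time_us < algs[best].est_time_us) {
            best = i;
        }
    }
    return (int)best;
}
```

With these made-up numbers, the default policy picks `cpu_driven`, while the overlap hint picks `offloaded` even though its estimated wall time is larger.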

Feb 02 '23 07:02 vspetrov

Ok. Let's discuss this at the Feb 15th meeting.

Feb 03 '23 17:02 manjugv

IMO, we should not mix semantic flags with hint flags. Hint flags should be a different structure. (As discussed in the WG today.) @vspetrov @Sergei-Lebedev What do you think?

Feb 15 '23 19:02 manjugv

Sorry, I missed the call, so I'm not sure I understand what "semantic flags" are. The suggested hints do not alter the semantics of the collective execution: it is still regular non-blocking behavior. The only difference is that the user says they would like UCC to select the algorithm that, e.g., keeps the CPU less utilized. If UCC has no such optimization internally, then UCC can ignore it.

So, plz specify what you mean by "semantic flags".

Feb 16 '23 06:02 vspetrov

Semantic flags are modifiers that change collective behavior, for example UCC_COLL_ARGS_FLAG_IN_PLACE, while hint flags just help UCC choose an algorithm. I'm not sure we can cleanly separate hint flags from semantic flags, because we already have UCC_COLL_ARGS_FLAG_CONTIG_SRC_BUFFER and UCC_COLL_ARGS_FLAG_CONTIG_DST_BUFFER, which are essentially hints.
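For reference, a rough sketch of what keeping the two kinds of flags in separate fields could look like. Every name below is hypothetical and only mirrors the distinction discussed here; it is not the layout of the UCC collective arguments and not what this PR implements. The point is that an unknown semantic flag has to be rejected, while an unknown hint can simply be dropped.

```c
#include <stdint.h>

/* Hypothetical semantic flags: they change what the collective does. */
#define SEM_FLAG_IN_PLACE     (UINT64_C(1) << 0)
#define KNOWN_SEM_FLAGS       (SEM_FLAG_IN_PLACE)

/* Hypothetical hint flags: advisory only, results must not depend on them. */
#define HINT_FLAG_CONTIG_SRC  (UINT64_C(1) << 0)
#define HINT_FLAG_CONTIG_DST  (UINT64_C(1) << 1)
#define HINT_FLAG_OVERLAP     (UINT64_C(1) << 2)

/* Hints this imaginary implementation knows how to act on. */
#define SUPPORTED_HINTS       (HINT_FLAG_CONTIG_SRC | HINT_FLAG_CONTIG_DST)

/* Hypothetical argument struct with the two kinds of flags kept apart. */
typedef struct {
    uint64_t flags;  /* semantic flags: must be understood and honored */
    uint64_t hints;  /* hint flags: may be ignored                     */
} demo_coll_args_t;

/* Semantic flags are strict: return -1 if any unknown bit is set, else 0. */
static int validate_flags(uint64_t flags)
{
    return (flags & ~KNOWN_SEM_FLAGS) ? -1 : 0;
}

/* Hints are lenient: unsupported bits are silently cleared. */
static uint64_t filter_hints(uint64_t hints)
{
    return hints & SUPPORTED_HINTS;
}

/* Returns 0 and trims the hints in place, or -1 on an unknown semantic flag. */
static int normalize_args(demo_coll_args_t *args)
{
    if (validate_flags(args->flags) != 0) {
        return -1;
    }
    args->hints = filter_hints(args->hints);
    return 0;
}
```

In this sketch the contiguous-buffer bits sit on the hint side, which reflects the point that they are essentially hints. Whether existing flags could be moved like that without breaking users is exactly the open question.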

Feb 16 '23 12:02 Sergei-Lebedev

Replaced by #752. Closing this.

Mar 29 '23 15:03 manjugv