Use collective active_set to send data to self
Currently we use the active_set field to "emulate" p2p communication:
ucc_coll_args_t coll = {0};
coll.mask = UCC_COLL_ARGS_FIELD_FLAGS | UCC_COLL_ARGS_FIELD_ACTIVE_SET | UCC_COLL_ARGS_FIELD_TAG;
coll.flags = UCC_COLL_ARGS_FLAG_COUNT_64BIT;
coll.coll_type = UCC_COLL_TYPE_BCAST;
coll.root = root_rank;
coll.tag = tag;
coll.src.info.buffer = ptr;
coll.src.info.count = size;
coll.src.info.datatype = UCC_DT_UINT8;
coll.src.info.mem_type = UCC_MEMORY_TYPE_CUDA;
coll.active_set.size = 2;
coll.active_set.start = my_rank;
coll.active_set.stride = my_rank - peer_rank;
However, I don't see a way to set up coll properly to "send" to myself so that this code works: https://github.com/openucx/ucc/blob/master/src/utils/ucc_coll_utils.h#L270-L271
Currently I have a workaround that uses cudaMemcpyAsync instead of UCC when I try to do something like the following, but for composability it would be nice for UCC to handle this case:
for (msg : messages)
    if (myrank == msg.src)
        ucc.send(msg.dst, ...)
    if (myrank == msg.dst)
        ucc.recv(msg.src, ...)
wait_all_ucc_requests()
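For illustration, the workaround could be sketched roughly as below. This is plain host C, not the actual code: `message_t`, `msg_action_t`, `classify_message` and `copy_to_self` are hypothetical names, and `memcpy` stands in for `cudaMemcpyAsync` so the sketch builds without CUDA.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical message descriptor; field names are illustrative only. */
typedef struct {
    int         src;      /* sending rank */
    int         dst;      /* receiving rank */
    const void *send_buf; /* source buffer on the sender */
    void       *recv_buf; /* destination buffer on the receiver */
    size_t      size;     /* payload size in bytes */
} message_t;

typedef enum { MSG_SKIP, MSG_LOCAL_COPY, MSG_SEND, MSG_RECV } msg_action_t;

/* Decide what this rank does for one message. A message whose source and
 * destination are both this rank is handled as a local copy instead of
 * being posted as a send/recv pair. */
static msg_action_t classify_message(int my_rank, const message_t *msg)
{
    if (msg->src == my_rank && msg->dst == my_rank) {
        return MSG_LOCAL_COPY;
    }
    if (msg->src == my_rank) {
        return MSG_SEND;
    }
    if (msg->dst == my_rank) {
        return MSG_RECV;
    }
    return MSG_SKIP;
}

/* Host stand-in for the cudaMemcpyAsync call used in the real workaround. */
static void copy_to_self(const message_t *msg)
{
    assert(msg->send_buf != NULL && msg->recv_buf != NULL);
    memcpy(msg->recv_buf, msg->send_buf, msg->size);
}
```

The special case in `classify_message` is exactly what breaks composability: the caller has to know about self-messages instead of posting every message through the same send/recv path.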
From the API perspective, active_set.size = 1; start = my_rank; stride = 1; should probably work, but it might not be implemented correctly in TL/UCP at the moment.
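To make the suggestion concrete, here is a small standalone sketch of which ranks an active set selects. It assumes a strided mapping in which member i is start + i * stride; `active_set_t` and the helper functions are hypothetical and only mirror the three fields discussed here, they are not UCC types or calls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the size/start/stride fields of active_set. */
typedef struct {
    uint64_t size;   /* number of ranks in the set */
    uint64_t start;  /* first rank in the set */
    int64_t  stride; /* distance between consecutive members */
} active_set_t;

/* Assumed mapping: the i-th member of the set is start + i * stride. */
static int64_t active_set_member(const active_set_t *set, uint64_t i)
{
    assert(i < set->size);
    return (int64_t)set->start + (int64_t)i * set->stride;
}

/* Returns 1 if rank is a member of the set under the assumed mapping. */
static int active_set_contains(const active_set_t *set, int64_t rank)
{
    for (uint64_t i = 0; i < set->size; i++) {
        if (active_set_member(set, i) == rank) {
            return 1;
        }
    }
    return 0;
}
```

Under this assumption, `{ .size = 1, .start = 3, .stride = 1 }` contains only rank 3, so the stride value has no effect for a single-member set, while `{ .size = 2, .start = 3, .stride = 2 }` contains ranks 3 and 5.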