ucc
EC/ROCM: add host execution capability
What
Introduce the ability to use host-based reduction and copy operations.
Why?
This avoids the cost of a kernel launch, which can be beneficial for short messages and when overlapping communication with computation running on the GPU.
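As a rough illustration of what "host-based" means here, the sketch below performs a sum reduction and a copy directly on the CPU, so no GPU kernel is launched. It is not the EC/ROCM implementation. The function names, the float-sum-only reduction, and the `HOST_EXEC_MAX_BYTES` size threshold are hypothetical. The sketch also assumes that all buffers are host-accessible (for example, pinned or managed memory).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff: messages at or below this size take the host path. */
#define HOST_EXEC_MAX_BYTES 4096

/* Element-wise sum of n_srcs float buffers into dst, executed on the CPU.
 * Assumes every buffer is host-accessible (e.g. pinned or managed memory). */
static void host_reduce_sum_float(float *dst, const float *const *srcs,
                                  size_t n_srcs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        float acc = 0.0f;
        for (size_t s = 0; s < n_srcs; s++) {
            acc += srcs[s][i];
        }
        dst[i] = acc;
    }
}

/* Host-based copy: a plain memcpy on the CPU, no kernel launch involved. */
static void host_copy(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
}

/* Hypothetical dispatch rule: use the host path only for short messages,
 * where avoiding the kernel launch overhead is most likely to pay off. */
static int use_host_path(size_t len)
{
    return len <= HOST_EXEC_MAX_BYTES;
}
```

A size-based rule like `use_host_path` reflects the motivation above: for small buffers the launch overhead can dominate, while for large buffers the GPU's bandwidth usually wins. Where the real crossover sits would have to be measured.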