Does GDRcopy support the HPE/Cray "SlingShot" backbone?

Open cponder opened this issue 3 years ago • 3 comments

I'm looking into a performance issue with an app. If you could tell me up-front whether you support this kind of cluster, it would save some troubleshooting time.

cponder avatar Aug 09 '22 22:08 cponder

Carl, both the libfabric plugin and the NCCL plugin have been able to use GDRCopy on a SlingShot-based machine.

See Jim's patch: https://github.com/aws/aws-ofi-nccl/pull/146

AddyLaddy avatar Aug 09 '22 23:08 AddyLaddy

Do you know if UCX can use it? I'll check with the UCX people...

cponder avatar Aug 09 '22 23:08 cponder

Do you mean "Can UCX use GDRCopy?" I believe UCX will use GDRCopy if it was built with the right compile-time options and the runtime environment requirements are satisfied. The code is here: https://github.com/openucx/ucx/tree/master/src/uct/cuda/gdr_copy

pakmarkthub avatar Aug 10 '22 00:08 pakmarkthub
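
For anyone wanting to check the compile-time and runtime conditions mentioned in the last reply, here is a minimal sketch, not an official procedure. It assumes that the GDRCopy kernel driver exposes the device node `/dev/gdrdrv` when loaded, and that a UCX build with GDRCopy support mentions `gdr_copy` in the output of `ucx_info -d`. The function names `lists_gdr_copy` and `check_ucx_gdrcopy` are made up for illustration.

```python
import os
import shutil
import subprocess


def lists_gdr_copy(ucx_info_output: str) -> bool:
    """Return True if any line of the given text mentions gdr_copy.

    Assumption: a UCX build with GDRCopy support names the transport
    "gdr_copy" somewhere in the `ucx_info -d` output.
    """
    return any("gdr_copy" in line for line in ucx_info_output.splitlines())


def check_ucx_gdrcopy() -> dict:
    """Collect two hedged indicators; neither proves GDRCopy is used at runtime."""
    result = {
        # Assumption: the gdrdrv kernel module creates this device node.
        "gdrdrv_device": os.path.exists("/dev/gdrdrv"),
        # None means ucx_info was not found on PATH, so nothing was checked.
        "ucx_gdr_copy": None,
    }
    exe = shutil.which("ucx_info")
    if exe is not None:
        proc = subprocess.run(
            [exe, "-d"], capture_output=True, text=True, check=False
        )
        result["ucx_gdr_copy"] = lists_gdr_copy(proc.stdout)
    return result


if __name__ == "__main__":
    print(check_ucx_gdrcopy())
```

Run it on a compute node rather than a login node, since the driver and GPUs are often only present there. A `False` for either entry only suggests where to look next; it does not say anything about SlingShot support itself.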