Michael Heinz
Hey, guys. I have a user who is used to handling process placement by simply listing hosts multiple times in the hostfile so that, for example, if he wanted 2...
Hey, guys. A question came up about how the OMPI OFI MTL and BTL handle ensuring that CUDA/HMEM buffers are completely in sync at the end of a data transfer....
As a part of discussing #7699, it was pointed out that there are several error paths in the OFI MTL that call exit() and several in the Portals MTL that...
We've had success using torch-ccl with ResNet and other AI workloads to test with libfabric over psm3, but when we try to use libmlx-fi.so, torch-ccl does not seem to see...