torch-ccl
Trouble using torch-ccl with the mlx provider
We've had success using torch-ccl with ResNet and other AI workloads to test libfabric over psm3. But when we try to use libmlx-fi.so, torch-ccl does not seem to see it, even though the provider has been copied into the provider directory.
Is this a known limitation of torch-ccl? Is there a makefile we need to modify?
TIA.
@mwheinz torch-ccl doesn't work with the mlx provider. I think the issue is that oneCCL needs thread-multiple capability to use multiple workers, and the mlx provider doesn't support it, so it fails at the init call itself.
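
If it helps to confirm where things fail, here is a rough Python sketch for telling apart "libfabric never loads the provider" from "oneCCL rejects it at init". It only sets environment variables that libfabric and oneCCL document (`FI_PROVIDER`, `FI_PROVIDER_PATH`, `FI_LOG_LEVEL`, `CCL_ATL_TRANSPORT`, `CCL_LOG_LEVEL`) and shells out to libfabric's `fi_info` utility. The helper names `build_diagnostic_env` and `provider_visible` are made up for illustration, and treating a non-zero `fi_info` exit code as "provider not found" is my assumption, not something confirmed in this thread.

```python
import os
import shutil
import subprocess


def build_diagnostic_env(provider="mlx", provider_path=None, base_env=None):
    """Return a copy of the environment with libfabric/oneCCL diagnostics on.

    Hypothetical helper: the variable names are real, the function is not
    part of torch-ccl or oneCCL.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["FI_PROVIDER"] = provider        # restrict libfabric to this provider
    env["FI_LOG_LEVEL"] = "debug"        # verbose libfabric logging
    env["CCL_ATL_TRANSPORT"] = "ofi"     # oneCCL's libfabric-based transport
    env["CCL_LOG_LEVEL"] = "info"        # more detail from oneCCL at startup
    if provider_path is not None:
        # Directory that holds the provider shared object (e.g. libmlx-fi.so).
        env["FI_PROVIDER_PATH"] = provider_path
    return env


def provider_visible(provider="mlx", env=None):
    """Run `fi_info -p <provider>` and report whether it succeeded.

    Returns None when fi_info is not on PATH. Assumes a non-zero exit
    code means libfabric could not find a matching provider.
    """
    exe = shutil.which("fi_info")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-p", provider], env=env, capture_output=True, text=True
    )
    return result.returncode == 0
```

Usage would be something like `env = build_diagnostic_env(provider_path="/path/to/providers")`, then `provider_visible("mlx", env)`, then launching the training script with `subprocess.run([...], env=env)`. If `fi_info` lists mlx but the script still dies during oneCCL initialization, that would be consistent with the threading explanation above rather than a provider-path problem.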