romerojosh
>@romerojosh, would it make sense to eventually drop NCCLHierarchicalAllreduce entirely in favor of NCCLTorusAllreduce? Or does it still make sense to mix NCCL and MPI like that on some clusters?...
Hey @MrAta, thanks for the comment! Adding an optional list argument to `DistributedGradientTape`, like `DistributedGradientTape(tape, local_vars=[...])`, was an alternative to `register_local_variables` that I considered, but I thought it would...
@miguelrc1 Have you taken a look at https://github.com/NVIDIA/DeepLearningExamples? The RN50 examples for TensorFlow and MXNet both show use cases of Horovod + DALI. The PyTorch convnets examples also show...
@miguelrc1 If you compare the `get_dali_train_loader` function here: https://github.com/NVIDIA/DeepLearningExamples/blob/aa061052c674ce19db4966854db196c209fa82e0/PyTorch/Classification/ConvNets/image_classification/dataloaders.py#L181 with the `get_pytorch_train_loader` function here: https://github.com/NVIDIA/DeepLearningExamples/blob/aa061052c674ce19db4966854db196c209fa82e0/PyTorch/Classification/ConvNets/image_classification/dataloaders.py#L303 you can see the difference in usage with DALI. In particular, the DALI loader does...
I don’t think this can be a generally true phenomenon. Most multi-GPU systems (e.g. all DGX servers) use CPUs with multiple NUMA domains with GPUs assigned to a single...
Before we merge this one, can you verify that your tests work using OpenMPI 4? I was running some Horovod tests on a local system after #3649 was merged and...
Ah sorry @kvignesh1420, I should’ve been more precise. Please try using the latest stable OpenMPI 4.x release (4.1.4). 4.0.0 is quite old.
This wasn't the error I saw; however, I haven't run the torch tests from this PR with OMPI 4.1.4. I only saw that the TensorFlow tests after the int8/uint8 PR...
Hi @kvignesh1420, please see #3674. I am working on updating the CI to run using OpenMPI 4.1.4 (including fixing up some tests to deal with the OpenMPI behavior with integer...
@EnricoMi, yes this should work.