Nikoli Dryden
Based on #1357, it seems we're not testing the Python frontend aggressively enough in the integration tests. Can we at least run `--setup-only` on all the scripts in the `applications`...
Provide optimized versions of our custom-implemented NCCL collectives:
- [ ] `Alltoall`
- [ ] `Gather`
- [ ] `Scatter`
- [ ] `Allgatherv`
- [ ] `Alltoallv`
- [...
The progress engine code has gotten crufty and has accumulated a lot of hacks. Clean it up.
Support a compile-time flag to only start the progress engine on demand (i.e., if something is submitted to it). This is a flag so that we only pay this runtime...
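A minimal sketch of what on-demand startup could look like, assuming a single worker thread and a simple work queue. The class `ProgressEngine`, the macro `PE_START_ON_DEMAND`, and the method names are hypothetical and chosen only for illustration; they are not Aluminum's actual API.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical compile-time flag (the name is an assumption for this sketch).
// Build with -DPE_START_ON_DEMAND=0 to start the worker thread eagerly.
#ifndef PE_START_ON_DEMAND
#define PE_START_ON_DEMAND 1
#endif

class ProgressEngine {
 public:
  ProgressEngine() {
#if !PE_START_ON_DEMAND
    // Eager mode: pay the thread startup cost at construction.
    std::lock_guard<std::mutex> lock(mutex_);
    start_locked();
#endif
  }

  ~ProgressEngine() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_one();
    // The worker drains any queued work before exiting.
    if (worker_.joinable()) worker_.join();
  }

  void submit(std::function<void()> work) {
    assert(work && "submitted work must be callable");
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(std::move(work));
      // On-demand mode: the first submission starts the worker thread.
      if (!running_) start_locked();
    }
    cv_.notify_one();
  }

  bool is_running() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return running_;
  }

 private:
  // Caller must hold mutex_.
  void start_locked() {
    running_ = true;
    worker_ = std::thread([this] { run(); });
  }

  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
      if (queue_.empty()) return;  // stop requested and nothing left to do
      auto work = std::move(queue_.front());
      queue_.pop();
      lock.unlock();  // run the work item without holding the lock
      work();
      lock.lock();
    }
  }

  mutable std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  std::thread worker_;
  bool running_ = false;
  bool stop_ = false;
};
```

In this sketch the check in `submit` happens under a lock that is taken anyway, so the on-demand path adds one branch per submission and no thread is created if nothing is ever submitted.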
Our current testing infrastructure does not actually check results when using `half`, since MPI does not support it.
The default MPI error handler is typically unhelpful: it kills the application but doesn't give you a stack trace. This adds a better error handler.
```
$ jsrun --bind packed:8 --nrs 1 --rs_per_host 1 --tasks_per_rs 1 --launch_distribution packed --cpu_per_rs ALL_CPUS --gpu_per_rs ALL_GPUS ./test_ops.exe --backend mpi --op scatter --inplace
Aborting after hang in Al size=1
```
...
Our coding style here is a bit of a mess and needs to be unified, especially variable names.
The NCCL backend's in-place reduce-scatter uses `sendbuf = recvbuf` [here](https://github.com/LLNL/Aluminum/blob/master/src/nccl_impl.hpp#L403). But per the NCCL documentation, an in-place reduce-scatter should instead have `recvbuf` point to this rank's offset within the send buffer (see [here](https://docs.nvidia.com/deeplearning/sdk/nccl-developer-guide/docs/usage/inplace.html#in-place-operations))....
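As a rough illustration of the pointer arithmetic involved: the NCCL documentation treats a reduce-scatter as in-place when the receive pointer equals the send pointer plus `rank * recvcount` elements. The helper below is hypothetical (it is not part of Aluminum or NCCL) and only computes that pointer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper for illustration only: returns the receive pointer that
// makes a reduce-scatter in-place, i.e. sendbuf + rank * recv_count elements.
template <typename T>
T* inplace_reduce_scatter_recvbuf(T* sendbuf, std::size_t recv_count, int rank) {
  assert(rank >= 0);
  return sendbuf + static_cast<std::size_t>(rank) * recv_count;
}

// Intended use, shown as a comment because it needs NCCL and a CUDA stream:
//   ncclReduceScatter(sendbuf,
//                     inplace_reduce_scatter_recvbuf(sendbuf, count, rank),
//                     count, datatype, op, comm, stream);
```

With 4 ranks and `recv_count = 2`, rank 3 would receive into elements 6 and 7 of the 8-element send buffer, while rank 0 receives at the start of the buffer.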
As of ROCm 4.2, HIP supports `hipStreamWaitValue32`/`64` and `hipStreamWriteValue32`/`64` (analogous to the corresponding `cuStreamWaitValue`/`WriteValue` methods we use). We should support these as well and make them the default implementation on...