Nikoli Dryden issues

Results 17 issues of Nikoli Dryden

Test Python frontend

Based on #1357, it seems we're not testing the Python frontend aggressively enough in the integration tests. Can we at least run `--setup-only` on all the scripts in the `applications`...

enhancement

Optimize NCCL collectives

Provide optimized versions of our custom-implemented NCCL collectives:
- [ ] `Alltoall`
- [ ] `Gather`
- [ ] `Scatter`
- [ ] `Allgatherv`
- [ ] `Alltoallv`
- [...

enhancement

Progress engine cleanup

The progress engine code has gotten crufty and has accumulated a lot of hacks. Clean it up.

enhancement

Support no progress engine

Support a compile-time flag to only start the progress engine on demand (i.e., if something is submitted to it). This is a flag so that we only pay this runtime...

enhancement
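For the on-demand progress engine idea above, a minimal C++ sketch of one way the start could be deferred until the first submission. Everything here is an assumption for illustration, not the library's actual code: the class `ProgressEngine`, the template parameter `StartOnDemand` (standing in for the compile-time flag, whose real name the issue does not give), and the `submit`/`stop`/`running` methods are all hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical sketch. StartOnDemand stands in for the compile-time flag.
template <bool StartOnDemand>
class ProgressEngine {
public:
  ProgressEngine() {
    // The condition is a compile-time constant, so the unused branch
    // can be dropped by the compiler.
    if (!StartOnDemand) {
      start();  // Eager mode: the thread exists from construction.
    }
  }

  ~ProgressEngine() { stop(); }

  ProgressEngine(const ProgressEngine&) = delete;
  ProgressEngine& operator=(const ProgressEngine&) = delete;

  // Queue work. In on-demand mode, the first call also starts the thread.
  void submit(std::function<void()> work) {
    if (StartOnDemand) {
      std::call_once(start_once_, [this] { start(); });
    }
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(std::move(work));
    }
    cv_.notify_one();
  }

  // True once the worker thread has been started.
  bool running() const { return started_.load(); }

  // Let the worker drain the queue, then join it if it was ever started.
  void stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_all();
    if (worker_.joinable()) {
      worker_.join();
    }
  }

private:
  void start() {
    worker_ = std::thread([this] { run(); });
    started_.store(true);
  }

  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
      if (queue_.empty()) {
        return;  // done_ is set and no work is left.
      }
      std::function<void()> work = std::move(queue_.front());
      queue_.pop();
      lock.unlock();  // Run the work item without holding the lock.
      work();
      lock.lock();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  std::once_flag start_once_;
  std::atomic<bool> started_{false};
  bool done_ = false;
  std::thread worker_;
};
```

With `ProgressEngine<true>`, no thread is created unless something is submitted; with `ProgressEngine<false>`, the thread starts at construction. Making the choice a compile-time constant keeps the extra check out of builds that always want the engine running.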

Testing half

Our current testing infrastructure does not actually check results when using `half`, since MPI does not support it.

MPI error handler

The default MPI error handler is typically unhelpful: it kills the application but doesn't give you a stack trace. This adds a better error handler.

enhancement

In-place MPI scatter segfaults on one processor

```
$ jsrun --bind packed:8 --nrs 1 --rs_per_host 1 --tasks_per_rs 1 --launch_distribution packed --cpu_per_rs ALL_CPUS --gpu_per_rs ALL_GPUS ./test_ops.exe --backend mpi --op scatter --inplace
Aborting after hang in Al size=1
```
...

bug

Fix coding style

Our coding style here is a bit of a mess and needs to be unified, especially variable names.

In-place NCCL reduce-scatter is not in-place

The NCCL backend's in-place reduce-scatter uses `sendbuf = recvbuf` [here](https://github.com/LLNL/Aluminum/blob/master/src/nccl_impl.hpp#L403). But per NCCL documentation, the in-place reduce-scatter should actually have `recvbuf` be the appropriate offset into the recvbuf (see [here](https://docs.nvidia.com/deeplearning/sdk/nccl-developer-guide/docs/usage/inplace.html#in-place-operations))....
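To illustrate the offset convention described above, here is a host-only C++ sketch. It does not call NCCL. The helper names `inplace_reduce_scatter_recvbuf` and `reference_inplace_reduce_scatter` are hypothetical, and the reference routine is only a stand-in for a sum reduce-scatter. It assumes the convention from the NCCL in-place documentation, where the receive pointer is the send pointer advanced by `rank * recvcount` elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: given the user's single in-place buffer, return the
// pointer to pass as the receive buffer, i.e. buf + rank * recvcount.
template <typename T>
T* inplace_reduce_scatter_recvbuf(T* buf, int rank, std::size_t recvcount) {
  return buf + static_cast<std::size_t>(rank) * recvcount;
}

// Host-only reference for a sum reduce-scatter (a stand-in, not NCCL).
// bufs[r] is rank r's buffer of nranks * recvcount elements. Afterwards,
// rank r holds its reduced slice at the in-place offset of its own buffer.
inline void reference_inplace_reduce_scatter(
    std::vector<std::vector<float>>& bufs, std::size_t recvcount) {
  const std::size_t nranks = bufs.size();
  // Reduce every element across ranks before overwriting anything.
  std::vector<float> sums(nranks * recvcount, 0.0f);
  for (const std::vector<float>& buf : bufs) {
    for (std::size_t i = 0; i < sums.size(); ++i) {
      sums[i] += buf[i];
    }
  }
  // Each rank keeps only its own slice, written at the offset location.
  for (std::size_t r = 0; r < nranks; ++r) {
    float* recv = inplace_reduce_scatter_recvbuf(
        bufs[r].data(), static_cast<int>(r), recvcount);
    for (std::size_t i = 0; i < recvcount; ++i) {
      recv[i] = sums[r * recvcount + i];
    }
  }
}
```

In an actual NCCL call, the pointer returned by the helper would be passed as the `recvbuff` argument of `ncclReduceScatter` while the unshifted buffer is passed as `sendbuff`. Whether that matches how the backend would be fixed is not stated in the issue.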

Support HIP stream memory operations

As of ROCm 4.2, HIP supports `hipStreamWaitValue32`/`64` and `hipStreamWriteValue32`/`64` (analogous to the corresponding `cuStreamWaitValue`/`WriteValue` methods we use). We should support these as well and make them the default implementation on...

enhancement