rccl
Change NCCL_LL128_SHMEM_ELEMS_PER_THREAD to 4 to avoid a performance drop
Details
Do not mention proprietary info or link to internal work items in this PR.
Work item: SWDEV-546712
What were the changes?
Changed NCCL_LL128_SHMEM_ELEMS_PER_THREAD from 8 to 4 to avoid a performance drop.
Why were the changes made?
A performance drop was observed when setting NCCL_LL128_SHMEM_ELEMS_PER_THREAD to 8.
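For reviewers who want a feel for what this constant scales, here is a minimal sketch. It assumes the LL128 shared-memory staging buffer is sized as elements-per-thread times a maximum thread count, with 64-bit elements, as in the upstream NCCL headers. The thread count `kExampleMaxThreads` and the helper `ll128ShmemBytes` are hypothetical names used only for illustration. They are not part of this PR, and this PR does not state the cause of the drop.

```cpp
#include <cstddef>
#include <cstdint>

// Value selected by this PR (the drop was observed with 8).
#define NCCL_LL128_SHMEM_ELEMS_PER_THREAD 4

// Hypothetical thread count for illustration only; not taken from this PR.
constexpr std::size_t kExampleMaxThreads = 256;

// Assumption: the staging buffer holds elemsPerThread 64-bit elements
// for each of maxThreads threads.
constexpr std::size_t ll128ShmemBytes(std::size_t elemsPerThread,
                                      std::size_t maxThreads) {
  return elemsPerThread * maxThreads * sizeof(std::uint64_t);
}

// Under these assumptions, halving the constant halves the buffer.
static_assert(ll128ShmemBytes(NCCL_LL128_SHMEM_ELEMS_PER_THREAD,
                              kExampleMaxThreads) == 8192,
              "4 elems/thread * 256 threads * 8 bytes");
static_assert(ll128ShmemBytes(8, kExampleMaxThreads) == 16384,
              "8 elems/thread * 256 threads * 8 bytes");
```

The checks run at compile time, so the snippet can be dropped into any C++11 translation unit. Whether the smaller footprint is what restores performance is not confirmed here.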
How was the outcome achieved?
With NCCL_LL128_SHMEM_ELEMS_PER_THREAD set to 4, performance improved and met expectations.
Additional Documentation:
What else should the reviewer know?
Approval Checklist
Do not approve until these items are satisfied.
- [ ] Verify the CHANGELOG has been updated, if
  - there are any NCCL API version changes,
  - any changes impact library users, and/or
  - any changes impact any other ROCm library.