Change NCCL_LL128_SHMEM_ELEMS_PER_THREAD to 4 to avoid the perf drop

Open mberenjk opened this issue 4 months ago • 0 comments

Details

Do not mention proprietary info or link to internal work items in this PR.

Work item: SWDEV-546712

What were the changes?
Changed NCCL_LL128_SHMEM_ELEMS_PER_THREAD from 8 to 4 to avoid a performance drop.

Why were the changes made?
A performance drop was observed when setting NCCL_LL128_SHMEM_ELEMS_PER_THREAD to 8.
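
For reviewers who want a feel for what the constant controls, here is a minimal sketch. It assumes the LL128 shared-memory scratch area is sized as elements per thread times threads per block, in 64-bit words. That sizing rule, the helper `ll128ShmemBytes`, and the thread count of 256 are illustrative assumptions, not values taken from this PR or confirmed against the RCCL sources. The PR itself does not state the cause of the drop.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical thread count, for illustration only. The real value depends on
// the RCCL build and the target GPU.
constexpr std::size_t kAssumedThreadsPerBlock = 256;

// Assumed sizing rule: each thread owns `elemsPerThread` 64-bit words of
// shared-memory scratch space.
constexpr std::size_t ll128ShmemBytes(std::size_t elemsPerThread,
                                      std::size_t threadsPerBlock) {
  return elemsPerThread * threadsPerBlock * sizeof(std::uint64_t);
}

// Footprint per block under the old setting (8) and the new setting (4).
constexpr std::size_t kBytesWith8 = ll128ShmemBytes(8, kAssumedThreadsPerBlock);
constexpr std::size_t kBytesWith4 = ll128ShmemBytes(4, kAssumedThreadsPerBlock);

// Under these assumptions, halving the constant halves the scratch footprint.
static_assert(kBytesWith8 == 16384, "8 elems * 256 threads * 8 bytes");
static_assert(kBytesWith4 == 8192, "4 elems * 256 threads * 8 bytes");
static_assert(kBytesWith4 * 2 == kBytesWith8, "footprint scales linearly");
```

If the sizing rule holds, the footprint scales linearly with the constant. Whether that is what drives the observed performance difference would need to be confirmed by profiling.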

How was the outcome achieved?
With NCCL_LL128_SHMEM_ELEMS_PER_THREAD set to 4, performance improved and met expectations.

Additional Documentation:
What else should the reviewer know?

Approval Checklist

Do not approve until these items are satisfied.

  • [ ] Verify the CHANGELOG has been updated, if
    • there are any NCCL API version changes,
    • any changes impact library users, and/or
    • any changes impact any other ROCm library.

mberenjk, Oct 06 '25 22:10