Expose an out-of-place _reduce_oop from ProcessGroupNCCL
Summary: This change exposes an out-of-place _reduce_oop API from ProcessGroupNCCL. It allows reducing an input tensor and placing the result in a separate output tensor.
Custom collectives can be implemented by coalescing reduce operations. One such use case is a vector reduce_scatter (reduce_scatter_v), where inputs must be reduced and then scattered unevenly among the participating ranks. Because reduce_scatter provides an out-of-place API, a reduce_scatter_v semantic implemented inside dist.reduce_scatter also needs to support out-of-place operation, which in turn requires an out-of-place reduce to be exposed.
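To make the intended semantics concrete, here is a single-process sketch of what reduce_scatter_v does. This is only an illustration of the semantics, not the NCCL implementation or the PyTorch API: the function name `reduce_scatter_v` and the `split_sizes` parameter are hypothetical, and the element-wise sum plus uneven scatter is done with plain Python lists instead of coalesced out-of-place reduces on tensors.

```python
# Hypothetical single-process illustration of reduce_scatter_v semantics.
# Each rank contributes a full-length input; the element-wise sum is then
# scattered in *uneven* chunks, one chunk per rank, written out-of-place
# (the inputs are left untouched).

def reduce_scatter_v(inputs, split_sizes):
    """inputs: one full-length list per rank; split_sizes: output chunk size per rank."""
    assert len(inputs) == len(split_sizes)
    assert sum(split_sizes) == len(inputs[0])
    # Out-of-place reduce: sum into a fresh buffer, never overwriting an input.
    reduced = [sum(vals) for vals in zip(*inputs)]
    # Scatter the reduced result unevenly among the ranks.
    outputs, offset = [], 0
    for size in split_sizes:
        outputs.append(reduced[offset:offset + size])
        offset += size
    return outputs

# Two "ranks", each contributing 6 elements; rank 0 receives 2, rank 1 receives 4.
world = [[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]]
chunks = reduce_scatter_v(world, split_sizes=[2, 4])
# chunks[0] -> [11, 22]; chunks[1] -> [33, 44, 55, 66]
```

In the real backend, each output chunk would come from one out-of-place reduce rooted at the receiving rank, and the per-rank reduces would be coalesced into a single group call.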
Test Plan: Adds test_reduce_oop_ops in caffe2/test/distributed/test_c10d_nccl.py to test the newly exposed _reduce_oop function in ProcessGroupNCCL.
Differential Revision: D38478781
:white_check_mark: No Failures (1 Pending)
As of commit 4ace6629b9 (more details on the Dr. CI page).
This pull request was exported from Phabricator. Differential Revision: D38478781
Maybe also worth writing a bit in the PR description about the context --
You need an out-of-place reduce from the backend because you want to compose a reduce_scatter_v pattern at the Python front end using coalesced reduces. Today, the dist.reduce_scatter API supports out-of-place operation, so if the reduce_scatter_v pattern is implemented under the dist.reduce_scatter API, it would need out-of-place support as well.
It would also be good to capture the above in a code comment.
@pytorchbot merge
@pytorchbot successfully started a merge job. Check the current status here. The merge job was triggered without a flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki. Please reach out to the PyTorch DevX Team with feedback or questions!
Hey @aashaka. You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'. For changes that are 'topic: not user facing' there is no need for a release notes label.