BMTrain
BurstAttention and Ulysses all2all support for long-sequence training.
Issue Reference
N/A
Description
- Add BurstAttention as a distributed ring flash attention implementation.
- Add all2all communication ops for all2all attention (the same approach as DeepSpeed Ulysses).
- Modify the training example code to support sequence parallelism (SP).
- Add an SP communicator and communication stream.
- Add tests covering all of the changes above.
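
For reviewers unfamiliar with Ulysses-style all2all attention, the layout change it relies on can be sketched as below. This is a single-process simulation in plain Python, not the code added in this PR: the function names (`all_to_all`, `seq_shard_to_head_shard`, `head_shard_to_seq_shard`) are hypothetical, and the "ranks" are just list entries, so no real communication happens. It only shows how each rank trades its sequence shard (all heads) for the full sequence (a subset of heads) before attention, and back afterwards.

```python
# Single-process simulation of the Ulysses-style all-to-all layout change.
# All names here are illustrative assumptions, not BMTrain APIs.

def all_to_all(send):
    """send[src][dst] is the chunk rank `src` sends to rank `dst`.
    Returns recv, where recv[dst][src] == send[src][dst]."""
    world = len(send)
    return [[send[src][dst] for src in range(world)] for dst in range(world)]


def seq_shard_to_head_shard(shards, world):
    """shards[r] is rank r's local data: a list over local sequence positions,
    each position being a list over ALL heads.
    Returns per-rank data covering the FULL sequence with num_heads // world heads."""
    num_heads = len(shards[0][0])
    assert num_heads % world == 0, "heads must divide evenly across ranks"
    hp = num_heads // world
    # Each rank splits its heads into `world` groups, one group per destination.
    send = [
        [[tok[dst * hp:(dst + 1) * hp] for tok in shards[src]] for dst in range(world)]
        for src in range(world)
    ]
    recv = all_to_all(send)
    # Concatenate received chunks along the sequence axis, in source-rank order.
    return [[tok for chunk in recv[dst] for tok in chunk] for dst in range(world)]


def head_shard_to_seq_shard(shards, world):
    """Inverse of seq_shard_to_head_shard: full sequence with a head subset
    goes back to a sequence shard with all heads."""
    seq_len = len(shards[0])
    assert seq_len % world == 0, "sequence must divide evenly across ranks"
    sp = seq_len // world
    # Each rank splits its sequence into `world` slices, one slice per destination.
    send = [
        [shards[src][dst * sp:(dst + 1) * sp] for dst in range(world)]
        for src in range(world)
    ]
    recv = all_to_all(send)
    # Concatenate received chunks along the head axis, in source-rank order.
    return [
        [[h for src in range(world) for h in recv[dst][src][i]] for i in range(sp)]
        for dst in range(world)
    ]


# Tiny example: 2 ranks, 4 sequence positions, 4 heads.
# Each element is labelled (position, head) so the layout is easy to inspect.
world, seq_len, num_heads = 2, 4, 4
full = [[(p, h) for h in range(num_heads)] for p in range(seq_len)]
sp = seq_len // world
seq_shards = [full[r * sp:(r + 1) * sp] for r in range(world)]

head_shards = seq_shard_to_head_shard(seq_shards, world)
restored = head_shard_to_seq_shard(head_shards, world)
```

After the first call, rank 0 holds heads 0 and 1 for all four positions and rank 1 holds heads 2 and 3, so each rank can run ordinary attention over its own heads. The second call restores the original sequence-sharded layout.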
Type of Change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce.
Checklist
- [x] I have read the CONTRIBUTING document.
- [x] My code follows the code style of this project.
- [x] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [x] I have added tests to cover my changes.
- [x] All new and existing tests passed.
Additional Information
Any additional information, configuration, or data that might be necessary for the review.