MayDomine
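### Scenario Kitchen space is limited, so cooking just one or two dishes often leaves the room a mess. For the problem of keeping the kitchen tidy, can formalized storage and organization advice be provided? Experienced chefs seem to be good at cleaning up as they cook; how can this skill be trained?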
Here is the thing: I want to compile the code on a platform with CUDA 11.8 installed and call the binary on another machine with CUDA 11.3 installed. Because...
## 1F1B Pipeline schedule ### Description We have implemented the 1F1B pipeline schedule based on our point-to-point ops. ### Type of Change - [ ] Bug fix (non-breaking change which fixes...
### Is there an existing issue for this? - [X] I have searched the existing issues ### Description of the Bug When I try to run the finetune script of...
## Documentation update ### Description Update the README.md for version 1.0.0 and the update logs for versions 1.0.0 and 0.2.3.
This is helpful in our case for optimizing the distributed flash-attention implementation. Our work, [BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences](https://arxiv.org/abs/2403.09347), benefits from this PR.
## BurstAttention and Ulysses all2all support for long sequence training ### Issue Reference N/A ### Description 1. Add BurstAttention as a distributed ring_flash attention implementation. 2. Add all2all communication ops for...
**Describe the solution you'd like** When I configure a custom command, provide a key:value for prompting a message. In a lot of cases, such as renaming containers/images or saving images/containers...