TransnormerLLM
Differences between Lightning Attention1 and Lightning Attention2 code implementations
Hello, I have two questions I'd like to ask:
- In this repository, I noticed that the implementations of Lightning Attention 1 and Lightning Attention 2 appear to be identical.
- The implementation of Lightning Attention 2 in this repository differs from the code provided at this GitHub link (https://github.com/OpenNLPLab/lightning-attention). When I benchmarked the two implementations, I found that this repository's version of Lightning Attention 2 is less computationally efficient than the one from that link (a minimal timing sketch is included below for reference).
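For reference, a timing harness along the following lines can reproduce the comparison. The `attn_fn` argument is only a placeholder for either implementation's entry point (the real functions may take extra arguments such as a per-head decay), so this is a sketch rather than the exact script used.

```python
import torch

def benchmark(attn_fn, b=2, h=8, n=2048, d=128, n_warmup=10, n_iters=100):
    """Return average milliseconds per call for an attention callable."""
    q = torch.randn(b, h, n, d, device="cuda", dtype=torch.bfloat16)
    k = torch.randn(b, h, n, d, device="cuda", dtype=torch.bfloat16)
    v = torch.randn(b, h, n, d, device="cuda", dtype=torch.bfloat16)
    for _ in range(n_warmup):  # warm up kernels / autotuning before timing
        attn_fn(q, k, v)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(n_iters):
        attn_fn(q, k, v)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / n_iters
```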
- For Lightning Attention 1, you can refer to Appendix B of the paper. In short, Lightning Attention 1 is a version of Flash Attention without softmax (a minimal reference sketch is included after this reply).
- The implementation of Lightning Attention in this repository is slightly different from that repo, so it is reasonable that the latter is faster.
I hope this helps.
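To make "Flash Attention without softmax" concrete, the following is a naive O(n^2) reference in which the softmax is simply removed from causal attention. It only illustrates the idea; the actual kernels tile the computation over blocks (and include decay terms) rather than materializing the full score matrix.

```python
import torch

def attn_no_softmax_reference(q, k, v):
    """Causal attention with the softmax removed: out = (Q K^T * M) V.

    q, k, v: (batch, heads, seq_len, head_dim). Naive O(n^2) reference only.
    """
    n = q.shape[-2]
    scores = torch.matmul(q, k.transpose(-2, -1))             # (b, h, n, n)
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool, device=q.device))
    scores = scores.masked_fill(~causal, 0.0)                 # mask future positions, no softmax
    return torch.matmul(scores, v)                            # (b, h, n, head_dim)
```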
Thank you very much for your patience in addressing my questions! I have two additional questions I’d like to ask:
- Is this repo an implementation of Lightning Attention 2?
- This repository and this repo are both implementations of Lightning Attention 2. Based on my understanding, given identical input, both implementations should produce the same output. However, after testing, I found that they actually yield different results for identical input (a sketch of the comparison is included below). Additionally, I noticed that the implementation of Lightning Attention 2 in this repo achieves higher computational efficiency. Is there a specific reason why the TransnormerLLM model doesn't use this more efficient implementation?
Thank you again for your valuable time and insight, and I look forward to your response.
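For reference, the consistency check amounts to something like the sketch below. `fn_a` and `fn_b` stand in for the two implementations' entry points and the tolerances are arbitrary, so treat it as an illustration rather than the exact test script.

```python
import torch

def compare_outputs(fn_a, fn_b, b=2, h=8, n=1024, d=128):
    """Run two attention callables on identical inputs and report the gap."""
    torch.manual_seed(0)
    q = torch.randn(b, h, n, d, device="cuda")
    k = torch.randn(b, h, n, d, device="cuda")
    v = torch.randn(b, h, n, d, device="cuda")
    out_a, out_b = fn_a(q, k, v), fn_b(q, k, v)
    max_diff = (out_a - out_b).abs().max().item()
    print(f"max abs difference: {max_diff:.3e}")
    print("allclose:", torch.allclose(out_a, out_b, rtol=1e-3, atol=1e-3))
```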
- Yes, this repository implements Lightning Attention 2.
- Regarding the lightning attention in this repo, I would like to confirm whether you are referring to this file.
- Yes, I am referring to this file. In this repo, the content of lightning_attention2.py is the same as the content of lightning_attention.py.
OK, I'll review it within the next couple of days and get back to you.
OK, thank you very much for your patience in addressing my questions!