
Differences between the Lightning Attention 1 and Lightning Attention 2 code implementations

Hanshifancoder opened this issue 1 year ago • 6 comments

Hello, I have two questions:

  1. In this repository, I noticed that the implementations of Lightning Attention 1 and Lightning Attention 2 appear identical.
  2. The implementation of Lightning Attention 2 in this repository differs from the code provided at this GitHub link (https://github.com/OpenNLPLab/lightning-attention). When I tested the computational efficiency of the two implementations, I found that this repository's version of Lightning Attention 2 is slower than the one from that link (a minimal timing harness is sketched below).
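For reference, this is roughly how the timing comparison was set up. It is a sketch, not code from either repository: `attn_impl_a` and `attn_impl_b` are hypothetical placeholders for the two Lightning Attention 2 entry points, and the tensor shapes are assumed, not taken from the repos.

```python
import torch

def benchmark(attn_fn, q, k, v, warmup=10, iters=100):
    # Warm up so Triton autotuning / CUDA context setup is excluded.
    for _ in range(warmup):
        attn_fn(q, k, v)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        attn_fn(q, k, v)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # mean milliseconds per call

# Assumed shapes: (batch, heads, seq_len, head_dim).
b, h, n, d = 2, 8, 2048, 128
q = torch.randn(b, h, n, d, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)
# print(benchmark(attn_impl_a, q, k, v))  # this repository's version
# print(benchmark(attn_impl_b, q, k, v))  # lightning-attention repo's version
```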

Hanshifancoder avatar Oct 31 '24 10:10 Hanshifancoder

  1. For Lightning Attention 1, you can refer to Appendix B of the paper. In short, Lightning Attention 1 is a version of Flash Attention without softmax (a naive reference of that computation is sketched after this list).
  2. The implementation of Lightning Attention in this repository is slightly different from that repo, so it's reasonable that that repo is faster.
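To make "Flash Attention without softmax" concrete, here is a naive PyTorch reference of the computation. This is only an illustration of the math, not the repository's kernel; the actual implementation tiles this computation in Triton (see Appendix B) and may include further details beyond this sketch.

```python
import torch

def attention_no_softmax(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    n = q.shape[-2]
    # Raw attention scores with NO softmax applied.
    scores = q @ k.transpose(-2, -1)  # (batch, heads, n, n)
    # Causal masking: zero out future positions instead of using -inf,
    # since there is no softmax to renormalize.
    causal = torch.tril(torch.ones(n, n, device=q.device, dtype=torch.bool))
    scores = scores.masked_fill(~causal, 0.0)
    return scores @ v
```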

I hope this helps.

Doraemonzzz avatar Oct 31 '24 15:10 Doraemonzzz

Thank you very much for your patience in addressing my questions! I have two follow-up questions:

  1. Is this repo an implementation of Lightning Attention 2?
  2. This repository and this repo are both implementations of Lightning Attention 2. As I understand it, both implementations should produce the same output given identical input. However, after testing, I found that they actually yield different results on identical input (a rough comparison script is sketched after this list). I also noticed that the implementation of Lightning Attention 2 in this repo achieves higher computational efficiency. Is there a specific reason why the TransnormerLLM model doesn't use this more efficient implementation?
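This is roughly the check I ran. Again a sketch: `impl_repo` and `impl_other` are hypothetical placeholders for the two entry points, and shapes are assumed. Note that fp16/bf16 kernels rarely match bit-for-bit, so a tolerance-based comparison is used.

```python
import torch

torch.manual_seed(0)
b, h, n, d = 2, 8, 1024, 128
q = torch.randn(b, h, n, d, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# out_a = impl_repo(q, k, v)    # this repository's Lightning Attention 2
# out_b = impl_other(q, k, v)   # lightning-attention repo's version
# print(torch.allclose(out_a, out_b, rtol=1e-2, atol=1e-2))
# print((out_a - out_b).abs().max())  # largest elementwise discrepancy
```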

Thank you again for your valuable time and insight, and I look forward to your response.

Hanshifancoder avatar Nov 01 '24 03:11 Hanshifancoder

  1. Yes, this repository implements Lightning Attention 2.
  2. Regarding the lightning attention in this repo, I would like to confirm whether you are referring to this file.

Doraemonzzz avatar Nov 01 '24 03:11 Doraemonzzz

  1. Yes, I am referring to this file. In this repo, the content of lightning_attention2.py is the same as the content of lightning_attention.py.

Hanshifancoder avatar Nov 01 '24 06:11 Hanshifancoder

Ok, I'll review it within the next couple of days and get back to you.

Doraemonzzz avatar Nov 01 '24 06:11 Doraemonzzz

Ok, thank you very much for your patience in addressing my questions!

Hanshifancoder avatar Nov 01 '24 06:11 Hanshifancoder