Significant code overlap between BitNet and T-MAC: what are the specific differences?

Open • DRAn-An opened this issue 1 year ago • 1 comment

Hello. After thoroughly reviewing the source code of both BitNet and T-MAC, I noticed a high degree of overlap between the two. The implementations look quite similar, which raises some questions for me: What are the specific differences between BitNet and T-MAC in terms of architecture, algorithms, or optimization strategies? Are there any unique improvements or distinct use cases for each? I would appreciate it if you could clarify the distinctions between them.

Oct 22 '24 08:10 DRAn-An

Thanks for the question. T-MAC introduces lookup-table (LUT) methods for low-bit model inference. It is general-purpose and works for 1-bit, 2-bit, 4-bit models and so on, with lookup tables built over groups of 2^1, 2^2, and 2^4 values. BitNet, on the other hand, is a ternary-weight model: every weight takes one of three possible values (-1, 0, 1). That makes it possible to group values by 3^n instead, which further reduces the model size toward log2(3) ≈ 1.58 bits per weight (hence "b1.58"). So in TL1 and TL2 the values are grouped specifically for ternary weights to achieve better performance. Another difference is that, in our tests, only the BitNet kernels output exactly the same tokens as fp32 inference, because their inference is lossless. Detailed explanations can be found at https://arxiv.org/pdf/2410.16144

Oct 22 '24 13:10 sd983527
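
The grouping arithmetic in the reply above can be illustrated with a small Python sketch. This is not code from BitNet or T-MAC: the function names (`bits_per_weight`, `encode_group`, `build_table`, `lut_matvec`) and the base-3 index encoding are assumptions made purely for illustration, and the real TL1/TL2 kernels use their own packing layouts and SIMD table lookups. The sketch shows two things: how packing n ternary weights into one index approaches log2(3) bits per weight, and how a table of precomputed partial sums can replace multiplications while giving the same integer result as a direct dot product.

```python
from itertools import product

TERNARY = (-1, 0, 1)

def bits_per_weight(group_size: int) -> float:
    """Storage cost when `group_size` ternary weights share one index."""
    combos = 3 ** group_size                 # number of distinct groups
    index_bits = (combos - 1).bit_length()   # whole bits needed to index them
    return index_bits / group_size

def encode_group(ws) -> int:
    """Hypothetical encoding: read the group as a base-3 number."""
    idx = 0
    for w in ws:
        idx = idx * 3 + (w + 1)              # map -1, 0, 1 to digits 0, 1, 2
    return idx

def build_table(acts):
    """Partial sums for every ternary pattern, in encode_group order."""
    return [sum(w * a for w, a in zip(ws, acts))
            for ws in product(TERNARY, repeat=len(acts))]

def lut_matvec(rows, acts, g=2):
    """Matrix-vector product using one lookup table per activation group."""
    assert len(acts) % g == 0
    # Tables depend only on the activations, so they are built once
    # and reused by every weight row.
    tables = [build_table(acts[i:i + g]) for i in range(0, len(acts), g)]
    out = []
    for row in rows:
        assert len(row) == len(acts)
        total = 0
        for t, i in enumerate(range(0, len(row), g)):
            total += tables[t][encode_group(row[i:i + g])]
        out.append(total)
    return out

if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, bits_per_weight(n))
    rows = [[1, -1, 0, 1], [0, 0, -1, -1]]
    acts = [3, 5, -2, 7]
    direct = [sum(w * a for w, a in zip(r, acts)) for r in rows]
    print(lut_matvec(rows, acts), direct)
```

With integer activations, the table lookup and the direct dot product agree exactly, which is the sense in which a lookup-based kernel of this shape loses no precision. Larger groups bring the storage cost closer to log2(3) bits per weight but make each table grow as 3^n entries, so the group size is a trade-off.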