
Parts of the BitLinear code don't match the paper (before bit1.58)

Open qqqllppp opened this issue 1 year ago • 2 comments

Referencing this paper: https://arxiv.org/pdf/2310.11453.pdf
Code part: https://github.com/kyegomez/BitNet/blob/984ec72c2a45a88b739c85668690fe1abbdf3152/bitnet/bitlinear.py

In general, the code does not seem to match the paper, mainly Equations (1), (4), and (11). It also seems to be missing the straight-through estimator? (Edit: the code also doesn't use BitLinear in place of the linear layers inside the multi-head attention.)
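For concreteness, here is a minimal pure-Python sketch of how I read those three equations: weights are centered by their mean and passed through a sign function, activations are absmax-scaled and clipped to the 8-bit range, and the output is rescaled by βγ/Q_b. The function names are hypothetical, the LayerNorm that the paper applies before quantization is left out, and the guard against a zero γ is my own addition, so please check everything against the paper rather than treating this as the reference. The straight-through estimator is shown only as a hand-written backward rule; in PyTorch it is commonly written as `w + (w_bin - w).detach()`.

```python
def binarize_weights(W):
    """Center W by its mean (alpha) and take the sign; beta is the mean |W|."""
    n = sum(len(row) for row in W)
    alpha = sum(sum(row) for row in W) / n
    beta = sum(abs(w) for row in W for w in row) / n
    # Sign maps values <= 0 to -1, so there is no zero level.
    W_bin = [[1.0 if w - alpha > 0 else -1.0 for w in row] for row in W]
    return W_bin, beta


def quantize_activations(x, bits=8, eps=1e-5):
    """Absmax quantization: scale x into [-Q_b, Q_b] and clip just inside it."""
    Qb = 2 ** (bits - 1)
    gamma = max(max(abs(v) for v in x), eps)  # eps guard is an assumption
    x_q = [min(max(v * Qb / gamma, -Qb + eps), Qb - eps) for v in x]
    return x_q, gamma, Qb


def bitlinear_forward(W, x, bits=8):
    """y = W_bin @ Quant(x), rescaled by beta * gamma / Q_b (LayerNorm omitted)."""
    W_bin, beta = binarize_weights(W)
    x_q, gamma, Qb = quantize_activations(x, bits)
    y = [sum(w * v for w, v in zip(row, x_q)) for row in W_bin]
    return [v * beta * gamma / Qb for v in y]


def ste_weight_grad(grad_wrt_binarized):
    """Straight-through estimator: backward treats Sign as the identity,
    so the gradient w.r.t. the latent weights equals the incoming gradient."""
    return list(grad_wrt_binarized)
```

The point of the last function is that without some backward rule like this, the gradient through the sign function is zero almost everywhere and the latent full-precision weights never get updated.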

I also found this other reference implementation, which seems to follow the equations from the paper a bit more closely: https://github.com/Beomi/BitNet-Transformers

qqqllppp • Mar 01 '24 15:03

@qqqllppp this repo is still in progress, if you notice defects pls send a pull request

kyegomez • Mar 01 '24 16:03

Stale issue message

github-actions[bot] • May 01 '24 12:05