bit
Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer"
When the final model is saved, I checked the weights and found that they were not binary. Also, the models that were attached in the last row of each table...
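One way to sanity-check this locally is to inspect how many distinct values each saved tensor holds. Note that quantization-aware training code commonly stores latent full-precision weights and binarizes them on the fly in the forward pass, which would explain non-binary values in the checkpoint. A minimal sketch (the checkpoint path is an assumption, not the repo's actual output path):

```python
import torch

# Load a saved state dict (hypothetical path) onto the CPU.
state_dict = torch.load("output/pytorch_model.bin", map_location="cpu")

for name, tensor in state_dict.items():
    if tensor.is_floating_point():
        uniques = torch.unique(tensor)
        # A truly binarized tensor should hold at most two distinct
        # values (e.g. {-alpha, +alpha} after scaling).
        status = "binary" if uniques.numel() <= 2 else f"{uniques.numel()} distinct values"
        print(f"{name}: {status}")
```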
Great work, and thanks a lot for open-sourcing it! While reusing the released code, I found some issues below: I cannot reproduce the W1A1 version of BiT...
Hello, thank you for providing the code. I can get the right results for W1A1 with `bash scripts/run_glue.sh MNLI` (around 77 accuracy on MNLI), but when I reproduce W1A1 with...
Was two-stage knowledge distillation used, as in BinaryBERT (Table 7, https://arxiv.org/pdf/2012.15701.pdf), to get these results?
In the paper, you mentioned how many epochs you trained for without data augmentation. However, I am not sure whether you used the same number of epochs when training with data...
Hi there! I got a StopIteration error when trying to follow the steps to run your code, `scripts/run_glue.sh`: ``` ... previous messages hidden... 2024-05-13 16:17:00,172 [INFO]: module.classifier: Linear(in_features=768, out_features=3,...
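Not necessarily the cause here, but a `StopIteration` in a training loop is often raised by calling `next()` on an exhausted dataloader iterator. A minimal, self-contained sketch of the usual guard (all names and shapes below are hypothetical, not taken from this repo):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical setup, just to make the sketch runnable on its own.
dataset = TensorDataset(torch.randn(100, 8), torch.randint(0, 3, (100,)))
train_dataloader = DataLoader(dataset, batch_size=16)
num_train_steps = 50

data_iter = iter(train_dataloader)
for step in range(num_train_steps):
    try:
        batch = next(data_iter)
    except StopIteration:
        # The iterator is exhausted; restart it for the next epoch
        # instead of letting StopIteration propagate.
        data_iter = iter(train_dataloader)
        batch = next(data_iter)
    # ... the training step on `batch` would go here ...
```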