bit
Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer"
When the final model is saved, I checked the weights and found that they were not binary. Also, the models that were attached in the last row of each table...
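One way to sanity-check this locally is to inspect how many distinct values each saved tensor holds. Note that quantization-aware training code commonly stores latent full-precision weights and binarizes them on the fly in the forward pass, which would explain non-binary values in the checkpoint. A minimal sketch (the checkpoint path is an assumption, not the repo's actual output path):

```python
import torch

# Load a saved state dict (hypothetical path) onto the CPU.
state_dict = torch.load("output/pytorch_model.bin", map_location="cpu")

for name, tensor in state_dict.items():
    if tensor.is_floating_point():
        uniques = torch.unique(tensor)
        # A truly binarized tensor should hold at most two distinct
        # values (e.g. {-alpha, +alpha} after scaling).
        status = "binary" if uniques.numel() <= 2 else f"{uniques.numel()} distinct values"
        print(f"{name}: {status}")
```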
Great work, and thanks a lot for open-sourcing it! While reusing the released code, I found some issues below: I cannot reproduce the W1A1 version of BiT...
Hello, thank you for providing the code. I can get the right results for W1A1 with `bash scripts/run_glue.sh MNLI` (around 77 accuracy on MNLI), but when I reproduce W1A1 with...
Was two-stage knowledge distillation used, as in BinaryBERT (Table 7, https://arxiv.org/pdf/2012.15701.pdf), to get these results?
In the paper, you mentioned how many epochs you trained for without data augmentation. However, I am not sure whether you used the same number of epochs when training with data...
Hi there! I got a StopIteration error when trying to follow the steps to run your code, `scripts/run_glue.sh`: ``` ... previous messages hidden... 2024-05-13 16:17:00,172 [INFO]: module.classifier: Linear(in_features=768, out_features=3,...
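Not necessarily the cause here, but a `StopIteration` in a training loop is often raised by calling `next()` on an exhausted dataloader iterator. A minimal, self-contained sketch of the usual guard (all names and shapes below are hypothetical, not taken from this repo):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical setup, just to make the sketch runnable on its own.
dataset = TensorDataset(torch.randn(100, 8), torch.randint(0, 3, (100,)))
train_dataloader = DataLoader(dataset, batch_size=16)
num_train_steps = 50

data_iter = iter(train_dataloader)
for step in range(num_train_steps):
    try:
        batch = next(data_iter)
    except StopIteration:
        # The iterator is exhausted; restart it for the next epoch
        # instead of letting StopIteration propagate.
        data_iter = iter(train_dataloader)
        batch = next(data_iter)
    # ... the training step on `batch` would go here ...
```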