MathBERT
KeyError: 'size' and more
Hello! I found this model on Hugging Face and it looks promising, but I can't run it. When I ran the PyTorch example from your docs, it raised the error below. (The TF version is fine, but it needs the PT weights.)
Some weights of the model checkpoint at tbs17/MathBERT were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Traceback (most recent call last):
File "D:\...\lib\site-packages\transformers\tokenization_utils_base.py", line 250, in __getattr__
return self.data[item]
KeyError: 'size'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "...\test.py", line 33, in <module>
output = model(encoded_input)
File "...\test.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "...\test.py", line 944, in forward
input_shape = input_ids.size()
File "...\test.py", line 252, in __getattr__
raise AttributeError
AttributeError
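For what it's worth, the two parts of that output are separate issues. The unused-weight messages are benign: the checkpoint carries the MLM/NSP pretraining heads (cls.predictions.*, cls.seq_relationship.*), which BertModel deliberately drops, exactly as the log says. The crash itself comes from passing the tokenizer output positionally: tokenizer(...) returns a BatchEncoding (a dict-like wrapper), so BertModel treats it as the input_ids tensor and calls .size() on it, which falls through BatchEncoding.__getattr__ and surfaces as the AttributeError above. A minimal sketch of the failure mode (the sample text is arbitrary):

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('tbs17/MathBERT')
encoded_input = tokenizer("1 + 1 = 2", return_tensors='pt')
print(type(encoded_input))      # BatchEncoding, not a tensor
print(encoded_input.input_ids)  # the tensor the model actually expects
# encoded_input.size()          # raises AttributeError via the KeyError: 'size' seen above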
I have the same problem, and the same thing happens with the TF version. The PyTorch example from Hugging Face can be fixed with the following change:
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('tbs17/MathBERT')
model = AutoModel.from_pretrained('tbs17/MathBERT')

text = "Replace me by any text you'd like."
# Index with ["input_ids"] so the model receives a tensor, not a BatchEncoding.
encoded_input = tokenizer(text, return_tensors='pt')["input_ids"]
output = model(encoded_input)
print(output)
You just need to index encoded_input with ["input_ids"] instead of passing the whole tokenizer output to the model.
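An equivalent, slightly more idiomatic fix is to unpack the BatchEncoding with **, so the attention mask (and token type IDs) are forwarded as well rather than silently dropped. A sketch of that variant:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('tbs17/MathBERT')
model = AutoModel.from_pretrained('tbs17/MathBERT')

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
# ** unpacks input_ids, attention_mask, and token_type_ids as keyword arguments.
output = model(**encoded_input)
print(output.last_hidden_state.shape)  # e.g. torch.Size([1, seq_len, 768]) for a BERT-base checkpoint

For a single unpadded sentence the two fixes give the same hidden states; with padded batches, passing only input_ids would ignore the attention mask.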