ValueError: Asking to pad but the tokenizer does not have a padding token.
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token
(tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'}).
Which .py file does this change need to go in?
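Typically either of those lines goes right after the tokenizer is loaded, in whichever script creates it, before anything calls the tokenizer with padding. A minimal sketch, assuming a GPT-2 style checkpoint that ships without a pad token (the checkpoint name is only a placeholder):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint with no pad token

# Option 1: reuse the existing EOS token as the padding token
tokenizer.pad_token = tokenizer.eos_token
# Option 2: register a brand-new [PAD] token instead
# tokenizer.add_special_tokens({'pad_token': '[PAD]'})

# Padding now works without raising the ValueError
batch = tokenizer(["hello", "a longer example"], padding=True, return_tensors="pt")

Note that option 2 also requires resizing the model's embeddings, as shown further down the thread.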
I ran into the same problem; is there any solution?
same issue 👍
I tried this Stack Overflow post; it didn't help me, but maybe it helps you?
This fixed it for my code:
dataset = dataset.map(lambda samples: tokenizer(samples["text"]), batched=True)  # calling the tokenizer directly invokes __call__
tokenizer.pad_token = tokenizer.eos_token             # reuse EOS as the pad token...
tokenizer.add_special_tokens({'pad_token': '[PAD]'})  # ...or register a dedicated [PAD] token (this overrides the line above)
model.resize_token_embeddings(len(tokenizer))         # give the new token an embedding row
trainer = transformers.Trainer(...)
The key for my issue was to add the special tokens and to use the call method 🤔
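For anyone wiring this into a Trainer, here is a hedged sketch of where the padding actually happens; the collator choice, output_dir, and batch size are my own assumptions, not part of the comment above:

from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

# Pads each batch to its longest sample and copies input_ids into labels,
# replacing pad positions with -100 so they are ignored by the loss.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=4),
    train_dataset=dataset,
    data_collator=collator,
)

If pad_token is set to eos_token, keep in mind the collator will also mask real EOS tokens out of the loss, which is related to the quality concern raised below.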
This modification leads to slightly worse generation quality, as mentioned in #1058. I still don't know how to deal with it.
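One workaround that is sometimes suggested (I don't know whether it is what #1058 settled on, so treat it as an assumption): after resizing, the new [PAD] row is randomly initialized, and initializing it from the mean of the existing embeddings may reduce the impact:

import torch

# Initialize the freshly added [PAD] embedding from the mean of the
# pre-existing rows instead of leaving it randomly initialized.
with torch.no_grad():
    emb = model.get_input_embeddings().weight
    emb[-1] = emb[:-1].mean(dim=0)

    out = model.get_output_embeddings()
    if out is not None:
        out.weight[-1] = out.weight[:-1].mean(dim=0)

This assumes exactly one new token was added and that it occupies the last embedding row.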