FastChat

ValueError: Asking to pad but the tokenizer does not have a padding token.

Open sdgahsdnrne opened this issue 2 years ago • 5 comments

ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'}). Which .py file do I need to modify to fix this?

sdgahsdnrne avatar May 30 '23 07:05 sdgahsdnrne
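
For reference, the error message names its own two fixes. A minimal, self-contained sketch of both (the checkpoint id "huggyllama/llama-7b" is a placeholder assumption, not the model from this thread):

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; substitute the model actually being fine-tuned.
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")

# Fix 1: reuse the end-of-sequence token for padding. No vocabulary change,
# so the model itself needs no modification.
tokenizer.pad_token = tokenizer.eos_token

# Fix 2 (alternative): register a brand-new [PAD] token. This grows the
# vocabulary, so model.resize_token_embeddings(len(tokenizer)) must follow.
# tokenizer.add_special_tokens({"pad_token": "[PAD]"})

# Padding now works; both rows are padded to the same length.
batch = tokenizer(["hello", "a longer sentence"], padding=True, return_tensors="pt")
print(batch["input_ids"].shape)
```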

Well, I met the same problem. Is there any solution?

libeineu avatar May 30 '23 12:05 libeineu

same issue 👍

Logophoman avatar May 30 '23 12:05 Logophoman

I tried out this Stack Overflow post; it didn't help me, but maybe it helps you?

Logophoman avatar May 30 '23 12:05 Logophoman

This fixed it for my code:

dataset = dataset.map(lambda samples: tokenizer.__call__(samples["text"]), batched=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
model.resize_token_embeddings(len(tokenizer))
trainer = transformers.Trainer(...)

The key for my issue was to add the special tokens and to use the call method 🤔

Logophoman avatar May 30 '23 12:05 Logophoman
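
A note on the snippet above: it applies both pad-token fixes at once, and the '[PAD]' registration overrides the earlier eos assignment; also, resize_token_embeddings is only needed when the vocabulary actually grew. A slightly tidied sketch, assuming the same model, tokenizer, and dataset objects as in the thread:

```python
# Tidied version of the fix above (model/tokenizer/dataset are assumed to
# exist as in the thread). Pick ONE of the two pad-token strategies.
num_added = tokenizer.add_special_tokens({"pad_token": "[PAD]"})  # strategy A
# tokenizer.pad_token = tokenizer.eos_token                       # strategy B
if num_added > 0:
    # A new token means a new (untrained) row in the embedding matrix.
    model.resize_token_embeddings(len(tokenizer))

# tokenizer(...) is the idiomatic form of tokenizer.__call__(...); padding
# itself typically happens later, in the Trainer's data collator.
dataset = dataset.map(lambda samples: tokenizer(samples["text"]), batched=True)
trainer = transformers.Trainer(...)  # unchanged from the snippet above
```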


The fix above leads to slightly worse generation quality, as mentioned in #1058. Still don't know how to deal with it.

Artanic30 avatar Jun 07 '23 10:06 Artanic30
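
A possible mitigation for that quality drop (a sketch under assumptions, not a fix confirmed in this thread): skip the new [PAD] token entirely, reuse eos for padding so no untrained embedding row is added, and mask the padded positions out of the loss with the label value -100, which Hugging Face's cross-entropy ignores.

```python
# Hypothetical workaround: no new token, so no untrained embedding row.
tokenizer.pad_token = tokenizer.eos_token

def tokenize(samples):
    batch = tokenizer(samples["text"], padding="max_length",
                      truncation=True, max_length=512)
    # Mask pad positions (attention_mask == 0) with -100 so they do not
    # contribute to the cross-entropy loss.
    batch["labels"] = [
        [tok if keep == 1 else -100 for tok, keep in zip(ids, mask)]
        for ids, mask in zip(batch["input_ids"], batch["attention_mask"])
    ]
    return batch

dataset = dataset.map(tokenize, batched=True)
```

The -100 masking matters because padding with eos would otherwise train the model on long runs of eos tokens after every sequence.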