Tokenizer cannot be saved after adding a post-processor
I have a Tokenizer that works very well, and I saved it. Later I wanted to add a PostProcessor to it, so I wrote a template. It works fine, but when I try to save the tokenizer I get an error.
from tokenizers import Tokenizer
from tokenizers.processors import TemplateProcessing

tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.post_processor = TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[
        ("[CLS]", 1),
        ("[SEP]", 2),
    ],
)
tokenizer.save("tokenizer.new.json")
The error message:
thread '<unnamed>' panicked at 'no entry found for key', /__w/tokenizers/tokenizers/tokenizers/src/models/mod.rs:36:66
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
pyo3_runtime.PanicException: no entry found for key
I don't know whether this is a bug or an unimplemented feature. If it's unimplemented, I wish it could become a feature: being able to add a post-processor to a saved tokenizer later would help a lot.
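In case it helps: this panic may come from the special-token ids in the template not matching entries in the tokenizer's vocabulary, so serialization fails when it tries to look them up. A sketch of a workaround under that assumption, building a tiny `WordLevel` tokenizer whose vocabulary is known to contain the special tokens, and looking the ids up with `token_to_id` instead of hard-coding them:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.processors import TemplateProcessing

# Toy vocabulary that genuinely contains the special tokens.
vocab = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3, "world": 4}
tokenizer = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))

# Look the ids up so the template and the vocabulary cannot disagree.
cls_id = tokenizer.token_to_id("[CLS]")
sep_id = tokenizer.token_to_id("[SEP]")

tokenizer.post_processor = TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[("[CLS]", cls_id), ("[SEP]", sep_id)],
)

# Saving succeeds when the template's ids resolve in the vocabulary.
tokenizer.save("tokenizer.new.json")
```

The same idea should apply to a tokenizer loaded with `Tokenizer.from_file`: check that `token_to_id("[CLS]")` and `token_to_id("[SEP]")` return the ids used in the template (and are not `None`) before attaching the post-processor.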
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.