
Tokenizer cannot be saved after adding a post-processor

Open andy-yangz opened this issue 5 years ago • 1 comment

At first, I had a Tokenizer that worked very well, and I saved it. But later I wanted to add a PostProcessor to it, so I wrote a template, and it worked well. However, when I try to save the tokenizer I get an error.

from tokenizers import Tokenizer
from tokenizers.processors import TemplateProcessing

tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.post_processor = TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[
        ("[CLS]", 1),
        ("[SEP]", 2),
    ],
)

tokenizer.save("tokenizer.new.json")

The error message:

thread '<unnamed>' panicked at 'no entry found for key', /__w/tokenizers/tokenizers/tokenizers/src/models/mod.rs:36:66
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pyo3_runtime.PanicException: no entry found for key

I don't know whether this is a bug or an unimplemented feature. If it's unimplemented, I hope it can become one: being able to add a post-processor later would help a lot.
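For readers hitting the same panic: one common cause of `no entry found for key` during serialization is that the special tokens referenced by the `TemplateProcessing` are not registered in the tokenizer's vocabulary. Below is a minimal sketch of that workaround; it builds a tiny hypothetical `WordLevel` vocabulary in place of the saved `tokenizer.json`, and registers the special tokens before attaching the post-processor. This is an assumption about the cause, not a confirmed fix for every version.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.processors import TemplateProcessing

# Hypothetical stand-in vocabulary; in the original report the tokenizer
# would come from Tokenizer.from_file("tokenizer.json").
vocab = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3}
tokenizer = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))

# Make sure the special tokens exist in the tokenizer before the
# post-processor references them by id.
tokenizer.add_special_tokens(["[CLS]", "[SEP]"])

tokenizer.post_processor = TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[("[CLS]", 1), ("[SEP]", 2)],
)

# Serialization now succeeds; save() goes through the same code path.
serialized = tokenizer.to_str()
tokenizer.save("tokenizer.new.json")
```

If the tokens are already in the vocabulary of the loaded `tokenizer.json`, `add_special_tokens` is a no-op for them, so it is safe to call defensively before saving.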

andy-yangz avatar Jan 27 '21 13:01 andy-yangz

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Apr 24 '24 01:04 github-actions[bot]