How to add tokens to tokenizer?

Open • abs-xyz opened this issue 2 years ago • 1 comment

I want to add some tokens, such as [BOST], to the tokenizer so that it does not split them.

How can I achieve this? Any suggestions are welcome.

Hugging Face provides functions like add_tokens, but I also want to make other changes in the source, so I don't want to use HF.

abs-xyz, Nov 05 '23 06:11

tokenizer.add_special_tokens(["[BOST]"])

humza-sami, Dec 24 '23 19:12
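
For anyone who wants to avoid HF entirely, as the original question asks, one possible approach is to wrap the existing encoder: split the text on the special tokens first, give each special token an ID appended after the base vocabulary, and pass only the remaining text to the base tokenizer. The sketch below is a minimal illustration of that idea, not the repository's own method. The names `SpecialTokenWrapper`, `base_encode`, and `toy_encode` are hypothetical, and the byte-level `toy_encode` only stands in for the real tokenizer's encode function.

```python
import re
from typing import Callable, Dict, List


class SpecialTokenWrapper:
    """Hypothetical wrapper that keeps listed special tokens from being split."""

    def __init__(
        self,
        base_encode: Callable[[str], List[int]],
        base_vocab_size: int,
        special_tokens: List[str],
    ) -> None:
        if not special_tokens:
            raise ValueError("special_tokens must not be empty")
        self.base_encode = base_encode
        # New IDs are appended directly after the base vocabulary.
        self.special_ids: Dict[str, int] = {
            tok: base_vocab_size + i for i, tok in enumerate(special_tokens)
        }
        # Longest tokens first so overlapping tokens match greedily;
        # re.escape handles regex metacharacters such as "[" and "]".
        ordered = sorted(special_tokens, key=len, reverse=True)
        self.pattern = re.compile("(" + "|".join(re.escape(t) for t in ordered) + ")")

    def encode(self, text: str) -> List[int]:
        ids: List[int] = []
        # The capturing group makes re.split keep the special tokens as pieces.
        for piece in self.pattern.split(text):
            if not piece:
                continue
            if piece in self.special_ids:
                ids.append(self.special_ids[piece])
            else:
                ids.extend(self.base_encode(piece))
        return ids


def toy_encode(text: str) -> List[int]:
    """Stand-in for a real tokenizer: one ID per UTF-8 byte (vocab size 256)."""
    return list(text.encode("utf-8"))


wrapper = SpecialTokenWrapper(toy_encode, base_vocab_size=256, special_tokens=["[BOST]"])
print(wrapper.encode("[BOST]hi"))
```

Two caveats apply to this sketch. The model's embedding table has to be extended to cover the new IDs before they can be used, and encoding the text in separate pieces can change how the base tokenizer treats whitespace at the piece boundaries, so the output should be checked against the real tokenizer.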