codellama
codellama copied to clipboard
How to add tokens to tokenizer?
I want to add some tokens like [BOST] to the tokenizer so that it does not split these.
How can I achieve this? Any suggestions are welcome.
Huggingface provides functions like add_tokens but I want to make other changes in the source, so I don't want to use HF.
tokenizer.add_special_tokens(["[BOST]"])