feat: support adding tokens to the tokenizer

Open congchan opened this issue 2 years ago • 0 comments

To improve compatibility across models initialized from different open-source checkpoints, users may want to add tokens to the tokenizer for downstream tuning.

For example, to improve our policy's adherence to our chat format, we may want to add ChatML tokens such as "<|system|>", "<|assistant|>", "<|user|>", and "<|end|>" to the policy tokenizer.

Tokens added as special tokens are dropped in the decode phase of PPO, because decoding has to skip special tokens such as EOS. Therefore, this feature will only add normal (non-special) tokens.
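The difference can be illustrated with a minimal, self-contained sketch. `ToyTokenizer` and its methods are hypothetical stand-ins written for this example, not the project's code or any library's API; the sketch only assumes that decoding filters out every id registered as special, in the way a Hugging Face-style `decode(..., skip_special_tokens=True)` does.

```python
# Toy stand-in for a tokenizer. All names here are hypothetical and exist
# only to illustrate the behaviour described in this issue.
class ToyTokenizer:
    def __init__(self):
        self.id_to_token = ["<eos>", "hello", "world"]
        self.special_ids = {0}  # <eos> is registered as special

    def add_tokens(self, tokens, special=False):
        """Append unseen tokens to the vocabulary; return how many were added."""
        added = 0
        for token in tokens:
            if token in self.id_to_token:
                continue  # already in the vocabulary
            self.id_to_token.append(token)
            if special:
                self.special_ids.add(len(self.id_to_token) - 1)
            added += 1
        return added

    def encode(self, tokens):
        """Map a list of token strings to their ids."""
        return [self.id_to_token.index(t) for t in tokens]

    def decode(self, ids, skip_special_tokens=True):
        """Join tokens back into text, optionally dropping special ids."""
        kept = [
            self.id_to_token[i]
            for i in ids
            if not (skip_special_tokens and i in self.special_ids)
        ]
        return " ".join(kept)


CHATML_TOKENS = ["<|system|>", "<|assistant|>", "<|user|>", "<|end|>"]
sample = ["<|user|>", "hello", "<|end|>", "<eos>"]

# Case 1: chat markers registered as special tokens.
as_special = ToyTokenizer()
as_special.add_tokens(CHATML_TOKENS, special=True)
special_text = as_special.decode(as_special.encode(sample))

# Case 2: chat markers registered as normal tokens.
as_normal = ToyTokenizer()
as_normal.add_tokens(CHATML_TOKENS, special=False)
normal_text = as_normal.decode(as_normal.encode(sample))

print(repr(special_text))  # chat markers are stripped together with <eos>
print(repr(normal_text))   # chat markers survive; only <eos> is stripped
```

With a real model, growing the vocabulary also means the embedding matrix has to grow to match; in Hugging Face `transformers` that is typically done with `model.resize_token_embeddings(len(tokenizer))` after calling `tokenizer.add_tokens(...)`. Whether this project would do it that way is an assumption.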

Jun 06 '23 14:06 congchan