Use special tokens specific to the fine-tuned adapter during decoding
During fine-tuning, special tokens may be added that are specific to the adapter. During decoding, we should use those special tokens and ensure the correct stop tokens, padding, etc. are properly honored.
Repro from @runvnc, related: #68
Model ID: https://huggingface.co/qblocks/mistral_7b_norobots/tree/main
The QLoRA repo example loads its AutoTokenizer with special tokens here:
https://github.com/artidoro/qlora/blob/7f4e95a68dc076bea9b3a413d2b512eca6d004e5/qlora.py#L347
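A minimal sketch of the expected behavior (not this project's actual implementation): load the tokenizer from the adapter repo so the special tokens added during fine-tuning are present, resize the base model's embeddings if the vocabulary grew, and pass the adapter's eos/pad tokens to `generate`. The base model ID is assumed; the adapter ID is the one referenced in this issue.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-v0.1"        # assumed base model
adapter_id = "qblocks/mistral_7b_norobots"   # adapter repo from this issue

# Tokenizer must come from the adapter repo, not the base model, so that
# fine-tuning-time special tokens (extra eos/pad tokens, etc.) are available.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

model = AutoModelForCausalLM.from_pretrained(base_id)
# If special tokens were added, the vocabulary grew; keep embeddings in sync.
if len(tokenizer) > model.get_input_embeddings().weight.shape[0]:
    model.resize_token_embeddings(len(tokenizer))

model = PeftModel.from_pretrained(model, adapter_id)

inputs = tokenizer("Write a short greeting.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    eos_token_id=tokenizer.eos_token_id,  # stop on the adapter's eos token
    pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```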
Will this be completed? I'm planning to use adapters with special tokens like the ones below:
https://huggingface.co/Dogge/llama-3-8B-instruct-Bluemoon-Freedom-lora/
https://huggingface.co/Dogge/llama-3-70B-instruct-uncensored-lora