<think> tags for thinking models
Thinking models like DeepSeek-R1 emit thinking tags in the output. Is there a way to filter these out easily? Currently they end up directly in the output and so mess up the metrics.
Nope, not at the moment! We used to have regex parsers but they were underused so we removed them.
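(For anyone who needs a stopgap in the meantime, a minimal post-processing sketch along these lines would strip the reasoning block before scoring. This is illustrative only, not library code, and the helper name is made up:)

```python
import re

# Remove complete <think>...</think> blocks before the prediction reaches the metrics.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_thinking(prediction: str) -> str:
    cleaned = THINK_BLOCK.sub("", prediction)
    # If generation hit the token limit mid-thought, drop the dangling unclosed block too.
    cleaned = re.sub(r"<think>.*", "", cleaned, flags=re.DOTALL)
    return cleaned.strip()

print(strip_thinking("<think>Let me reason step by step...</think>The answer is A"))
# -> "The answer is A"
```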
I had the same question and got something working. For now it's more of a hack, but hopefully this is a starting point to get it working generally. Here is my branch and demo notebook.
Notes:
- could add a flag (similar to `use_chat_template`). I saw some code uses `add_reasoning_prompt`
- `<think>` is already in `tokenizer_config.json`'s `chat_template` field, so maybe instead of hardcoding it, it would be possible to detect common tokens there and then insert it in the chat template again? (See the sketch after this list.)
- I needed to change my answer options from `["A", "B", ...]` to `["The answer is A", ...]`
- 2048 tokens seems like the right size
- didn't test few-shot
- this isn't maximally efficient because it works on one doc at a time, but I don't know how well large batches would fit
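To illustrate the detection idea from the second note, here is a rough sketch assuming a Hugging Face tokenizer; the tag list and helper name are assumptions, not existing library code:

```python
from transformers import AutoTokenizer

# Hypothetical helper: look for known reasoning tags in the tokenizer's chat template
# instead of hardcoding "<think>". The COMMON_REASONING_TAGS list is an assumption.
COMMON_REASONING_TAGS = ["<think>", "<reasoning>"]

def detect_reasoning_tag(model_name: str) -> str | None:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    template = tokenizer.chat_template or ""
    for tag in COMMON_REASONING_TAGS:
        if tag in template:
            return tag
    return None

# e.g. detect_reasoning_tag("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B") would return
# "<think>" if that tag appears in the model's chat template.
```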
Here's the key section in prompt_manager:
```python
elif use_chat_template:
    # Render the conversation with the model's chat template, with the generation prompt appended.
    chat_preview = self.model.tokenizer.apply_chat_template(
        output, tokenize=False, add_generation_prompt=True
    )
    tokenized = self.model.tokenizer(chat_preview, return_tensors="pt").to(self.model.device)
    prepared_batch = Batch(
        input_ids=tokenized["input_ids"],
        input_mask=tokenized["attention_mask"],
        input_lengths=[len(tokenized["input_ids"][0])],
        truncated=[False],
        padded=[False],
    )
    # Generate the reasoning, stopping at the closing tag.
    response = self.model._generate(
        batch=prepared_batch,
        max_new_tokens=2048,
        stop_tokens=["</think>"],
    )
    # Re-append </think> so the prompt plus reasoning forms a well-formed block.
    all_start = chat_preview + response[0].result[0] + "</think>"
    return all_start, num_effective_fewshots
```
Hi! This is now fixed on main after this PR. The details also contain both the original prediction and the post-processed one.