rm token position
Masking here was a good idea, but it breaks on all kinds of models, e.g. "Qwen/Qwen3-4B-Instruct-2507" and "zai-org/GLM-4.1V-9B-Thinking", and I can't work out how to fix it easily (even using the attention mask is complicated, as some models reshape the hidden state and so on). It might be worth disabling it.
Am I correct in thinking that this code is only relevant if we use batch generation?
Perhaps. In truth, I haven't pinned down exactly the cases where this happens, or the best solution. I think some models might add position_ids even with batch_size == 1, but I'm not sure.
Another approach would be to take the attention mask from the inputs, if present, rather than deriving it from the position_ids, but that also leads to shape errors in some models.
I'm not even sure we need to mask the padding tokens at all when an attention mask is provided, so perhaps this section can be safely removed.
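If we do keep it, one option is a defensive version of the attention-mask approach: apply the mask only when the shapes actually line up, and skip it otherwise instead of crashing. A rough sketch (the `mask_padding` name and shapes are my assumptions, not the code in this PR):

```python
import torch

def mask_padding(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Zero out hidden states at padding positions, if shapes allow.

    hidden_states: expected (batch, seq_len, d_model)
    attention_mask: expected (batch, seq_len), 1 = real token, 0 = padding
    """
    # Some models reshape the hidden state (e.g. merge batch and sequence
    # dims), so the mask may no longer line up. In that case, skip masking
    # rather than raise a shape error.
    if hidden_states.dim() != 3 or hidden_states.shape[:2] != attention_mask.shape:
        return hidden_states
    return hidden_states * attention_mask.unsqueeze(-1).to(hidden_states.dtype)
```

This at least avoids the hard failures on models that reshape, at the cost of silently not masking them.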