rm token position
Masking here was a good idea, but it breaks on all kinds of models, e.g. "Qwen/Qwen3-4B-Instruct-2507" and "zai-org/GLM-4.1V-9B-Thinking", and I can't work out how to fix it easily (even using the attention mask is complicated, as some models reshape the hidden state and so on). It might be worth disabling it.
Am I correct in thinking that this code is only relevant if we use batch generation?
Perhaps. In truth, I haven't pinned down exactly the cases where this happens, or the best solution. I think some models might add position_ids even with batch_size == 1, but I'm not sure.
Another approach would be to take the attention mask from the inputs, if present, rather than deriving it from the position_ids, but that also leads to shape errors in some models.
I'm not even sure we need to mask the padding tokens at all when an attention mask is provided, so perhaps this section can be safely removed.
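If we do keep it, one option is a defensive version of the attention-mask approach: apply the mask only when the shapes actually line up, and skip it otherwise instead of crashing. A rough sketch (the `mask_padding` name and shapes are my assumptions, not the code in this PR):

```python
import torch

def mask_padding(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Zero out hidden states at padding positions, if shapes allow.

    hidden_states: expected (batch, seq_len, d_model)
    attention_mask: expected (batch, seq_len), 1 = real token, 0 = padding
    """
    # Some models reshape the hidden state (e.g. merge batch and sequence
    # dims), so the mask may no longer line up. In that case, skip masking
    # rather than raise a shape error.
    if hidden_states.dim() != 3 or hidden_states.shape[:2] != attention_mask.shape:
        return hidden_states
    return hidden_states * attention_mask.unsqueeze(-1).to(hidden_states.dtype)
```

This at least avoids the hard failures on models that reshape, at the cost of silently not masking them.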