
Why is last token in the sequence removed

MartaTintore opened this issue 4 years ago · 3 comments

I was wondering if you could explain why the last token in the sequence is dropped in the cond_transformer.py script; the paper does not give an explanation for this. Thanks!

https://github.com/CompVis/taming-transformers/blob/9d17ea64b820f7633ea6b8823e1f78729447cb57/taming/models/cond_transformer.py#L100

MartaTintore avatar Jun 29 '21 12:06 MartaTintore

same question

zhangyingbit avatar Feb 21 '23 11:02 zhangyingbit

Autoregressive learning needs to remove the last token. For example, for the sequence [a, b, c, d, e], the decoder's input would be [bos, a, b, c, d, e] and the desired output [a, b, c, d, e, eos]. Here there is no eos token, so we simply drop the last element 'e', giving decoder input [bos, a, b, c, d] and desired output [a, b, c, d, e].
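The shift described above can be sketched in plain Python (the function name and the use of string tokens are illustrative, not from the repository; in the actual code the conditioning indices play the role of the bos prefix):

```python
def shift_for_training(cond, seq):
    """Build (input, target) pairs for next-token prediction.

    cond acts like a [bos] prefix; there is no explicit eos token,
    so the last element of the concatenated sequence is dropped.
    """
    full = cond + seq        # e.g. ["bos", "a", "b", "c", "d", "e"]
    inputs = full[:-1]       # drop the last element -> ["bos", "a", "b", "c", "d"]
    targets = seq            # each input position predicts the next token
    return inputs, targets

inputs, targets = shift_for_training(["bos"], ["a", "b", "c", "d", "e"])
# inputs:  ["bos", "a", "b", "c", "d"]
# targets: ["a", "b", "c", "d", "e"]
```

Each position of `inputs` lines up with the token it should predict in `targets`, which is why the two lists have the same length.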

JJJYmmm avatar Mar 26 '24 12:03 JJJYmmm

The first image token is predicted from the last segmentation (conditioning) token, and the prediction after the last image token is not needed, so the transformer's input can stay at 512 indices. However, the predictions should be read starting from the output position of the last segmentation token.
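The offset this describes can be sketched as follows (a hypothetical helper, not code from the repository): output position i predicts input token i+1, so the logits for the image tokens start at the position of the last conditioning token.

```python
def image_logit_positions(cond_len, seq_len):
    """Return the output positions whose logits predict the image tokens.

    Position i of the transformer output predicts token i+1 of the input,
    so the first image token is predicted at position cond_len - 1, and
    image predictions occupy [cond_len - 1, cond_len - 1 + seq_len).
    """
    return list(range(cond_len - 1, cond_len - 1 + seq_len))

# With 3 conditioning tokens and 5 image tokens, the image logits
# sit at output positions 2 through 6.
positions = image_logit_positions(3, 5)
```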

JoyBoy-Su avatar Apr 01 '24 07:04 JoyBoy-Su