Jiadi Su
The first image token is predicted at the position of the last segmentation token, and the prediction made at the last token is not needed, so the transformer's input can be 512-dimensional...
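A minimal sketch of the shift described above, assuming the segmentation tokens are concatenated in front of the image tokens and the model is trained with next-token prediction. The function name `build_inputs_and_targets` and the toy token values are hypothetical, chosen only for illustration; the lengths are kept small on purpose and are not the sizes used in the discussion.

```python
def build_inputs_and_targets(seg_tokens, img_tokens):
    """Toy illustration of next-token alignment for [seg | img] sequences.

    Hypothetical helper, not taken from any particular codebase.
    """
    # Full sequence: conditioning (segmentation) tokens followed by image tokens.
    seq = list(seg_tokens) + list(img_tokens)

    # The output at the final position would predict a token that does not
    # exist, so the last token is dropped from the transformer input.
    inputs = seq[:-1]

    # The output at the last segmentation position predicts the first image
    # token; every later output predicts the next image token.
    first = len(seg_tokens) - 1
    target_positions = list(range(first, len(inputs)))
    targets = list(img_tokens)
    return inputs, target_positions, targets


seg = [10, 11, 12, 13]   # toy segmentation tokens
img = [20, 21, 22, 23]   # toy image tokens
inputs, positions, targets = build_inputs_and_targets(seg, img)

print(len(inputs))            # one less than len(seg) + len(img)
print(inputs[positions[0]])   # the last segmentation token
print(targets[0])             # the first image token, predicted at that position
```

With this alignment the input is always one token shorter than the full concatenated sequence, which is the reason the last token can be left out.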