Jiadi Su

Results: 1 comment by Jiadi Su

The first image token is predicted at the position of the last segmentation token, and the prediction made at the last token is not needed, so the transformer's input can be 512-dimensional...
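To make the index alignment concrete, here is a minimal sketch in plain Python. It assumes a standard next-token setup in which segmentation tokens are concatenated in front of image tokens; the helper name `shifted_pairs` and the toy token values are hypothetical and do not come from the original code, and the sequence is kept far shorter than 512 for readability.

```python
def shifted_pairs(seg_tokens, img_tokens):
    """Build next-token (inputs, targets) over the sequence [seg | img].

    Hypothetical helper for illustration only.
    """
    full = list(seg_tokens) + list(img_tokens)
    inputs = full[:-1]   # the last token is never fed in: its prediction is unused
    targets = full[1:]   # the output at position i is trained to predict token i + 1
    # Position of the last segmentation token; its output predicts the first image token.
    first_img_pos = len(seg_tokens) - 1
    return inputs, targets, first_img_pos


seg = [10, 11, 12, 13]  # toy segmentation tokens (made-up values)
img = [20, 21, 22, 23]  # toy image tokens (made-up values)

inputs, targets, pos = shifted_pairs(seg, img)

# Only the targets from `pos` onward correspond to image tokens.
image_targets = targets[pos:]
```

Under these assumptions the input is one token shorter than the full concatenated sequence, and slicing the outputs from the last segmentation position onward yields exactly one prediction per image token.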