CLIP
How to reshape the [1, 512] feature produced by the CLIP encoder
What exactly does the [1, 512] feature produced by the CLIP encoder represent, and how can it be turned into a feature map with channel, height, and width dimensions?
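A minimal sketch, assuming the feature comes from a CLIP model whose embedding size is 512 (for example ViT-B/32, via `model.encode_image` in OpenAI's `clip` package). In that case the first axis is the batch size (one image or one text) and the 512 values are a single pooled embedding in the joint image-text space. It has no height or width, so any [C, H, W] layout has to be imposed. The sketch shows three common options using NumPy and a random stand-in for the real feature: tiling the vector over a spatial grid, reinterpreting the 512 values as C×H×W with C·H·W = 512, and a projection to C·H·W followed by a reshape. The helper names `tile_to_map`, `reshape_to_map`, and `project_to_map` are hypothetical, and the projection weight is random here, whereas in a real model it would be a learned layer.

```python
import numpy as np


def tile_to_map(feat: np.ndarray, h: int, w: int) -> np.ndarray:
    """Broadcast a [B, D] global embedding to a [B, D, H, W] map.

    Every spatial position receives the same D-dim vector, so no spatial
    information is created; this only makes the shape compatible with a
    convolutional feature map (e.g. for channel-wise concatenation).
    """
    b, d = feat.shape
    return np.broadcast_to(feat[:, :, None, None], (b, d, h, w)).copy()


def reshape_to_map(feat: np.ndarray, c: int, h: int, w: int) -> np.ndarray:
    """Reinterpret a [B, D] embedding as [B, C, H, W], requiring C*H*W == D.

    This is a pure reshape: the resulting "spatial" axes are arbitrary
    groupings of embedding dimensions, not image locations.
    """
    b, d = feat.shape
    if c * h * w != d:
        raise ValueError(f"C*H*W = {c * h * w} does not match D = {d}")
    return feat.reshape(b, c, h, w)


def project_to_map(feat: np.ndarray, weight: np.ndarray,
                   c: int, h: int, w: int) -> np.ndarray:
    """Project [B, D] to [B, C*H*W] with a (hypothetical) weight, then reshape."""
    b, _ = feat.shape
    return (feat @ weight).reshape(b, c, h, w)


# Random stand-in for a real CLIP feature of shape [1, 512].
rng = np.random.default_rng(0)
feat = rng.standard_normal((1, 512)).astype(np.float32)

tiled = tile_to_map(feat, 7, 7)           # [1, 512, 7, 7]
grid = reshape_to_map(feat, 32, 4, 4)     # [1, 32, 4, 4], since 32*4*4 = 512

# Random weight in place of a learned linear layer mapping 512 -> 64*4*4.
weight = (rng.standard_normal((512, 64 * 4 * 4)) * 0.01).astype(np.float32)
projected = project_to_map(feat, weight, 64, 4, 4)  # [1, 64, 4, 4]

print(tiled.shape, grid.shape, projected.shape)
```

Tiling is the usual choice when the CLIP vector is meant to condition a CNN feature map at every location; the projection lets the network learn its own C×H×W layout; the plain reshape is only a shape trick and carries no spatial meaning.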