ZOUHAN1

Results: 1 issue by ZOUHAN1

I would like to understand why masking is used in the text encoder. This doesn't seem necessary for CLIP since it does not perform an autoregressive task. Maybe my understanding...
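
For readers unfamiliar with the term, here is a minimal sketch of what the masking in question does, assuming it refers to a causal (lower-triangular) attention mask applied in the text transformer. The helper names `causal_mask`, `softmax`, and `attention_weights` are hypothetical and written in plain Python for illustration; they are not taken from the CLIP codebase. The sketch only shows the effect of the mask on attention weights and does not claim to explain why the mask was chosen.

```python
import math

def causal_mask(n):
    """Additive mask: 0.0 where attention is allowed, -inf above the diagonal."""
    return [[0.0 if j <= i else float("-inf") for j in range(n)] for i in range(n)]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(scores, mask=None):
    """Row-wise softmax over scores, optionally adding an additive mask first."""
    n = len(scores)
    out = []
    for i in range(n):
        row = [scores[i][j] + (mask[i][j] if mask else 0.0) for j in range(n)]
        out.append(softmax(row))
    return out

# Toy example: 4 tokens with identical (uniform) attention scores.
scores = [[0.0] * 4 for _ in range(4)]
masked = attention_weights(scores, causal_mask(4))    # token i sees tokens 0..i only
unmasked = attention_weights(scores)                  # every token sees every token

for i, row in enumerate(masked):
    print(f"token {i} (masked):", row)
```

With the mask, token 0 attends only to itself and the last token attends to the whole sequence; without it, every token attends to every position, which is the bidirectional behavior the question has in mind.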