Sai Munikoti

2 comments by Sai Munikoti

> @saimunikoti actually, with multimodal, clip sucks at text-to-text retrieval.
>
> What is happening here is, images are embedded and retrieved with clip, and text is embedded and retrieved...

Can we expect the mPLUG-2 codebase to be released in a month or so for academic use?