mlx-examples

ViT + CLIP

Open gboduljak opened this issue 2 years ago • 2 comments

Would it be worth implementing ViT and CLIP example?

gboduljak, Dec 18 '23 19:12

Yeah, that one's on our list of examples to add! Are you interested in contributing it? If so, which model would you use?

awni, Dec 18 '23 20:12

I would like to contribute :) However, I would like to complete the implementation of norm first (https://github.com/ml-explore/mlx/pull/187). I would use models from the official CLIP repository: https://github.com/openai/CLIP. If you have an alternative idea, please let me know.

gboduljak, Dec 18 '23 21:12

@gboduljak I submitted a PR against your existing PR that adds a local implementation of CLIPImageProcessor: https://github.com/gboduljak/mlx-examples/pull/1

This should eliminate the dependency on transformers, aside from using it for downloading the model & tokenizer.
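
For anyone following along, here is a rough sketch of what a local CLIP-style image processor boils down to: resize the shortest side, center crop, rescale to [0, 1], and normalize with the CLIP mean and standard deviation. It is not the code from the PR. The function names are hypothetical, the channels-last output layout is an assumption, and the resize is nearest-neighbor to keep the sketch NumPy-only, whereas the Hugging Face processor uses bicubic interpolation.

```python
import numpy as np

# Per-channel normalization constants published with OpenAI CLIP.
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)


def resize_shortest_side(image: np.ndarray, size: int) -> np.ndarray:
    """Resize an HxWx3 array so its shortest side equals `size`.

    Nearest-neighbor sampling is used here for brevity (an assumption of
    this sketch); the reference processor uses bicubic interpolation.
    """
    h, w = image.shape[:2]
    scale = size / min(h, w)
    new_h = max(size, round(h * scale))
    new_w = max(size, round(w * scale))
    # Map each output pixel back to its nearest source pixel.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows[:, None], cols]


def center_crop(image: np.ndarray, size: int) -> np.ndarray:
    """Crop a size x size window from the center of an HxWx3 array."""
    h, w = image.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return image[top:top + size, left:left + size]


def preprocess(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Turn a uint8 RGB image (HxWx3) into a normalized float32 array."""
    resized = resize_shortest_side(image, size)
    cropped = center_crop(resized, size)
    scaled = cropped.astype(np.float32) / 255.0   # rescale to [0, 1]
    return (scaled - CLIP_MEAN) / CLIP_STD         # per-channel normalize


if __name__ == "__main__":
    dummy = np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)
    print(preprocess(dummy).shape)
```

The output stays channels-last (height, width, channels); transpose it if your model expects channels-first.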

nkasmanoff, Jan 15 '24 19:01

@nkasmanoff Thanks for the help. I will take a look at your work now.

gboduljak, Jan 15 '24 23:01

@nkasmanoff I merged your PR, corrected the nits, and refactored your implementation so that everything is in the preprocessing folder. Many thanks for the help. In the future, we might drop this 'copy-paste' implementation from HuggingFace. Ideally, we should use mlx-data. If you have time, it would be awesome to have an mlx-data implementation of CLIPImageProcessor.

gboduljak, Jan 16 '24 01:01