What is the difference between mlx model and hugging face model?
What is the difference between an MLX model and a Hugging Face model? I notice there is a weight file, *.npz. Is this file part of the MLX model, and if I want to deploy the MLX model, should I include it?
There are different implementations of these models available, using either PyTorch or Hugging Face Transformers. The models in mlx-lm follow the Hugging Face Transformers implementation, which is why you can load Hugging Face models (unquantized) directly with the mlx-lm library.
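As for the *.npz file: it is a standard NumPy archive that MLX uses to store converted model weights, so yes, it is part of the MLX model and must ship with it for deployment. A minimal sketch of what such a checkpoint looks like, using a made-up weight name purely for illustration:

```python
import numpy as np

# Hypothetical weight tensor; real checkpoints map many parameter
# names (e.g. per-layer attention/MLP weights) to arrays like this.
weights = {"layers.0.attention.wq.weight": np.zeros((8, 8), dtype=np.float16)}

# An MLX *.npz checkpoint is just these name -> array pairs saved together.
np.savez("weights.npz", **weights)

# Loading it back gives you the same mapping, which MLX reads at startup.
loaded = np.load("weights.npz")
print(list(loaded.keys()))   # the parameter names stored in the archive
```

Because it is a plain NumPy format, you can open any MLX .npz file this way to inspect which parameters it contains and their shapes.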
In terms of deployment, I assume you want to host the fine-tuned model for some inference task. You can use the mlx-lm library for inference directly. If you want to use other tools such as TGI or llama.cpp, you have to use fuse.py to merge your adapter into the base model and de-quantize the weights, then follow the workflow of whichever tool you prefer.
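That workflow might look roughly like the following. This is a sketch, not a definitive recipe: the model name and paths are placeholders, and the flag names are taken from mlx-lm's fuse script, so check `--help` against your installed version.

```shell
# Merge the LoRA adapter into the base model (paths are placeholders).
# --de-quantize converts quantized weights back to full precision so that
# tools like llama.cpp or TGI can consume the result; skip it if the base
# model was never quantized.
python -m mlx_lm.fuse \
    --model mlx-community/My-Base-Model \
    --adapter-path ./adapters \
    --save-path ./fused_model \
    --de-quantize

# Sanity-check the fused model with mlx-lm itself before handing it
# off to another inference tool.
python -m mlx_lm.generate --model ./fused_model --prompt "Hello"
```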
If I didn't use convert.py to quantize the model, I think there is no need to de-quantize the weights in that last step.