feat: add yayi2-30b example
Yayi2-30b scores over 80 on MMLU. I did some fine-tuning on it with QLoRA and, from a quick test, it looks very promising, so I created an MLX example for this model in case anyone wants to run it via MLX. However, the model has an unusual k/v layer shape that causes quantization to fail. So far I haven't found any quantization tools that support this model (except bnb nf4). It would be great if MLX could support quantizing it.
FYI:
- https://huggingface.co/wenge-research/yayi2-30b
- https://huggingface.co/mzbac/yayi2-30b-guanaco
- https://github.com/ml-explore/mlx/issues/328
I have added the quantization workaround suggested in https://github.com/ml-explore/mlx/issues/328, and the example now works with 4-bit quantization. Once the PR gets merged, I will upload the 4-bit quantized model.
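For context, the workaround in that issue amounts to quantizing only the layers whose weight shapes are compatible with the quantization group size, and leaving the odd-shaped k/v projections in full precision. A minimal sketch of such a predicate (the `is_quantizable` helper and the example shapes are hypothetical, not the actual MLX API):

```python
def is_quantizable(shape, group_size=32):
    """Return True if a weight of this shape can be group-quantized.

    Group quantization packs `group_size` consecutive values along the
    last axis, so that axis must be divisible by `group_size`.
    """
    return shape[-1] % group_size == 0


# A standard 4096x4096 projection quantizes fine, but a layer whose
# input dimension is not a multiple of the group size must be skipped
# (or padded) and kept in full precision.
print(is_quantizable((4096, 4096)))  # True
print(is_quantizable((4096, 7000)))  # False
```

In practice this kind of check would be passed as a filter when walking the model's linear layers during conversion, so the non-conforming k/v layers are simply excluded from quantization.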
https://github.com/ml-explore/mlx-examples/assets/7523197/3284eb6d-7d7d-4eab-88db-755fee196305
Hi @mzbac, sorry for the delayed review here. Do you still want to merge this? Given the non-standard size, I don't think it would fit easily into our hf_llm example, but wdyt?
I think hf_llm should support this once we fix quantization for dimensions that aren't divisible by 32, so I'm happy to close this one. In the meantime, anyone who wants to try this model in f16 precision should be able to run it via hf_llm.
Sounds good, thank you!