feat: add yayi2-30b example
Yayi2-30b scores over 80 on MMLU. I did some fine-tuning on it with QLoRA and, from a quick test, it looks very promising, so I created an MLX example for this model in case anyone wants to run it via MLX. However, the model has an unusual k/v layer shape that causes quantization to fail. So far I haven't found any quantization tools that support this model (except bnb nf4). It would be great if MLX could support quantizing it.
FYI:
- https://huggingface.co/wenge-research/yayi2-30b
- https://huggingface.co/mzbac/yayi2-30b-guanaco
- https://github.com/ml-explore/mlx/issues/328
I have added the quantization workaround suggested in https://github.com/ml-explore/mlx/issues/328, and the example now works with 4-bit quantization. Once the PR gets merged, I will upload the 4-bit quantized model.
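For context, the workaround in that issue amounts to quantizing only the layers whose weight shapes are compatible with the quantization group size, and leaving the odd-shaped k/v projections in full precision. A minimal sketch of such a predicate (the `is_quantizable` helper and the example shapes are hypothetical, not the actual MLX API):

```python
def is_quantizable(shape, group_size=32):
    """Return True if a weight of this shape can be group-quantized.

    Group quantization packs `group_size` consecutive values along the
    last axis, so that axis must be divisible by `group_size`.
    """
    return shape[-1] % group_size == 0


# A standard 4096x4096 projection quantizes fine, but a layer whose
# input dimension is not a multiple of the group size must be skipped
# (or padded) and kept in full precision.
print(is_quantizable((4096, 4096)))  # True
print(is_quantizable((4096, 7000)))  # False
```

In practice this kind of check would be passed as a filter when walking the model's linear layers during conversion, so the non-conforming k/v layers are simply excluded from quantization.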
https://github.com/ml-explore/mlx-examples/assets/7523197/3284eb6d-7d7d-4eab-88db-755fee196305
Hi @mzbac, sorry for the delayed review here. Do you still want to merge this? Given the non-standard size, I don't think it would fit easily into our hf_llm example, but wdyt?
I think hf_llm should support this once we fix quantization for dimensions that aren't divisible by 32, so I'm happy to close this one. In the meantime, anyone who wants to try this model in f16 precision should be able to run it via hf_llm.
Sounds good, thank you!