aPaleBlueDot
Is there any way to bring a Q4_1-quantized GGUF into MLX Swift? Or to specify q4_1 in MLX's own quantization?
More broadly, my use case requires quantizations beyond the 2/4/8-bit options currently provided, especially the Q3, Q5, and Q6 flavors. Are there any future plans to add more...
👀 I'd greatly appreciate this feature, as it would solve the app crashes.
@narner Are there at least some embedding models that worked?
Thanks @narner, that's still a good set of choices. Do you use a reranker, and if so, with which other framework? It seems llama.cpp has the broadest model support, and...
@narner Nice! Ping me when you launch it.