**DeepSeek-v3-0324-8bit**
The MLX community just dropped the **8-bit quantized version** of DeepSeek-V3-0324! https://huggingface.co/mlx-community/DeepSeek-v3-0324-8bit
Need urgent support from the pros!
Correct me if I am wrong, but I guess you can just edit exo/models.py and add it there.
add "deepseek-v3-0324-8bit": { "layers": 61, "repo": { "MLXDynamicShardInferenceEngine": "mlx-community/DeepSeek-v3-0324-8bit", }, }, in the model_cards
and add "deepseek-v3-0324-8bit": "Deepseek V3-0324 (8-bit)", in pretty_name
If the model supports MLXDynamicShardInferenceEngine, it should work directly.
Try it; if it works, open a pull request to add it.
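For the test, something like `exo run deepseek-v3-0324-8bit --prompt "hello"` should exercise the new entry (assuming the `exo run` subcommand from exo's README; the model key is the one added above, and exact flags may differ across versions).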
I used two Mac Studios with Apple M3 Ultra chips, each with 512GB of unified memory (1TB in total).
I also modified exo/models.py, just as you recommended.
Both Macs are running, and each one is using about 380GB of memory.
However, an error occurred during inference. I suspect that modifying only exo/models.py is not enough; there may be other parts of the code that also need to be changed, but I'm not able to handle that part.
There's already an issue for this error (#799), though it's still unsolved! It's related to tensor dtype bf16; even the model in models.py isn't working for him. Can you try running the model mentioned in the issue?
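If you want to confirm whether bf16 weights are what trips it, here is a small sketch that lists the tensor dtypes stored in a downloaded .safetensors shard (the path is a placeholder; the 8-byte little-endian header-length prefix and the JSON header are part of the safetensors file format):

```python
# Sketch: read a .safetensors header and report the tensor dtypes, to see
# whether a shard actually stores BF16 weights (the dtype implicated in #799).
import json
import struct

def safetensors_dtypes(path):
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]  # u64 LE header size
        header = json.loads(f.read(header_len))
    # Header maps tensor name -> {"dtype", "shape", "data_offsets"}; skip metadata.
    return {k: v["dtype"] for k, v in header.items() if k != "__metadata__"}

# Placeholder path -- point this at a shard inside your downloaded snapshot.
dtypes = safetensors_dtypes("path/to/model-shard.safetensors")
print(sorted(set(dtypes.values())))  # e.g. ['BF16', 'F32', 'U32'] if bf16 is present
```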
I tested the model from #799 as well, and it also gave an error, but the error was different.
When did you git pull the repo? If it's old, see this commit; they removed that statement.
I downloaded the repo a few days ago, but I just compared it with the latest version, and there haven't been any content changes recently.
I downloaded it and changed models.py on both machines. Currently it's stuck on "Checking download status..." and won't give me an error or load the model into memory.
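In case it helps narrow this down: exo's README documents a DEBUG environment variable (0-9), so starting the nodes with `DEBUG=9 exo` should log where the download-status check is hanging (assuming your checkout still supports it; logging behavior may differ across versions).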