Gadflyii

12 comments by Gadflyii

PR #16310 added a new parameter, `--no-host`, which disables the host buffer so the extra buffers (repack + AMX) can be used, enabling AMX acceleration on CPU layers when a GPU is present.
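
A minimal sketch of how that might look in practice (the binary name, model path, and offload count are placeholders, not from the PR):

```bash
# Hypothetical hybrid CPU/GPU run: offload part of the model to the GPU
# with -ngl, and pass --no-host so the remaining CPU layers can use the
# repack/AMX extra buffers instead of the plain host buffer.
./llama-cli -m ./model.gguf -ngl 30 --no-host -p "Hello"
```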

> You can try [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp), which does not use AMX in CPU+GPU hybrid mode unless something changed in the past few weeks. They are still using the upstream engine from llama.cpp...

> > You can try [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp)
>
> I can't find a way to compile this using CUDA 12.8 in Linux either; the RTX 50 series requires CUDA 12.8 at minimum. I...
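
For what it's worth, a CUDA 12.8 build on Linux would normally be driven through CMake; the exact options for ik_llama.cpp may differ, so treat this as a sketch under assumptions (RTX 50-series cards are compute capability 12.0):

```bash
# Assumed toolkit location; point CMake's CUDA compiler at the 12.8 nvcc.
export CUDACXX=/usr/local/cuda-12.8/bin/nvcc
# GGML_CUDA enables the CUDA backend; target sm_120 for RTX 50-series cards.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="120"
cmake --build build --config Release -j
```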

Any plans to implement AMX, then? I am a big fan of ik_llama.cpp, but I have not run it in a long time. I could try to do a run of...

I have a fork of llama.cpp that enables AMX in hybrid environments; it gives about a 20-30% increase in the CPU inference portion of offloaded models, if you want to try...

If you are running a hybrid CPU/GPU setup, try my llama.cpp fork; it works with AMX INT8/BF16 (I don't have a 6th gen to try INT4). https://github.com/Gadflyii/llama.cpp Build as usual...

Did you build with all the AMX flags? You need: -DGGML_NATIVE=ON -DGGML_CUDA=ON -DGGML_AMX_TILE=ON -DGGML_AMX_INT8=ON -DGGML_AMX_BF16=ON. `--amx` is only used as a flag at run time, as part of the command. Example:...
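
Put together, a configure/build/run along those lines might look like this (only the -DGGML_* flags come from the comment above; paths, the model, and the offload count are illustrative):

```bash
# Configure with CUDA plus every AMX code path enabled.
cmake -B build \
  -DGGML_NATIVE=ON \
  -DGGML_CUDA=ON \
  -DGGML_AMX_TILE=ON \
  -DGGML_AMX_INT8=ON \
  -DGGML_AMX_BF16=ON
cmake --build build --config Release -j

# --amx is then passed at run time on the command itself;
# model path and -ngl count are placeholders.
./build/bin/llama-cli -m ./model.gguf -ngl 30 --amx
```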

I enabled Discussions; I really appreciate you helping me test it.

What doesn't work correctly? I don't think it is fair to say the project is dead; they made a major commit just last week to add support for Qwen-Next.

I hope they fix it; this is a really cool project.