Publish the latest llama.cpp?
Hello, I run an AMD card and there have been very significant ROCm support updates (flash attention, new quant types, massive speed improvements) since the llama.cpp version currently vendored in llama-cpp-python.
Could you do us a big one and publish a new llama-cpp-python with the latest llama.cpp? It would be much appreciated! Thank you!
+1, would love to see an update to the latest llama.cpp
Up until a couple of weeks ago, the bindings were still close enough that you could pull upstream llama.cpp into the vendor directory and build locally. It looks like there's now a breaking change in the libllama contract: llama_load_model_from_file got renamed to llama_model_load_from_file.
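If you want to check which side of the rename a locally built libllama is on before swapping it into the vendor directory, a quick ctypes probe does the job. This is just a minimal sketch; the shared-library path is an assumption and will differ by platform and build setup (.dylib on macOS, .dll on Windows).

```python
import ctypes

# Sketch: probe a locally built libllama for the model-loading symbol.
# The path below is an assumption; adjust it for your build tree.
lib = ctypes.CDLL("vendor/llama.cpp/build/bin/libllama.so")

for name in (
    "llama_model_load_from_file",  # new name in current upstream llama.cpp
    "llama_load_model_from_file",  # old name the released bindings still call
):
    print(name, "found" if hasattr(lib, name) else "missing")
```

If only the new symbol shows up, the released bindings will fail to resolve the old one, which matches the breakage described above.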