Support PyPI-installed `nvidia-cuda-runtime-cu12` and `nvidia-cublas-cu12`
**Is your feature request related to a problem? Please describe.**
PyTorch installs its CUDA dependencies from the above wheels during pip install. Adding these wheels as dependencies of the prebuilt CUDA llama-cpp-python wheels would be convenient; it only requires a slight change to the import logic.
**Describe the solution you'd like**
https://github.com/abetlen/llama-cpp-python/blob/5212fb08ae69a721b2ced4e5e8b96ce642219e16/llama_cpp/llama_cpp.py#L65
As a workaround, I pasted the following at the empty line linked above:

```python
# Preload the CUDA libraries shipped with the nvidia-* wheels so that
# libllama.so's libcublas.so.12 / libcudart.so.12 dependencies can be resolved.
# (ctypes and os are already imported at the top of llama_cpp.py.)
import nvidia.cublas
import nvidia.cuda_runtime

ctypes.CDLL(os.path.join(nvidia.cublas.__path__[0], "lib", "libcublas.so.12"), mode=ctypes.RTLD_GLOBAL)
ctypes.CDLL(os.path.join(nvidia.cuda_runtime.__path__[0], "lib", "libcudart.so.12"), mode=ctypes.RTLD_GLOBAL)
```
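For context on why this works: once the two libraries are preloaded into the process, the dynamic loader can satisfy libllama.so's libcublas.so.12 / libcudart.so.12 dependencies even though they are not on the default library search path, and RTLD_GLOBAL makes their symbols visible to libraries loaded afterwards.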
This resolved the earlier shared-library errors such as `RuntimeError: Failed to load shared library '/workspaces/app/.venv/lib/python3.11/site-packages/llama_cpp/libllama.so': libcublas.so.12: cannot open shared object file: No such file or directory`.
Of course, the above should probably be made conditional on the system platform being Linux. That said, PyTorch manages the equivalent imports on both Windows and Linux (the NVIDIA wheels are published for both platforms), so the relevant logic can presumably be found somewhere in the PyTorch source.
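For illustration, here is a minimal cross-platform sketch of what such a conditional preload could look like. The helper name `_preload_nvidia_libs` is just for this example, and the Windows details (the `bin` subdirectory and the `cublas64_12.dll` / `cudart64_12.dll` file names) are assumptions about how the CUDA 12 wheels are laid out on Windows that I have not verified; only the Linux branch matches the workaround above.

```python
import ctypes
import os
import sys


def _preload_nvidia_libs() -> None:
    """Best-effort preload of cuBLAS / CUDA runtime from the PyPI wheels."""
    try:
        import nvidia.cublas
        import nvidia.cuda_runtime
    except ImportError:
        return  # wheels not installed; fall back to system CUDA

    if sys.platform == "linux":
        subdir, cublas, cudart = "lib", "libcublas.so.12", "libcudart.so.12"
    elif sys.platform == "win32":
        # Assumed layout and file names for the Windows wheels (unverified).
        subdir, cublas, cudart = "bin", "cublas64_12.dll", "cudart64_12.dll"
    else:
        return  # no NVIDIA wheels for other platforms

    for pkg, name in ((nvidia.cublas, cublas), (nvidia.cuda_runtime, cudart)):
        path = os.path.join(pkg.__path__[0], subdir, name)
        if os.path.exists(path):
            # RTLD_GLOBAL so the symbols are visible when libllama is loaded later.
            ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)


_preload_nvidia_libs()
```

Keeping it best-effort (bailing out when the wheels are not installed) would leave the existing system-CUDA code path untouched.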