Krishna Dubba
There might be some other error before this. Maybe the model could not be loaded because of memory limits. Could you check the error log?
Any updates here? I see the same problem when I use this with CachedNetworkImage. I suspect the cache manager is not being used, so CachedNetworkImage never refreshes the existing cache.
CPU-only inference works fine with the latest version of the server, but as soon as I offload layers to the GPU, the output turns into gibberish. Something seems to go wrong on the GPU path...
@glaudiston, thanks for the response. I was under the impression that running "LLAMA_CUBLAS=1 pip install llama-cpp-python" would take care of finding the .so library.
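One quick check is to look at which compiled shared libraries the installed package actually ships. Below is a minimal sketch using only the Python standard library. It assumes the package's import name is `llama_cpp`. The helper `find_shared_libs` is hypothetical and not part of llama-cpp-python. It only searches inside the package directory, so it will not show how system libraries such as the CUDA ones get resolved.

```python
import importlib.util
from pathlib import Path


def find_shared_libs(package: str) -> list[Path]:
    """Return .so files found under an installed top-level package (hypothetical helper)."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return []  # package is not installed / not importable
    # Only packages have search locations; a plain module has none.
    search_dirs = list(spec.submodule_search_locations or [])
    libs = []
    for d in search_dirs:
        for p in Path(d).rglob("*"):
            # Match "libfoo.so" as well as versioned names like "libfoo.so.1".
            if p.is_file() and (p.name.endswith(".so") or ".so." in p.name):
                libs.append(p)
    return sorted(libs)


if __name__ == "__main__":
    # Assumption: llama-cpp-python is importable as "llama_cpp".
    print(find_shared_libs("llama_cpp"))
```

If the list is empty, or the library found there was not rebuilt with cuBLAS, the environment variable probably did not reach the build step.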