bug: nitro attempts to load nonexistent "ggml-model-f16.gguf" model
Describe the bug
Nitro responds with {"message":"Failed to load model"} to the following request:

```sh
curl http://localhost:3928/inferences/llamacpp/loadmodel -d '{
  "llama_model_path": "/model/starling-lm-7b-alpha.Q6_K.gguf",
  "ctx_len": 512,
  "ngl": 100,
}'
```
To Reproduce
Steps to reproduce the behavior:
- Download Nitro 0.3.16 on Windows
- Start the server
- Run this in another terminal:
```sh
curl http://localhost:3928/inferences/llamacpp/loadmodel -d '{ "llama_model_path": "/model/starling-lm-7b-alpha.Q6_K.gguf", "ctx_len": 512, "ngl": 100, }'
```
- See error
Expected behavior
Model loads successfully.
Desktop (please complete the following information):
- OS: Windows 10
- Version: 22H2
Additional context
Nitro logs:
```
20240326 16:01:48.911000 UTC 4228 INFO Server started, listening at: 127.0.0.1:3928 - main.cc:54
20240326 16:01:48.911000 UTC 4228 INFO Please load your model - main.cc:55
20240326 16:01:48.911000 UTC 4228 INFO Number of thread is:10 - main.cc:62
{"timestamp":1711468910,"level":"INFO","function":"LoadModelImpl","line":650,"message":"system info","n_threads":5,"total_threads":10,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | "}
llama_model_load: error loading model: failed to open models/7B/ggml-model-f16.gguf: No such file or directory
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'models/7B/ggml-model-f16.gguf'
{"timestamp":1711468910,"level":"ERROR","function":"load_model","line":535,"message":"unable to load model","model":"models/7B/ggml-model-f16.gguf"}
20240326 16:01:50.815000 UTC 13252 ERROR Error loading the model - llamaCPP.cc:654
```
This is the important line:
```
llama_model_load: error loading model: failed to open models/7B/ggml-model-f16.gguf: No such file or directory
```
Why does it attempt to load this model, and how do I generate it if it doesn't exist?
@zeozeozeo ~this is the default model, but you can use other models with `-m MODEL_FILE` instead. The server will not start if the path is incorrect.~
hm, so to load new models I need to restart the nitro server each time?
Sorry, that error was from the llama.cpp server, not Nitro. The Nitro server will start without a model argument; just make sure to use the absolute path to the model file in your request.
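For example, a load request with an absolute path might look like this (the path below is just a placeholder for wherever the .gguf file actually lives; note there is no trailing comma after "ngl": 100, since strict JSON parsers reject trailing commas):

```sh
curl http://localhost:3928/inferences/llamacpp/loadmodel -d '{
  "llama_model_path": "/home/username/models/starling-lm-7b-alpha.Q6_K.gguf",
  "ctx_len": 512,
  "ngl": 100
}'
```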
I'm seeing the same issue. Regardless of what path I specify with `/inferences/llamacpp/loadmodel`, it attempts to load the default model.
Hi @zeozeozeo, sorry for the late response.
Since your OS is Windows, the llama_model_path is a bit different.
For example, here is my model path: "C:\Users\UserName\Downloads\nitro-win-amd64-avx2-cuda-11-7\llama-2-7b-model.gguf"
Here is the correct request to load a model on Windows:
```sh
curl http://localhost:3928/inferences/llamacpp/loadmodel -d '{
  "llama_model_path": "C:\\Users\\UserName\\Downloads\\nitro-win-amd64-avx2-cuda-11-7\\llama-2-7b-model.gguf",
  "ctx_len": 512,
  "ngl": 100
}'
```
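One caveat, assuming the request is sent from cmd.exe rather than a Unix-style shell: cmd does not treat single quotes as quoting characters, so the body has to be wrapped in double quotes with the inner quotes escaped. An untested sketch of the same request in that form:

```sh
curl http://localhost:3928/inferences/llamacpp/loadmodel -d "{\"llama_model_path\": \"C:\\Users\\UserName\\Downloads\\nitro-win-amd64-avx2-cuda-11-7\\llama-2-7b-model.gguf\", \"ctx_len\": 512, \"ngl\": 100}"
```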
FYI, the string `models/7B/ggml-model-f16.gguf` is the default model alias from llama.cpp.
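One possibility worth ruling out is a body that fails strict JSON parsing (the original request has a trailing comma after "ngl": 100). A quick local check before sending, assuming jq is available:

```sh
# jq pretty-prints the parsed body and exits non-zero on invalid JSON,
# so a stray trailing comma is caught before the request ever reaches Nitro.
echo '{ "llama_model_path": "/model/starling-lm-7b-alpha.Q6_K.gguf", "ctx_len": 512, "ngl": 100 }' | jq .
```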
Could you check if this error message is still relevant in the new Cortex version? @vansangpfiev
This should be addressed after Jan is migrated to the Cortex backend instead of Nitro. @vansangpfiev to double check and close this issue :)
Closing, please reopen if this is still occurring.