bug: nitro attempts to load nonexistent "ggml-model-f16.gguf" model
Describe the bug
Nitro responds with {"message":"Failed to load model"} to the following request:

```sh
curl http://localhost:3928/inferences/llamacpp/loadmodel -d '{
  "llama_model_path": "/model/starling-lm-7b-alpha.Q6_K.gguf",
  "ctx_len": 512,
  "ngl": 100,
}'
```
To Reproduce
Steps to reproduce the behavior:
- Download Nitro 0.3.16 on Windows
- Start the server
- Run this in another terminal:
```sh
curl http://localhost:3928/inferences/llamacpp/loadmodel -d '{ "llama_model_path": "/model/starling-lm-7b-alpha.Q6_K.gguf", "ctx_len": 512, "ngl": 100, }'
```
- See error
Expected behavior
Model loads successfully.
Desktop (please complete the following information):
- OS: Windows 10
- Version: 22H2
Additional context
Nitro logs:
```
20240326 16:01:48.911000 UTC 4228 INFO Server started, listening at: 127.0.0.1:3928 - main.cc:54
20240326 16:01:48.911000 UTC 4228 INFO Please load your model - main.cc:55
20240326 16:01:48.911000 UTC 4228 INFO Number of thread is:10 - main.cc:62
{"timestamp":1711468910,"level":"INFO","function":"LoadModelImpl","line":650,"message":"system info","n_threads":5,"total_threads":10,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | "}
llama_model_load: error loading model: failed to open models/7B/ggml-model-f16.gguf: No such file or directory
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'models/7B/ggml-model-f16.gguf'
{"timestamp":1711468910,"level":"ERROR","function":"load_model","line":535,"message":"unable to load model","model":"models/7B/ggml-model-f16.gguf"}
20240326 16:01:50.815000 UTC 13252 ERROR Error loading the model - llamaCPP.cc:654
```
This is the important line:
```
llama_model_load: error loading model: failed to open models/7B/ggml-model-f16.gguf: No such file or directory
```
Why does it attempt to load this model, and how do I generate it if it doesn't exist?
@zeozeozeo ~this is the default model, but you can use other models with `-m MODEL_FILE` instead. The server will not start if the path is incorrect.~
hm, so to load new models I need to restart the nitro server each time?
Sorry, that error was from the llama.cpp server, not Nitro. The Nitro server will start without a model argument; just make sure to use the absolute path to the model file in your request.
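For example, a load request with an absolute path might look like this (the path below is just a placeholder for wherever the .gguf file actually lives; note there is no trailing comma after "ngl": 100, since strict JSON parsers reject trailing commas):

```sh
curl http://localhost:3928/inferences/llamacpp/loadmodel -d '{
  "llama_model_path": "/home/username/models/starling-lm-7b-alpha.Q6_K.gguf",
  "ctx_len": 512,
  "ngl": 100
}'
```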
I'm seeing the same issue. Regardless of what path I specify with `/inferences/llamacpp/loadmodel`, it attempts to load the default model.
Hi @zeozeozeo, sorry for the late response.
Since your OS is Windows, the llama_model_path is a bit different.
For example, here is my model path: "C:\Users\UserName\Downloads\nitro-win-amd64-avx2-cuda-11-7\llama-2-7b-model.gguf"
Here is the correct request to load a model on Windows:
```sh
curl http://localhost:3928/inferences/llamacpp/loadmodel -d '{
  "llama_model_path": "C:\\Users\\UserName\\Downloads\\nitro-win-amd64-avx2-cuda-11-7\\llama-2-7b-model.gguf",
  "ctx_len": 512,
  "ngl": 100
}'
```
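One caveat, assuming the request is sent from cmd.exe rather than a Unix-style shell: cmd does not treat single quotes as quoting characters, so the body has to be wrapped in double quotes with the inner quotes escaped. An untested sketch of the same request in that form:

```sh
curl http://localhost:3928/inferences/llamacpp/loadmodel -d "{\"llama_model_path\": \"C:\\Users\\UserName\\Downloads\\nitro-win-amd64-avx2-cuda-11-7\\llama-2-7b-model.gguf\", \"ctx_len\": 512, \"ngl\": 100}"
```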
FYI, the string `models/7B/ggml-model-f16.gguf` is the default model alias from llama.cpp.
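One possibility worth ruling out is a body that fails strict JSON parsing (the original request has a trailing comma after "ngl": 100). A quick local check before sending, assuming jq is available:

```sh
# jq pretty-prints the parsed body and exits non-zero on invalid JSON,
# so a stray trailing comma is caught before the request ever reaches Nitro.
echo '{ "llama_model_path": "/model/starling-lm-7b-alpha.Q6_K.gguf", "ctx_len": 512, "ngl": 100 }' | jq .
```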
Could you check if this error message is still relevant in the new Cortex version? @vansangpfiev
This should be addressed after Jan is migrated to the Cortex backend instead of Nitro. @vansangpfiev to double check and close this issue :)
Closing, please reopen if this is still occurring.