Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
LocalAI version:
localai/localai:latest-aio-cpu
Environment, CPU architecture, OS, and Version:
cpu
Describe the bug
api_1 | 8:39AM INF [llama-cpp] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
api_1 | 8:39AM INF [llama-ggml] Attempting to load
api_1 | 8:39AM INF Loading model with backend llama-ggml
api_1 | 8:39AM DBG Loading model in memory from file: /build/models
To Reproduce
Expected behavior
Logs
forceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false}
api_1 | 8:39AM INF [llama-cpp] Fails: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
api_1 | 8:39AM INF [llama-ggml] Attempting to load
api_1 | 8:39AM INF Loading model with backend llama-ggml
api_1 | 8:39AM DBG Loading model in memory from file: /build/models
Additional context
Yup, same here regardless of install method.
same here
Same here
Did this happen with a specific model for you? For me it was Command R.
deepseek-r1-distill-llama-8b, using the localai/localai:latest-aio-gpu-nvidia-cuda-12 Docker image.
After downloading the 44GB image, I am still unable to get this to work.
5:20AM INF Trying to load the model 'deepseek-r1-distill-llama-8b' with the backend '[llama-cpp llama-ggml llama-cpp-fallback stablediffusion-ggml whisper bark-cpp piper stablediffusion silero-vad huggingface /build/backend/python/exllama2/run.sh /build/backend/python/transformers-musicgen/run.sh /build/backend/python/sentencetransformers/run.sh /build/backend/python/diffusers/run.sh /build/backend/python/transformers/run.sh /build/backend/python/bark/run.sh /build/backend/python/vllm/run.sh /build/backend/python/autogptq/run.sh /build/backend/python/mamba/run.sh /build/backend/python/coqui/run.sh /build/backend/python/vall-e-x/run.sh /build/backend/python/sentencetransformers/run.sh /build/backend/python/openvoice/run.sh /build/backend/python/rerankers/run.sh /build/backend/python/parler-tts/run.sh]'
5:20AM INF [llama-cpp] Attempting to load
5:20AM INF Loading model 'deepseek-r1-distill-llama-8b' with backend llama-cpp
WARNING: failed to read int from file: open /sys/class/drm/card0/device/numa_node: no such file or directory
WARNING: error parsing the pci address "simple-framebuffer.0"
5:20AM ERR [llama-cpp] Failed loading model, trying with fallback 'llama-cpp-fallback', error: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc =
5:20AM INF [llama-cpp] Fails: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc =
5:20AM INF [llama-ggml] Attempting to load
5:20AM INF Loading model 'deepseek-r1-distill-llama-8b' with backend llama-ggml
5:20AM INF [llama-ggml] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unknown desc = failed loading model
5:20AM INF [llama-cpp-fallback] Attempting to load
5:20AM INF Loading model 'deepseek-r1-distill-llama-8b' with backend llama-cpp-fallback
I notice in the logs: failed: out of memory, however the needed memory is available.
3:07AM DBG GRPC(intellect-1-instruct-127.0.0.1:37827): stderr ggml_backend_cuda_buffer_type_alloc_buffer: allocating 1344.00 MiB on device 0: cudaMalloc failed: out of memory
3:07AM DBG GRPC(intellect-1-instruct-127.0.0.1:37827): stderr llama_kv_cache_init: failed to allocate buffer for kv cache
3:07AM DBG GRPC(intellect-1-instruct-127.0.0.1:37827): stderr llama_new_context_with_model: llama_kv_cache_init() failed for self-attention cache
3:07AM DBG GRPC(intellect-1-instruct-127.0.0.1:37827): stderr common_init_from_params: failed to create context with model '/build/models/INTELLECT-1-Instruct-Q4_K_M.gguf'
3:07AM ERR [llama-cpp] Failed loading model, trying with fallback 'llama-cpp-fallback', error: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc =
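For what it's worth, the KV cache alone needs that extra ~1.3 GiB on top of the weights, so the full 4096-token context may simply not fit. A sketch of a model config that trims GPU usage (field names as I understand the LocalAI model YAML; file name and values are guesses, not tested):

```yaml
# models/intellect-1-instruct.yaml (hypothetical file, adjust to your setup)
name: intellect-1-instruct
backend: llama-cpp
parameters:
  model: INTELLECT-1-Instruct-Q4_K_M.gguf
context_size: 2048   # smaller context -> smaller KV cache than the 4096 default
gpu_layers: 20       # offload only part of the model instead of NGPULayers:99999999
f16: true
```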
@mudler is this the model or the server?
LocalAI-functioncall-phi-4-v0.3
LocalAI Version v2.26.0
7:25AM INF [stablediffusion-ggml] Fails: failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
7:25AM INF [whisper] Attempting to load
7:25AM INF BackendLoader starting backend=whisper modelID=LocalAI-functioncall-phi-4-v0.3 o.model=localai-functioncall-phi-4-v0.3-q4_k_m.gguf
7:25AM DBG Loading model in memory from file: /build/models/localai-functioncall-phi-4-v0.3-q4_k_m.gguf
7:25AM DBG Loading Model LocalAI-functioncall-phi-4-v0.3 with gRPC (file: /build/models/localai-functioncall-phi-4-v0.3-q4_k_m.gguf) (backend: whisper): {backendString:whisper model:localai-functioncall-phi-4-v0.3-q4_k_m.gguf modelID:LocalAI-functioncall-phi-4-v0.3 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0003a6008 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh openvoice:/build/backend/python/openvoice/run.sh parler-tts:/build/backend/python/parler-tts/run.sh rerankers:/build/backend/python/rerankers/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
7:25AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/whisper
7:25AM DBG GRPC Service for LocalAI-functioncall-phi-4-v0.3 will be running at: '127.0.0.1:37653'
7:25AM DBG GRPC Service state dir: /tmp/go-processmanager3335784722
7:25AM DBG GRPC Service Started
7:25AM DBG Wait for the service to start up
7:25AM DBG Options: ContextSize:4096 Seed:435165384 NBatch:512 F16Memory:true MMap:true NGPULayers:99999999 Threads:10
7:25AM DBG GRPC(LocalAI-functioncall-phi-4-v0.3-127.0.0.1:37653): stderr 2025/02/17 07:25:30 gRPC Server listening at 127.0.0.1:37653
7:25AM DBG GRPC Service Ready
7:25AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc00075d958} sizeCache:0 unknownFields:[] Model:localai-functioncall-phi-4-v0.3-q4_k_m.gguf ContextSize:4096 Seed:435165384 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:10 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai-functioncall-phi-4-v0.3-q4_k_m.gguf Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]}
7:25AM DBG GRPC(LocalAI-functioncall-phi-4-v0.3-127.0.0.1:37653): stderr whisper_init_from_file_with_params_no_state: loading model from '/build/models/localai-functioncall-phi-4-v0.3-q4_k_m.gguf'
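Part of the noise here is the loader cascading through unrelated backends (whisper, stablediffusion-ggml, ...) for an LLM GGUF. If I understand the model YAML correctly, pinning the backend avoids that cascade; a minimal sketch, untested and with field names assumed:

```yaml
# models/localai-functioncall-phi-4-v0.3.yaml (hypothetical path)
name: LocalAI-functioncall-phi-4-v0.3
backend: llama-cpp   # pin the backend so whisper/stablediffusion are never tried
parameters:
  model: localai-functioncall-phi-4-v0.3-q4_k_m.gguf
context_size: 4096
f16: true
```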
I got deepseek-r1-distill-llama-8b working by removing the /tmp mount.
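In case it helps, the relevant part of the compose file looks roughly like this (a sketch assuming the standard LocalAI docker-compose layout, not my exact file):

```yaml
# docker-compose.yaml (sketch)
services:
  api:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    volumes:
      - ./models:/build/models
      # - ./tmp:/tmp   # removing this /tmp mount is what fixed the EOF for me
```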
I think LocalAI isn't unloading models when the user switches to a different one, since a restart makes the model work (for most models). We need an unload button and better error handling.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
That's lovely, arbitrary GitHub bot, but you didn't solve the issue.