Bert custom embedding: could not load model: rpc error: code = Unknown desc = failed loading model
LocalAI version: 2.19.3
Environment, CPU architecture, OS, and Version: Windows 11, AMD Ryzen 5 4500 6-Core Processor, RTX 3090.
Describe the bug
I am trying to use a custom embedding model. The file is in GGUF format and the model type is BERT. My YAML file looks like:
f16: true
gpu_layers: 40
name: ItaLegalEmb
backend: bert-embeddings
embeddings: true
parameters:
  model: ItaLegalEmb
The file is ItaLegalEmb.gguf and it is correctly placed in the models folder. When calling the model with:
curl --location 'http://127.0.0.1:8080/v1/embeddings' \
--header 'Content-Type: application/json' \
--data '{
"input": "Test",
"model": "ItaLegalEmb"
}'
the response is:
"could not load model: rpc error: code = Unknown desc = failed loading model"
The debug log shows:
1:45PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/bert-embeddings
1:45PM DBG GRPC Service for ItaLegalEmb will be running at: '127.0.0.1:37719'
1:45PM DBG GRPC Service state dir: /tmp/go-processmanager1685794175
1:45PM DBG GRPC Service Started
1:45PM DBG GRPC(ItaLegalEmb-127.0.0.1:37719): stderr 2024/07/31 13:45:07 gRPC Server listening at 127.0.0.1:37719
1:45PM DBG GRPC Service Ready
1:45PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:ItaLegalEmb ContextSize:512 Seed:194720499 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:true NUMA:false NGPULayers:40 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/ItaLegalEmb Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false}
1:45PM DBG GRPC(ItaLegalEmb-127.0.0.1:37719): stderr bert_load_from_file: failed to open '/build/models/ItaLegalEmb'
1:45PM DBG GRPC(ItaLegalEmb-127.0.0.1:37719): stderr bert_bootstrap: failed to load model from '/build/models/ItaLegalEmb'
1:45PM ERR Server error error="could not load model: rpc error: code = Unknown desc = failed loading model"
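Note that the loader attempts to open /build/models/ItaLegalEmb, i.e. the parameters.model value verbatim, with no .gguf extension appended. A hedged guess at a fix, assuming the backend resolves the filename exactly as written in the config rather than probing for extensions, is to include the extension in the config:
f16: true
gpu_layers: 40
name: ItaLegalEmb
backend: bert-embeddings
embeddings: true
parameters:
  model: ItaLegalEmb.gguf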
Additional context
- I tested the same GGUF file on the same PC (same hardware) with a different platform, LM Studio, and it worked correctly.
- In the same LocalAI installation I am using all-MiniLM-L6-v2 without any issue; LLM models also work without issues.
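The /build/models path in the log suggests LocalAI is running in a container. A quick sketch for confirming the exact filename the backend will see (the container name local-ai is an assumption; adjust it to your deployment):
# List the mounted models directory inside the LocalAI container
docker exec local-ai ls -l /build/models/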
I'm having the same problem on Ubuntu 22.04.