Bert custom embedding: could not load model: rpc error: code = Unknown desc = failed loading model
LocalAI version: 2.19.3
Environment, CPU architecture, OS, and Version: Windows 11, AMD Ryzen 5 4500 6-Core Processor, RTX 3090.
Describe the bug
I am trying to use a custom embedding model. The file is in GGUF format and the model type is BERT. My YAML file looks like:
f16: true
gpu_layers: 40
name: ItaLegalEmb
backend: bert-embeddings
embeddings: true
parameters:
  model: ItaLegalEmb
The file is ItaLegalEmb.gguf and it is correctly placed in the models folder. When calling the model with:
curl --location 'http://127.0.0.1:8080/v1/embeddings' \
--header 'Content-Type: application/json' \
--data '{
"input": "Test",
"model": "ItaLegalEmb"
}'
the response is:
"could not load model: rpc error: code = Unknown desc = failed loading model"
The debug log shows:
1:45PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/bert-embeddings
1:45PM DBG GRPC Service for ItaLegalEmb will be running at: '127.0.0.1:37719'
1:45PM DBG GRPC Service state dir: /tmp/go-processmanager1685794175
1:45PM DBG GRPC Service Started
1:45PM DBG GRPC(ItaLegalEmb-127.0.0.1:37719): stderr 2024/07/31 13:45:07 gRPC Server listening at 127.0.0.1:37719
1:45PM DBG GRPC Service Ready
1:45PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:ItaLegalEmb ContextSize:512 Seed:194720499 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:true NUMA:false NGPULayers:40 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/ItaLegalEmb Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false}
1:45PM DBG GRPC(ItaLegalEmb-127.0.0.1:37719): stderr bert_load_from_file: failed to open '/build/models/ItaLegalEmb'
1:45PM DBG GRPC(ItaLegalEmb-127.0.0.1:37719): stderr bert_bootstrap: failed to load model from '/build/models/ItaLegalEmb'
1:45PM ERR Server error error="could not load model: rpc error: code = Unknown desc = failed loading model"
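Note that the loader attempts to open /build/models/ItaLegalEmb, i.e. the parameters.model value verbatim, with no .gguf extension appended. A hedged guess at a fix, assuming the backend resolves the filename exactly as written in the config rather than probing for extensions, is to include the extension in the config:
f16: true
gpu_layers: 40
name: ItaLegalEmb
backend: bert-embeddings
embeddings: true
parameters:
  model: ItaLegalEmb.gguf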
Additional context
- I tested the same GGUF file on the same PC (same hardware) with a different platform, LM Studio, and it worked correctly.
- In the same LocalAI installation I am using all-MiniLM-L6-v2 without any issue; LLM models also work without issues.
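The /build/models path in the log suggests LocalAI is running in a container. A quick sketch for confirming the exact filename the backend will see (the container name local-ai is an assumption; adjust it to your deployment):
# List the mounted models directory inside the LocalAI container
docker exec local-ai ls -l /build/models/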
I'm having the same problem on Ubuntu 22.04.