
Segmentation fault when converting embeddings into tensor

Open • devashishraj opened this issue on Feb 17, 2025 • 0 comments

Trying to convert the embeddings into a torch tensor leads to a segmentation fault:


System Info

  • Physical (or virtual) hardware you are using, e.g. for Linux:
> sysctl -a | grep machdep.cpu
machdep.cpu.cores_per_package: 10
machdep.cpu.core_count: 10
machdep.cpu.logical_per_package: 10
machdep.cpu.thread_count: 10
machdep.cpu.brand_string: Apple M2 Pro

  • Operating System, e.g. for Linux:

macOS Sequoia 15.3.1 (24D70)

  • SDK version, e.g. for Linux:
Python 3.13.2
GNU Make 3.81
Apple clang version 16.0.0 (clang-1600.0.26.6)
Target: arm64-apple-darwin24.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

code

import logging
import torch
from llama_cpp import Llama
from rich.console import Console

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
console = Console(width=120)

# path to the GGUF embedding model
embpath = "all-MiniLM-L6-v2-ggml-model-f16.gguf"
embedModel = Llama(model_path=embpath, embedding=True, verbose=True)

# test embedding model
query = ["Test sentence"]
try:
    embeds = embedModel.embed(input=query)
    print(embeds)
    genAns_tensor = torch.tensor(embeds)  # <- segmentation fault happens here

    del embedModel
except Exception as e:
    print("Embedding error:", e)

The code works if it only creates the embeddings, i.e. if I remove the tensor conversion and just print the embedding.
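For reference, this is the same script with only the torch.tensor(...) conversion removed; this variant runs to completion:

import logging
import torch
from llama_cpp import Llama
from rich.console import Console

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
console = Console(width=120)

embpath = "all-MiniLM-L6-v2-ggml-model-f16.gguf"
embedModel = Llama(model_path=embpath, embedding=True, verbose=True)

query = ["Test sentence"]
try:
    embeds = embedModel.embed(input=query)
    print(embeds)  # printing the raw embeddings works fine
    del embedModel
except Exception as e:
    print("Embedding error:", e)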

logs

llama_kv_cache_init: kv_size = 512, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 6, can_shift = 1
llama_kv_cache_init: layer 0: n_embd_k_gqa = 384, n_embd_v_gqa = 384
llama_kv_cache_init: layer 1: n_embd_k_gqa = 384, n_embd_v_gqa = 384
llama_kv_cache_init: layer 2: n_embd_k_gqa = 384, n_embd_v_gqa = 384
llama_kv_cache_init: layer 3: n_embd_k_gqa = 384, n_embd_v_gqa = 384
llama_kv_cache_init: layer 4: n_embd_k_gqa = 384, n_embd_v_gqa = 384
llama_kv_cache_init: layer 5: n_embd_k_gqa = 384, n_embd_v_gqa = 384
llama_kv_cache_init:      Metal KV buffer size =     4.50 MiB
llama_init_from_model: KV self size  =    4.50 MiB, K (f16):    2.25 MiB, V (f16):    2.25 MiB
llama_init_from_model:        CPU  output buffer size =     0.00 MiB
llama_init_from_model:      Metal compute buffer size =    17.00 MiB
llama_init_from_model:        CPU compute buffer size =     3.50 MiB
llama_init_from_model: graph nodes  = 221
llama_init_from_model: graph splits = 2
Metal : EMBED_LIBRARY = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | MATMUL_INT8 = 1 | DOTPROD = 1 | MATMUL_INT8 = 1 | ACCELERATE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
Model metadata: {'tokenizer.ggml.cls_token_id': '101', 'tokenizer.ggml.padding_token_id': '0', 'tokenizer.ggml.seperator_token_id': '102', 'tokenizer.ggml.unknown_token_id': '100', 'tokenizer.ggml.token_type_count': '2', 'general.file_type': '1', 'tokenizer.ggml.eos_token_id': '102', 'bert.context_length': '512', 'bert.pooling_type': '1', 'tokenizer.ggml.bos_token_id': '101', 'bert.attention.head_count': '12', 'bert.feed_forward_length': '1536', 'tokenizer.ggml.mask_token_id': '103', 'tokenizer.ggml.model': 'bert', 'bert.attention.causal': 'false', 'general.name': 'all-MiniLM-L6-v2', 'bert.block_count': '6', 'bert.attention.layer_norm_epsilon': '0.000000', 'bert.embedding_length': '384', 'general.architecture': 'bert'}
Using fallback chat format: llama-2
Fatal Python error: Segmentation fault

Thread 0x0000000204908840 (most recent call first):
  File "/Users/devashishraj/Desktop/localRAG/lrag/lib/python3.13/site-packages/llama_cpp/_internals.py", line 306 in decode
  File "/Users/devashishraj/Desktop/localRAG/lrag/lib/python3.13/site-packages/llama_cpp/llama.py", line …
[1]    63839 segmentation fault  PYTHONFAULTHANDLER=1 python3 -X dev embeddingTest.py
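
Since the traceback points into llama_cpp's decode rather than into torch.tensor itself, a variation that might help narrow this down is to hand torch only plain Python floats after the model has been released. This is only a sketch (the float() copying and the early del embedModel are assumptions on my side, not a confirmed workaround):

import torch
from llama_cpp import Llama

embedModel = Llama(model_path="all-MiniLM-L6-v2-ggml-model-f16.gguf",
                   embedding=True, verbose=False)

raw = embedModel.embed(input=["Test sentence"])

# rebuild the result as plain Python floats, in case the returned objects
# still reference llama_cpp-owned memory (assumption, not verified)
plain = [[float(x) for x in row] for row in raw]

del embedModel  # release the model before torch is involved

print(torch.tensor(plain).shape)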

venv package list

pip list
Package                  Version
------------------------ -----------
aiohappyeyeballs         2.4.4
aiohttp                  3.11.10
aiosignal                1.3.2
annotated-types          0.7.0
anyio                    4.7.0
attrs                    24.3.0
beautifulsoup4           4.12.3
certifi                  2024.12.14
charset-normalizer       3.4.0
dataclasses-json         0.6.7
diskcache                5.6.3
faiss-cpu                1.9.0.post1
filelock                 3.17.0
frozenlist               1.5.0
fsspec                   2025.2.0
gpt4all                  2.8.2
h11                      0.14.0
httpcore                 1.0.7
httpx                    0.28.1
httpx-sse                0.4.0
huggingface-hub          0.28.1
idna                     3.10
Jinja2                   3.1.5
joblib                   1.4.2
jsonpatch                1.33
jsonpointer              3.0.0
langchain                0.3.12
langchain-community      0.3.12
langchain-core           0.3.33
langchain-ollama         0.2.3
langchain-text-splitters 0.3.3
langsmith                0.2.3
llama_cpp_python         0.3.7
markdown-it-py           3.0.0
MarkupSafe               3.0.2
marshmallow              3.23.1
mdurl                    0.1.2
mpmath                   1.3.0
multidict                6.1.0
mypy-extensions          1.0.0
networkx                 3.4.2
numpy                    2.2.3
ollama                   0.4.7
orjson                   3.10.12
packaging                24.2
pillow                   11.1.0
pip                      25.0.1
propcache                0.2.1
pydantic                 2.10.3
pydantic_core            2.27.1
pydantic-settings        2.7.0
Pygments                 2.18.0
PyMuPDF                  1.25.1
python-dotenv            1.0.1
PyYAML                   6.0.2
regex                    2024.11.6
requests                 2.32.3
requests-toolbelt        1.0.0
rich                     13.9.4
safetensors              0.5.2
scikit-learn             1.6.1
scipy                    1.15.1
sentence-transformers    3.4.1
setuptools               75.8.0
sniffio                  1.3.1
soupsieve                2.6
SQLAlchemy               2.0.36
sympy                    1.13.1
tenacity                 9.0.0
threadpoolctl            3.5.0
tiktoken                 0.8.0
tokenizers               0.21.0
torch                    2.6.0
tqdm                     4.67.1
transformers             4.48.3
typing_extensions        4.12.2
typing-inspect           0.9.0
urllib3                  2.2.3
yarl                     1.18.3
