
Segmentation fault (core dumped)

LIUKAI0815 opened this issue 1 year ago

[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024050700
[TensorRT-LLM][INFO] Engine version 0.10.0.dev2024050700 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found
[TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set.
[TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] MPI size: 1, rank: 0
[TensorRT-LLM][WARNING] Device 0 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 3 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 7 is not available.
[TensorRT-LLM][INFO] Loaded engine size: 14495 MiB
[TensorRT-LLM][ERROR] 1: [defaultAllocator.cpp::allocate::19] Error Code 1: Cuda Runtime (out of memory)
[TensorRT-LLM][WARNING] Requested amount of GPU memory (13908726432 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT-LLM][ERROR] 2: [safeDeserialize.cpp::load::269] Error Code 2: OutOfMemory (no further information)
[01d177e8bded:22501] *** Process received signal ***
[01d177e8bded:22501] Signal: Segmentation fault (11)
[01d177e8bded:22501] Signal code: Address not mapped (1)
[01d177e8bded:22501] Failing at address: 0x8
[01d177e8bded:22501] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f02d1a61420]
[01d177e8bded:22501] [ 1] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7runtime11TllmRuntimeC2EPKvmRN8nvinfer17ILoggerE+0x19d)[0x7f01372db78d]
[01d177e8bded:22501] [ 2] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm7runtime10GptSessionC2ERKNS1_6ConfigERKNS0_11ModelConfigERKNS0_11WorldConfigEPKvmSt10shared_ptrIN8nvinfer17ILoggerEE+0x395)[0x7f0137288d25]
[01d177e8bded:22501] [ 3] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xc5459)[0x7f01ab0f0459]
[01d177e8bded:22501] [ 4] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x71b99)[0x7f01ab09cb99]
[01d177e8bded:22501] [ 5] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x54a6c)[0x7f01ab07fa6c]
[01d177e8bded:22501] [ 6] python3[0x4fc697]
[01d177e8bded:22501] [ 7] python3(_PyObject_MakeTpCall+0x25b)[0x4f614b]
[01d177e8bded:22501] [ 8] python3[0x50819f]
[01d177e8bded:22501] [ 9] python3(PyVectorcall_Call+0xb9)[0x508bb9]
[01d177e8bded:22501] [10] python3[0x50607f]
[01d177e8bded:22501] [11] python3[0x4f64b6]
[01d177e8bded:22501] [12] /opt/conda/envs/tensorRT/lib/python3.10/site-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x540d9)[0x7f01ab07f0d9]
[01d177e8bded:22501] [13] python3(_PyObject_MakeTpCall+0x25b)[0x4f614b]
[01d177e8bded:22501] [14] python3(_PyEval_EvalFrameDefault+0x5757)[0x4f26f7]
[01d177e8bded:22501] [15] python3[0x507eae]
[01d177e8bded:22501] [16] python3(PyObject_Call+0xb8)[0x508858]
[01d177e8bded:22501] [17] python3(_PyEval_EvalFrameDefault+0x2b79)[0x4efb19]
[01d177e8bded:22501] [18] python3[0x591d92]
[01d177e8bded:22501] [19] python3(PyEval_EvalCode+0x87)[0x591cd7]
[01d177e8bded:22501] [20] python3[0x5c2967]
[01d177e8bded:22501] [21] python3[0x5bdad0]
[01d177e8bded:22501] [22] python3[0x45956b]
[01d177e8bded:22501] [23] python3(_PyRun_SimpleFileObject+0x19f)[0x5b805f]
[01d177e8bded:22501] [24] python3(_PyRun_AnyFileObject+0x43)[0x5b7dc3]
[01d177e8bded:22501] [25] python3(Py_RunMain+0x38d)[0x5b4b7d]
[01d177e8bded:22501] [26] python3(Py_BytesMain+0x39)[0x584e49]
[01d177e8bded:22501] [27] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f02d1725083]
[01d177e8bded:22501] [28] python3[0x584cfe]
[01d177e8bded:22501] *** End of error message ***
Segmentation fault (core dumped)

LIUKAI0815 avatar May 09 '24 03:05 LIUKAI0815

Please follow the issue template to share your environment and the steps to reproduce.

byshiue avatar May 10 '24 08:05 byshiue

It is failing with an out-of-memory error ([TensorRT-LLM][ERROR] 2: [safeDeserialize.cpp::load::269] Error Code 2: OutOfMemory (no further information)). Could you please share your system or VM configuration?
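As a quick sanity check before loading, you can compare the free memory on the target GPU against the engine size reported in the log (a minimal sketch; the engine path is a placeholder for your own build output):

# free/used memory per GPU; the ~14.5 GiB engine plus its runtime buffers must fit on one device
nvidia-smi --query-gpu=index,memory.total,memory.used,memory.free --format=csv
# size of the serialized engine on disk (placeholder path)
du -h /path/to/engine_dir/*.engine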

Tushar-ml avatar May 21 '24 16:05 Tushar-ml

@LIUKAI0815 this issue pops up when your MPI is built without CUDA support; you have to enable GPU-aware (CUDA-aware) MPI in your OpenMPI configuration. On Linux: conda install -c conda-forge mpi4py openmpi

Set these environment variables directly in the terminal:

export OMPI_MCA_opal_cuda_support=true
export OMPI_MCA_pml=ucx
export OMPI_MCA_osc=ucx
export UCX_MEMTYPE_CACHE=n

Check whether your MPI was built with CUDA support: ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
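If CUDA-aware MPI is available, that check should report true. A minimal sketch of the expected output (the exact MCA parameter path may differ between OpenMPI versions):

ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
# expected when CUDA support is built in (assumed output format):
# mca:mpi:base:param:mpi_built_with_cuda_support:value:true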

gokarbido avatar Sep 09 '24 12:09 gokarbido

I am currently running ChatRTX with local Llama2-13b and Mistral-7b models on Linux/Ubuntu 24.04. System specs: NVIDIA RTX 4070 Ti SUPER 16 GB, 64 GB RAM, and an Intel® Core™ i7-14700K × 28. I have followed this repo: https://github.com/noahc1510/trt-llm-rag-linux

gokarbido avatar Sep 09 '24 12:09 gokarbido

Hi @LIUKAI0815, do you still have any further issues or questions? If not, we'll close this soon.

nv-guomingz avatar Nov 14 '24 03:11 nv-guomingz