Ivan Mihić

Results 5 comments of Ivan Mihić

It seems that this error also happens if we enable parallel llama.cpp processing. For example, setting the context size to 8192 and the number of parallel processes to 20,...
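A likely reason those particular numbers trigger the error: the llama.cpp server divides the configured context window evenly across its parallel slots, so each slot gets only a fraction of it. The flag names (`-c`, `-np`) are llama.cpp server options; the arithmetic below is a sketch of that split, assuming an even division:

```shell
# llama.cpp server splits the total context across parallel slots,
# so each slot receives roughly c / np tokens of context.
# Example invocation (model path is a placeholder):
#   ./llama-server -m model.gguf -c 8192 -np 20
c=8192    # total context size (-c)
np=20     # number of parallel slots (-np)
per_slot=$((c / np))
echo "$per_slot"   # tokens of context available to each slot
```

With these settings each slot gets only about 409 tokens, so any request longer than that would overflow its slot's context, which may explain why the error appears once parallel processing is enabled.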

@dhruvmullick I'm facing the same problem on my multi-GPU server with 4x L40S. Have you managed to solve it?

@thakkar2804 On Lenovo servers I resolved this problem by disabling all virtualization (CPU, IOMMU, PCIe virtualization) and running the server and GPUs as bare metal. See [https://github.com/NVIDIA/TensorRT-LLM/issues/2305](https://github.com/NVIDIA/TensorRT-LLM/issues/2305)

I am having a similar problem. My machine is a Lenovo SR675v3 with a total of 8 available GPU adapters, 4 of which are populated with Nvidia L40S GPUs. Running Llama...

@geraldstanje please recreate server invite, thanks