Ivan Mihić

Results 5 comments of Ivan Mihić

It seems that this error also happens if we enable parallel llama.cpp processing. For example, setting the context size to 8192 and the number of parallel processes to 20,...
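A likely reason those particular numbers trigger the error: the llama.cpp server divides the configured context window evenly across its parallel slots, so each slot gets only a fraction of it. The flag names (`-c`, `-np`) are llama.cpp server options; the arithmetic below is a sketch of that split, assuming an even division:

```shell
# llama.cpp server splits the total context across parallel slots,
# so each slot receives roughly c / np tokens of context.
# Example invocation (model path is a placeholder):
#   ./llama-server -m model.gguf -c 8192 -np 20
c=8192    # total context size (-c)
np=20     # number of parallel slots (-np)
per_slot=$((c / np))
echo "$per_slot"   # tokens of context available to each slot
```

With these settings each slot gets only about 409 tokens, so any request longer than that would overflow its slot's context, which may explain why the error appears once parallel processing is enabled.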

@dhruvmullick I'm facing the same problem on my multi-GPU server with 4x L40S. Have you managed to solve it?

@thakkar2804 On Lenovo servers I resolved this problem by disabling all virtualization (CPU, IOMMU, PCIe virtualization) and running the server and GPUs as bare metal. See [https://github.com/NVIDIA/TensorRT-LLM/issues/2305](https://github.com/NVIDIA/TensorRT-LLM/issues/2305)

I am having a similar problem. My machine is a Lenovo SR675v3 with a total of 8 available GPU adapters, 4 of which are populated with Nvidia L40S GPUs. Running Llama...

@geraldstanje please recreate server invite, thanks