Rajdeep Borgohain
@taozhang9527 What settings worked for you? Do you know whether the previous version worked, or did you copy the file (per @wjj19950828's suggestion)? I don't know where to get that...
I am trying to test the engine file with run.py and am getting the same error. Model: Llama-2 7B Chat.
Please share the configuration on the TensorRT-LLM side. What parameter modifications are required in the model's config.pbtxt?
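For reference, here is a minimal sketch of what the Triton `config.pbtxt` for the tensorrtllm_backend tends to look like. The paths, batch size, and parameter values are assumptions for illustration; verify the exact keys against the `all_models` templates shipped with your tensorrtllm_backend version.

```
# Hypothetical excerpt of triton_model_repo/tensorrt_llm/config.pbtxt;
# check keys/values against your tensorrtllm_backend release.
name: "tensorrt_llm"
backend: "tensorrtllm"
max_batch_size: 8  # assumed; must not exceed the engine's build-time max

# Decoupled mode is required for token streaming.
model_transaction_policy {
  decoupled: true
}

parameters: {
  key: "gpt_model_type"
  value: { string_value: "inflight_fused_batching" }
}
parameters: {
  key: "gpt_model_path"
  value: { string_value: "/models/llama-2-7b-chat/trt_engines" }  # assumed engine dir
}
```

The usual failure mode is a mismatch between `max_batch_size` (or the engine path) here and the values used when the engine was built, so those are the first parameters to double-check.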
Two ways of serving models with vLLM: 1. Online mode: uses the OpenAI-compatible server and exposes APIs for text generation. You can also make async requests for token...
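As a minimal sketch of the online mode, assuming a vLLM OpenAI-compatible server is already running locally (started with something like `python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-chat-hf`); the model name and port are assumptions:

```python
# Minimal sketch: query a locally running vLLM OpenAI-compatible server.
# Assumes the default port 8000 and meta-llama/Llama-2-7b-chat-hf;
# adjust base_url/model to your setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM ignores the key by default
)

response = client.chat.completions.create(
    model="meta-llama/Llama-2-7b-chat-hf",
    messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```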
> Hello @irasin, are there any new thoughts on this issue? I'm hitting the same thing: online serving runs at ~0.49× the offline batch throughput in tokens/s. Much appreciated for any...
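For context on that comparison, a minimal sketch of vLLM's offline batch mode, which is the baseline the ~0.49× figure refers to; the model name and sampling settings are assumptions:

```python
# Minimal sketch: offline batched generation with vLLM.
# All prompts are processed in one batch, with no per-request HTTP overhead.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")  # assumed model
params = SamplingParams(temperature=0.8, max_tokens=128)

prompts = [
    "Summarize the benefits of batching.",
    "What is speculative decoding?",
]
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)
```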
> I think it's slower due to internet latency.
>
> On Mon, 15 Apr 2024, Sam Comber wrote: +1, have observed this also; currently just living...