Rajdeep Borgohain
@taozhang9527 What settings worked for you? Do you know whether the previous version worked, or did you copy the file (per @wjj19950828's suggestion)? I don't know where to get that...
I am trying to test the engine file with run.py and am getting the same error. Model: Llama-2 7B Chat.
Please share the configuration on the TensorRT-LLM side. What parameter modifications are required in the model's config.pbtxt?
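For reference, here is a minimal sketch of what the Triton `config.pbtxt` for the tensorrtllm_backend tends to look like. The paths, batch size, and parameter values are assumptions for illustration; verify the exact keys against the `all_models` templates shipped with your tensorrtllm_backend version.

```
# Hypothetical excerpt of triton_model_repo/tensorrt_llm/config.pbtxt;
# check keys/values against your tensorrtllm_backend release.
name: "tensorrt_llm"
backend: "tensorrtllm"
max_batch_size: 8  # assumed; must not exceed the engine's build-time max

# Decoupled mode is required for token streaming.
model_transaction_policy {
  decoupled: true
}

parameters: {
  key: "gpt_model_type"
  value: { string_value: "inflight_fused_batching" }
}
parameters: {
  key: "gpt_model_path"
  value: { string_value: "/models/llama-2-7b-chat/trt_engines" }  # assumed engine dir
}
```

The usual failure mode is a mismatch between `max_batch_size` (or the engine path) here and the values used when the engine was built, so those are the first parameters to double-check.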
Two ways of serving models with vLLM: 1. Online mode: uses the OpenAI-compatible server and exposes APIs for text generation. You can also make async requests for token...
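As a minimal sketch of the online mode, assuming a vLLM OpenAI-compatible server is already running locally (started with something like `python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-chat-hf`); the model name and port are assumptions:

```python
# Minimal sketch: query a locally running vLLM OpenAI-compatible server.
# Assumes the default port 8000 and meta-llama/Llama-2-7b-chat-hf;
# adjust base_url/model to your setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM ignores the key by default
)

response = client.chat.completions.create(
    model="meta-llama/Llama-2-7b-chat-hf",
    messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```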
> Hello @irasin, are there any new thoughts on this issue? I'm hitting the same thing: online serving runs at ~0.49× the offline batch throughput in tokens/s. Much appreciated for any...
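For context on that comparison, a minimal sketch of vLLM's offline batch mode, which is the baseline the ~0.49× figure refers to; the model name and sampling settings are assumptions:

```python
# Minimal sketch: offline batched generation with vLLM.
# All prompts are processed in one batch, with no per-request HTTP overhead.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")  # assumed model
params = SamplingParams(temperature=0.8, max_tokens=128)

prompts = [
    "Summarize the benefits of batching.",
    "What is speculative decoding?",
]
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)
```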
> I think it's slower due to internet latency.
>
> On Mon, 15 Apr 2024, Sam Comber wrote: +1, have observed this also; currently just living...