YueWeng

5 comments by YueWeng

Hi @yifeihappy, the main branch now supports obtaining `contextLogits` under gptManager; the related docs are [here](https://github.com/NVIDIA/TensorRT-LLM/issues/926). You can get them from `SendResponseCallback`, as shown [here](https://github.com/NVIDIA/TensorRT-LLM/blob/main/benchmarks/cpp/gptManagerBenchmark.cpp#L405): `response_tensors` will contain `contextLogits`.
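A minimal C++17 sketch of pulling `contextLogits` out of the tensors a response callback receives. It is not the TRT-LLM implementation. `NamedTensor` here is a simplified stand-in for the real type, and the tensor name `"context_logits"` is an assumption. The four-argument callback shape (request ID, response tensors, final-response flag, error message) is also an assumption, modeled on `gptManagerBenchmark.cpp`, so check both against your TRT-LLM version. `findContextLogits` and `ContextLogitsCollector` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for TRT-LLM's NamedTensor: only the tensor name and a
// flat float buffer are modeled, not the real GPU/host tensor type.
struct NamedTensor {
    std::string name;
    std::vector<float> data;
};

// Assumed name of the context-logits tensor; verify it against the constant
// used by your TRT-LLM version.
inline const std::string kContextLogitsName = "context_logits";

// Scans the tensors delivered to a response callback and returns a copy of the
// context logits if this response contains them, otherwise std::nullopt.
std::optional<std::vector<float>> findContextLogits(
    std::list<NamedTensor> const& responseTensors) {
    for (auto const& tensor : responseTensors) {
        if (tensor.name == kContextLogitsName) {
            return tensor.data;
        }
    }
    return std::nullopt;
}

// Callable shaped like the assumed SendResponseCallback signature; it stores
// the context logits from whichever response carries them.
struct ContextLogitsCollector {
    std::vector<float> contextLogits;
    bool received = false;

    void operator()(uint64_t /*requestId*/,
                    std::list<NamedTensor> const& responseTensors,
                    bool /*finalResponse*/,
                    std::string const& /*errMsg*/) {
        if (auto logits = findContextLogits(responseTensors)) {
            contextLogits = *logits;
            received = true;
        }
    }
};
```

Matching on the tensor name rather than on list position keeps the lookup working regardless of the order in which tensors are packed into `response_tensors`.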

@MrBurmark Thanks for your reply, it really helps! This time I used gcc 5.5.0 and CUDA 10.1, and the problems above did not occur. But I still get a lot...

Hi @Marks101 @vnkc1, thank you for your feedback. This memory usage is expected. The logits take twice the amount of GPU memory because: - The new...

Hi @metterian, thanks for your feedback. Is the performance data you show based on Triton? If so, could you please try using TRT-LLM alone (without Triton)...