yifeihappy
> For "requests processed concurrently", this should be supported by `gptManager` with in-flight batching. What is your rationale for using multiple threads on `GptSession`? Thank you for your response. In my...
> Hi @yifeihappy, the main branch now supports obtaining `contextLogits` under `gptManager`; the related docs are [here](https://github.com/NVIDIA/TensorRT-LLM/issues/926). You can retrieve them from `SendResponseCallback`, as shown [here](https://github.com/NVIDIA/TensorRT-LLM/blob/main/benchmarks/cpp/gptManagerBenchmark.cpp#L405): `response_tensors` will contain `contextLogits`. Thank...