yifeihappy
> For "requests processed concurrently", this should be supported by `gptManager` with in-flight batching. What is your rationale for using multiple threads on `GptSession`? Thank you for your response. In my...
> Hi @yifeihappy, the main branch now supports obtaining `contextLogits` under `gptManager`; the related docs are [here](https://github.com/NVIDIA/TensorRT-LLM/issues/926). You can retrieve them from `SendResponseCallback`, as shown [here](https://github.com/NVIDIA/TensorRT-LLM/blob/main/benchmarks/cpp/gptManagerBenchmark.cpp#L405): `response_tensors` will contain `contextLogits`. Thank...