As a service, requests need to be processed concurrently. Can multiple threads share a GptSession? Can multiple threads call GptSession::generate() concurrently?
For "requests processed concurrently", this is supported by gptManager with in-flight batching. What is your reason for wanting to use multiple threads with GptSession?
Thank you for your response. In my application scenario, I want to process requests concurrently and obtain contextLogits. The 0.7.0 release of gptManager does not seem to expose an interface for obtaining contextLogits.
Hi @yifeihappy, the main branch now supports obtaining contextLogits under gptManager; related docs are here. You can get them from SendResponseCallback, such as here: response_tensors will contain contextLogits.
Thank you for your answer! I look forward to seeing this feature in the next tagged release.