Lan Gong
Using GitHub Actions is one option. We can explore other options (like Probot) to do the same inter-repository trigger. Regarding the motivation: this is to have an e2e CI build...
Hi @tanmayv25 Thanks for your reply. Our post-processing service (using KServe) uses the open inference protocol, so there's no way for it to accept FP16 output from Triton. However, after some...
Hi @tanmayv25 What I meant earlier by "there's no way for it to accept FP16 output from Triton" is that the open inference protocol does not support the fp16 type, i.e.,...
Hi @tanmayv25 > You don't have to serialize(deserialize) FP16 to raw byte format. Within Triton, onnx backend will directly write the FP16 tensor data into the protobuf message repeated bytes...
Hi @tanmayv25 Thank you for your confirmation. I need to check with the downstream service about supporting `raw_output_content` as `raw_input_content` when dtype is fp16. For now I have implemented a...
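For reference, the raw-bytes round trip discussed in this thread can be sketched in Python with numpy. This is a minimal sketch under the assumption (stated above) that Triton writes the FP16 tensor data as raw little-endian bytes into `raw_output_contents`; the tensor values are illustrative:

```python
import numpy as np

# An FP16 tensor as the ONNX backend would produce it.
fp16_tensor = np.array([[0.5, 1.5], [-2.0, 3.25]], dtype=np.float16)

# Serialize: the raw FP16 bytes as they would appear in the gRPC
# response's raw_output_contents field (2 bytes per element).
raw_bytes = fp16_tensor.tobytes()
assert len(raw_bytes) == fp16_tensor.size * 2

# Deserialize on the consumer side: no per-element conversion,
# just a reinterpretation of the byte buffer plus a reshape.
restored = np.frombuffer(raw_bytes, dtype=np.float16).reshape(fp16_tensor.shape)
assert np.array_equal(restored, fp16_tensor)
```

The point of the sketch is that the downstream service only needs to reinterpret the byte buffer, not add a new wire type.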
Hi @fpetrini15 thank you for your reply! I have the follow-up questions below: 1. I have tried explicitly setting `intra_op_thread_count = ` (the number of maximum cpu cores allowed for...
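For context on question 1, the ONNX Runtime backend's thread counts are set per model in `config.pbtxt` via `parameters` entries. A minimal fragment, with illustrative values (the actual counts to use depend on the container's CPU limit):

```protobuf
parameters { key: "intra_op_thread_count" value: { string_value: "4" } }
parameters { key: "inter_op_thread_count" value: { string_value: "1" } }
```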
Update: The issue with sidecar CPU throttling has been resolved by increasing the CPU cores and memory allocated to the sidecar container, due to the large input size of the image tensors. However,...
Removed "sidecar" from the issue title as it is a separate issue. The open issue is CPU throttling with the main container after configuring ONNX op thread count.
For Linux users, if you see an error like this: ``` &"warning: GDB: Failed to set controlling terminal: Operation not permitted\n" &"Cannot create process: Operation not permitted\n" ``` make sure...