Olga Andreeva
We've identified a potential cause: CPU overhead for small batch sizes causes the FP16 model to be slower than FP32. More on this issue can be found here: https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/ We will...
This feature will be supported starting from the 23.08 release.
Hi @Kellel, thanks for the suggestion! I'll file a feature request for our team.
Thank you @jsoto-gladia for reporting this issue; I filed a ticket for our team to investigate.
I believe this issue asks us to make sure that during a graceful shutdown of Triton Inference Server we properly handle in-flight requests, i.e., instead of returning an error to the...
Hi @MatthieuToulemont, have you tried specifying [`parameters`](https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#parameters) for the TRT model in the `config.pbtxt`? For example: https://github.com/triton-inference-server/server/blob/d6bd668cf2208ef70d951182f0fda7d5a7e21c82/docs/examples/model_repository/simple_dyna_sequence/config.pbtxt#L90-L95
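For reference, a minimal sketch of what such a `parameters` entry can look like in `config.pbtxt` (the key and value below are placeholders, not actual backend options; substitute the parameter your backend expects):

```
parameters: {
  key: "EXAMPLE_PARAMETER"              # placeholder key name
  value: { string_value: "example_value" }
}
```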
I am not familiar with the intricacies of your model, though. If you could provide an illustrative example of what you mean, it would be easier for us to...
Would you consider the [BLS](https://github.com/triton-inference-server/python_backend?tab=readme-ov-file#business-logic-scripting) approach instead of an ensemble? This is definitely possible in BLS; see the sketch below.
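As a rough illustration (the model name and tensor names below are placeholders), a BLS model's `execute` can call another model loaded in Triton directly:

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Forward the incoming tensor to another model loaded in Triton.
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            bls_request = pb_utils.InferenceRequest(
                model_name="my_downstream_model",  # placeholder model name
                requested_output_names=["OUTPUT0"],
                inputs=[input_tensor],
            )
            bls_response = bls_request.exec()
            if bls_response.has_error():
                raise pb_utils.TritonModelException(bls_response.error().message())
            output_tensor = pb_utils.get_output_tensor_by_name(bls_response, "OUTPUT0")
            responses.append(pb_utils.InferenceResponse(output_tensors=[output_tensor]))
        return responses
```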
This is true when you use the `inference_request.exec` function, which executes a blocking inference request. You can also explore `inference_request.async_exec`, which allows you to perform `async` inference requests. This...
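A minimal sketch of the `async` variant (same placeholder names as above; note that `execute` must be a coroutine so that `async_exec` results can be awaited):

```python
import asyncio

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    async def execute(self, requests):
        responses = []
        for request in requests:
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            # Issue several BLS requests concurrently instead of blocking on each one.
            awaitables = [
                pb_utils.InferenceRequest(
                    model_name="my_downstream_model",  # placeholder model name
                    requested_output_names=["OUTPUT0"],
                    inputs=[input_tensor],
                ).async_exec()
                for _ in range(2)
            ]
            bls_responses = await asyncio.gather(*awaitables)
            # For brevity, only the first response is returned to the client.
            output_tensor = pb_utils.get_output_tensor_by_name(bls_responses[0], "OUTPUT0")
            responses.append(pb_utils.InferenceResponse(output_tensors=[output_tensor]))
        return responses
```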
@dzier or @GuanLuo, could you clarify the license, please?