mayani-nv
This experiment was done as part of the Model Analyzer integration with onnxruntime's OLIVE tool. The ask was to see how the ORT hyper-parameters (backends, precision, etc.) can be swept using...
@askhade I tried with the Yolov2 ONNX model, and the OpenVINO backend seems to be working fine. It is only with the BERT ONNX model that this error persists. Also, I...
@tanmayv25 thank you for the suggestion. So for the ORT CPU-only backend, providing the `-z` option helped, and I am getting the following ``` /perf_analyzer -m bert_onnx_cpu -z --concurrency-range 4 ***...
I tried running the above tests with the Triton v21.09 container and the ORT-TRT Triton backend with FP32 enabled, and am getting the following ``` Concurrency: 1, throughput: 0.8 infer/sec, latency 1252700 usec Concurrency: 2,...
@pranavsharma The config you shared, which is generated by Triton, is for `max_batch_size=0`, as you can see on line 10 of your `config.json`. While this works if you...
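For context, a minimal config.pbtxt sketch of the distinction being discussed (the tensor name and dims here are hypothetical, not taken from the actual BERT config): with `max_batch_size: 0` Triton does no batching and the full shape, including the batch dimension, goes into `dims`, whereas with `max_batch_size > 0` Triton manages a variable leading batch dimension itself and `dims` excludes it.

```
# Hypothetical sketch -- variant A: no Triton-managed batching.
# The batch dimension is part of dims.
max_batch_size: 0
input [ { name: "input", data_type: TYPE_FP32, dims: [ 1, 3, 416, 416 ] } ]

# Hypothetical sketch -- variant B: Triton-managed batching.
# dims omits the batch dimension; Triton prepends it at runtime.
max_batch_size: 8
input [ { name: "input", data_type: TYPE_FP32, dims: [ 3, 416, 416 ] } ]
```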
The outputs `yolonms_layer_1/ExpandDims_1:0` and the other outputs do support dynamic batching, as shown by the dummy alphanumeric (symbolic) dimension variables. That's why the error you posted is confusing me as well: if...
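Those "dummy alphanumeric variables" are symbolic dimension names in the ONNX graph: a dimension is dynamic when it appears as `-1` or as a symbolic name (a `dim_param` string) rather than a fixed integer. A stdlib-only sketch of that check (the dim values shown are hypothetical, not read from the actual Yolov2 graph):

```python
def batch_is_dynamic(dims):
    """Return True if the leading (batch) dimension is dynamic.

    In an ONNX shape, a dynamic dimension shows up either as -1 or as a
    symbolic name (dim_param) such as "unk__573" instead of a fixed int.
    """
    batch = dims[0]
    return batch == -1 or isinstance(batch, str)

# Symbolic batch name -> dynamic; fixed integer -> static.
print(batch_is_dynamic(["unk__573", 3, 416, 416]))  # True
print(batch_is_dynamic([1, 3, 416, 416]))           # False
```

A real check would read these values from `model.graph.output[i].type.tensor_type.shape.dim` with the `onnx` package; the helper above only illustrates the rule.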
@tanmayv25 can the `batch` dimension be changed to `-1` using `polygraphy`? (`surgeon sanitize` needs an input model and an output path; `model.onnx` and `model_dynamic.onnx` below are placeholders) ``` $ python3 -m pip install polygraphy $ polygraphy surgeon sanitize model.onnx -o model_dynamic.onnx --override-input-shapes input:[-1,3,height,width] ```
would this [sample](https://github.com/noamgat/lm-format-enforcer/blob/main/samples/colab_trtllm_integration.ipynb) help?