mayani-nv
This experiment was done as part of the Model Analyzer integration with onnxruntime's OLIVE tool. The ask was to see how the ORT hyper-parameters (backends, precision, etc.) can be swept using...
@askhade I tried with the Yolov2 ONNX model, and the OpenVINO backend seems to be working fine. It is only with the BERT ONNX model that this error persists. Also, I...
@tanmayv25 thank you for the suggestion. So for the ORT CPU-only backend, providing the `-z` option helped, and I am getting the following ``` /perf_analyzer -m bert_onnx_cpu -z --concurrency-range 4 ***...
I tried running the above tests with the Triton v21.09 container and the ORT-TRT Triton backend with FP32 enabled, and am getting the following ``` Concurrency: 1, throughput: 0.8 infer/sec, latency 1252700 usec Concurrency: 2,...
@pranavsharma The config you shared, which is generated by Triton, is for `max_batch_size=0`, as you can see on line 10 of your `config.json`. While this works if you...
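For context, a minimal config.pbtxt sketch of the distinction being discussed (the tensor name and dims here are hypothetical, not taken from the actual BERT config): with `max_batch_size: 0` Triton does no batching and the full shape, including the batch dimension, goes into `dims`, whereas with `max_batch_size > 0` Triton manages a variable leading batch dimension itself and `dims` excludes it.

```
# Hypothetical sketch -- variant A: no Triton-managed batching.
# The batch dimension is part of dims.
max_batch_size: 0
input [ { name: "input", data_type: TYPE_FP32, dims: [ 1, 3, 416, 416 ] } ]

# Hypothetical sketch -- variant B: Triton-managed batching.
# dims omits the batch dimension; Triton prepends it at runtime.
max_batch_size: 8
input [ { name: "input", data_type: TYPE_FP32, dims: [ 3, 416, 416 ] } ]
```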
The outputs `yolonms_layer_1/ExpandDims_1:0` and the other outputs do support dynamic batching, as shown by the dummy alphanumeric (symbolic) dimension variables. That's why the error you posted is confusing me as well: if...
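Those "dummy alphanumeric variables" are symbolic dimension names in the ONNX graph: a dimension is dynamic when it appears as `-1` or as a symbolic name (a `dim_param` string) rather than a fixed integer. A stdlib-only sketch of that check (the dim values shown are hypothetical, not read from the actual Yolov2 graph):

```python
def batch_is_dynamic(dims):
    """Return True if the leading (batch) dimension is dynamic.

    In an ONNX shape, a dynamic dimension shows up either as -1 or as a
    symbolic name (dim_param) such as "unk__573" instead of a fixed int.
    """
    batch = dims[0]
    return batch == -1 or isinstance(batch, str)

# Symbolic batch name -> dynamic; fixed integer -> static.
print(batch_is_dynamic(["unk__573", 3, 416, 416]))  # True
print(batch_is_dynamic([1, 3, 416, 416]))           # False
```

A real check would read these values from `model.graph.output[i].type.tensor_type.shape.dim` with the `onnx` package; the helper above only illustrates the rule.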
@tanmayv25 can the `batch` dimension be changed to `-1` using `polygraphy`? (`surgeon sanitize` needs an input model and an output path; `model.onnx` and `model_dynamic.onnx` below are placeholders) ``` $ python3 -m pip install polygraphy $ polygraphy surgeon sanitize model.onnx -o model_dynamic.onnx --override-input-shapes input:[-1,3,height,width] ```
would this [sample](https://github.com/noamgat/lm-format-enforcer/blob/main/samples/colab_trtllm_integration.ipynb) help?