Assertion `batchSize > 0' failed when deploying a TF-TRT INT8 optimized model
System information
- TensorFlow Serving installed from (source or binary): tensorflow/serving:1.15.0-gpu
- TensorFlow Serving version: 1.15.0
- PY_VERSION: 3.6
- CUDA_VERSION: 10.0
- CUDNN_VERSION: 7
- TensorRT: 5.1.5
Describe the problem
I use TF Serving to deploy a TF-TRT INT8 optimized model on an NVIDIA T4 card, with some warm-up request data. During warm-up I then get this error: "Assertion `batchSize > 0' failed".
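For context, TF Serving warm-up data is a TFRecord file of PredictionLog protos stored under assets.extra/tf_serving_warmup_requests. Below is a minimal sketch of how such a file can be produced; the model name, signature, and input tensor name are placeholders, not necessarily the ones used by this model.

```python
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_log_pb2  # tensorflow-serving-api package

# Placeholder path; TF Serving reads the file from <model_version>/assets.extra/.
warmup_path = "assets.extra/tf_serving_warmup_requests"

with tf.io.TFRecordWriter(warmup_path) as writer:
    request = predict_pb2.PredictRequest()
    request.model_spec.name = "bert-int8-test"            # placeholder model name
    request.model_spec.signature_name = "serving_default"
    # One 128-token example; warm-up inputs should mirror real production requests.
    request.inputs["input_ids"].CopyFrom(                 # placeholder tensor name
        tf.make_tensor_proto([[0] * 128], dtype=tf.int32))
    log = prediction_log_pb2.PredictionLog(
        predict_log=prediction_log_pb2.PredictLog(request=request))
    writer.write(log.SerializeToString())
```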
Exact Steps to Reproduce
I can upload some code if needed.
Source code / logs
This is the log:
- 2022-06-29 09:41:27.542443: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: success. Took 6986645 microseconds.
- 2022-06-29 09:41:27.587142: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:117] Starting to read warmup data for model at /workspace/repository/savedmodel/group_7436/bert-int8-test/23/assets.extra/tf_serving_warmup_requests with model-warmup-options
- 2022-06-29 09:41:49.342088: I external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for bert/embeddings/TRTEngineOp_0 input shapes: [[128]]
- 2022-06-29 09:41:49.665296: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:37] DefaultLogger Tensor DataType is determined at build time for tensors not marked as input or output. tensorflow_model_server: regionFormat.cpp:56: size_t nvinfer1::RegionFormatB::memorySize(int, const nvinfer1::Dims&) const: Assertion `batchSize > 0' failed.
When I use the same TF-TRT INT8 optimized model for offline prediction, it works fine. Here is the offline prediction log:
- 2022-06-29T09:11:29.852812877Z 2022-06-29 09:11:29.852690: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for bert/embeddings/TRTEngineOp_0 input shapes: [[640]]
- 2022-06-29T09:11:29.852866360Z 2022-06-29 09:11:29.852799: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.5
- 2022-06-29T09:11:29.856544909Z 2022-06-29 09:11:29.856491: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.5
- 2022-06-29T09:11:31.201361074Z 2022-06-29 09:11:31.201237: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for bert/embeddings/TRTEngineOp_1 input shapes: [[5,128,768], [5,128,768]] ...
The strange thing is that deploying the TF-TRT FP16 optimized model works fine. Here is the log:
- 2022-06-29 10:08:27.949798: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:117] Starting to read warmup data for model at /workspace/repository/savedmodel/group_7436/bert-int8-test/25/assets.extra/tf_serving_warmup_requests with model-warmup-options
- 2022-06-29 10:08:49.022089: I external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for bert/embeddings/TRTEngineOp_0 input shapes: [[128]]
- 2022-06-29 10:08:49.331724: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:37] DefaultLogger Tensor DataType is determined at build time for tensors not marked as input or output.
- 2022-06-29 10:08:50.570585: I external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for bert/embeddings/TRTEngineOp_1 input shapes: [[1,128,768], [1,128,768], [1,128,768]]
Hi @helenHlz
In order to expedite the troubleshooting process, please provide a code snippet to reproduce the issue reported here. Thank you!
For TF-TRT 1.15 troubleshooting
Intro
We attempted to use TF-TRT to do INT8 quantization of a TF saved model. Offline inference works fine, but I get an error when deploying the INT8 model for online inference with TF Serving 1.15.
You can view the detailed error log in the "saved_model/int8-tftrt/int8_deploy_log.txt" file.
https://drive.google.com/file/d/1cx1xUxaSbz1m6cCA15jNcEwydRrs9vGM/view?usp=sharing
Since there is no problem with offline inference, I think the error is caused by TF Serving.
The strange thing is that the FP16 TF-TRT model can be deployed with TF Serving; the problem only occurs with the INT8 model.
How to get the saved models
You can get the FP32, FP16, and INT8 saved models and the deployment logs from this Google Drive link.
https://drive.google.com/drive/folders/1N01T8uyRtt0ffnLdIOZ_Gem8SEekxQgi?usp=sharing
How to reproduce the issue
Deploy the INT8 saved model with TF Serving, and you will hit this problem.
Code
Not sure if it helps, but I am sharing three .py files with you.
https://drive.google.com/drive/folders/1YyJnSnNqPUjDyPdSDf9Iwwxai2Sbl3yS?usp=sharing
- tftrt_convert.py: converts the FP32 saved model to a TF-TRT FP16 saved model
- tftrt_int8_convert.py: converts the FP32 saved model to a TF-TRT INT8 saved model (a sketch of this kind of script follows this list)
- tftrt_pred_from_savedmodel.py: offline inference for the INT8 saved model
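For reference, here is a minimal sketch of what such an INT8 conversion looks like with the TF 1.15 TrtGraphConverter API; the paths, tensor names, and calibration inputs are placeholders rather than the exact contents of tftrt_int8_convert.py.

```python
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Placeholder paths and tensor names for illustration only.
converter = trt.TrtGraphConverter(
    input_saved_model_dir="saved_model/fp32",
    precision_mode=trt.TrtPrecisionMode.INT8,
    use_calibration=True,
    max_batch_size=8,
    is_dynamic_op=True)  # INT8 calibration needs engines built at run time

# Rewrite TRT-compatible subgraphs into TRTEngineOp nodes.
converter.convert()

# Feed a few representative batches so TF-TRT can collect INT8 calibration ranges.
converter.calibrate(
    fetch_names=["loss/Softmax:0"],   # placeholder output tensor name
    num_runs=10,
    feed_dict_fn=lambda: {
        "input_ids:0": np.zeros((1, 128), dtype=np.int32),  # placeholder input tensor
    })

converter.save("saved_model/int8-tftrt")
```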
Related info
The model we use is BERT-Base, proposed by Google: https://github.com/google-research/bert.
You can download the vocab.txt and bert_config.json from this link:
https://storage.googleapis.com/bert_models/2020_02_20/uncased_L-12_H-768_A-12.zip.
The task is MRPC, one of the GLUE tasks: https://drive.google.com/drive/folders/14yGZla6X9a4Dihy52gf2E05475Z9Qbaa?usp=sharing
Please let me know if you need more information.
We have a lot of trained models in the TF 1.15 format and hope to solve this problem without upgrading the version.
Thx.
@helenHlz,
Compared to FP32 and FP16, INT8 requires additional calibration data to determine the best quantization thresholds. When the precision mode in the conversion parameters is INT8, we need to provide an input function to the convert() method call; this input function is similar to the input function provided to the build() method.
For more info and an example implementation, please refer to the Support for INT8 section. Thank you!
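For illustration, a minimal sketch of that INT8 workflow with the TrtGraphConverterV2 API in newer TF 2.x releases; the paths, input shapes, and calibration data below are placeholders.

```python
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

def calibration_input_fn():
    # Yield representative inputs so TF-TRT can pick INT8 quantization ranges.
    for _ in range(10):
        yield (np.zeros((1, 128), dtype=np.int32),)  # placeholder shape/dtype

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model/fp32",         # placeholder path
    precision_mode=trt.TrtPrecisionMode.INT8,
    use_calibration=True)

# INT8 requires the calibration input function at convert() time ...
converter.convert(calibration_input_fn=calibration_input_fn)
# ... and a similar input function can be passed to build() to pre-build engines.
converter.build(input_fn=calibration_input_fn)
converter.save("saved_model/int8-tftrt")
```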
This issue was closed due to lack of activity after being marked stale for the past 14 days.