Is it possible to use TensorRT to speed up an original TensorFlow T5 exported saved_model?
I've tried speeding up the Hugging Face T5 model with TRT, but how can we speed up a TensorFlow T5 saved_model? I want to serve the sped-up T5 saved_model with TF Serving in a production environment. My environment is:
Docker image: nvcr.io/nvidia/tensorflow:22.05-tf2-py3
GPU: Tesla V100 * 2
I followed the TF-TRT user guide, but it doesn't work. First I used this code:
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np
import tensorflow_text  # registers the SentencePiece ops the T5 graph uses

SAVED_MODEL_DIR = '/path/to/t5/export/saved_model'
output_saved_model_dir = '/path/to/save/trt/saved_model'

# Convert the SavedModel with TF-TRT, requesting FP16 engines.
conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=SAVED_MODEL_DIR,
    conversion_params=conversion_params)
converter.convert()
converter.save(output_saved_model_dir)
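For reference, this is roughly how I then try to load the converted model (the 'serving_default' signature key is an assumption about how the model was exported):

import tensorflow as tf
import tensorflow_text  # must be imported so the custom T5 ops resolve

loaded = tf.saved_model.load(output_saved_model_dir)
infer = loaded.signatures['serving_default']  # assumed signature key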
The load fails; the error message is:
"FAILED_PRECONDITION: Attempting to use uninitialized value"
Then I found that the T5 saved_model was exported with TF1, so I used tf.compat.v1 to convert it:
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np
import tensorflow_text  # registers the SentencePiece ops the T5 graph uses
import tensorflow as tf

tf.compat.v1.disable_v2_behavior()

input_saved_model_dir = '/path/to/t5/export/saved_model'
output_saved_model_dir = '/path/to/save/trt/saved_model'

converter = trt.TrtGraphConverter(
    input_saved_model_dir=input_saved_model_dir,
    max_workspace_size_bytes=(1 << 32),  # 4 GiB TRT workspace
    precision_mode='FP16',
    maximum_cached_engines=100)
converter.convert()
converter.save(output_saved_model_dir)
It still failed:
ValueError: Input 0 of node decoder/block_011/layer_002/rms_norm/scale_1/parallel_0_1/Assign was passed float from decoder/block_011/layer_002/rms_norm/scale_slice_0:0 incompatible with expected float_ref.
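The float vs float_ref mismatch again points at TF1 reference variables: the rewritten graph still contains Assign ops that expect variable references. A workaround I have seen for this class of error (a sketch, not verified on this T5 export; the output node name is a placeholder) is to freeze the variables into constants first:

import tensorflow as tf

tf.compat.v1.disable_v2_behavior()

with tf.compat.v1.Session(graph=tf.Graph()) as sess:
    # Loading the TF1 SavedModel also restores its variables into the session.
    tf.compat.v1.saved_model.loader.load(
        sess, [tf.compat.v1.saved_model.tag_constants.SERVING],
        input_saved_model_dir)
    # Replace all variables with constants; 'decoder/output_node' is a
    # placeholder for the graph's real output node names.
    frozen = tf.compat.v1.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ['decoder/output_node'])

The frozen GraphDef could then be handed to trt.TrtGraphConverter through its input_graph_def argument instead of input_saved_model_dir.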
Could someone tell me: can we use TRT to convert a TF T5 saved_model? If it's possible, how? @DEKHTIARJonathan
I no longer work in TF
Sorry, my fault.
Never mind, I found a solution.
The original saved_model takes 300 ms with batch_size=32 and sequence_length=128, which is too slow for deployment. So I wanted to speed up T5 with TF-TRT. But when I convert the saved_model with the code below, TF-TRT doesn't help:
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np
import tensorflow_text  # registers the SentencePiece ops the T5 graph uses
import tensorflow as tf

tf.compat.v1.disable_v2_behavior()

input_saved_model_dir = 'exported_model/batch32_length128_0810/1660123651'
output_saved_model_dir = 'trt_saved_model/batch32_length128_0810/1/'

converter = trt.TrtGraphConverter(
    input_saved_model_dir=input_saved_model_dir,
    max_workspace_size_bytes=(1 << 32),  # 4 GiB TRT workspace
    max_batch_size=32,
    minimum_segment_size=50,  # only offload subgraphs with at least 50 nodes
    precision_mode='FP32',
    is_dynamic_op=True,  # build engines at runtime so shapes can vary
    maximum_cached_engines=1)
converter.convert()
converter.save(output_saved_model_dir)
Before using this code, you have to add some code in tensorflow/python/compiler/tensorrt/trt_convert.py; the reference is here. After adding that code, the model converts, but the latency is still unchanged. Could somebody help me with this?
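One thing worth checking (a debugging sketch; it assumes the converted model keeps the default 'serve' tag) is whether the conversion produced any TensorRT engines at all. If there are no TRTEngineOp nodes in the converted graph, TF-TRT left everything on the TensorFlow path, and minimum_segment_size=50 may simply be excluding every subgraph:

import tensorflow as tf

tf.compat.v1.disable_v2_behavior()

with tf.compat.v1.Session(graph=tf.Graph()) as sess:
    tf.compat.v1.saved_model.loader.load(
        sess, [tf.compat.v1.saved_model.tag_constants.SERVING],
        output_saved_model_dir)
    graph_def = sess.graph.as_graph_def()

num_engines = sum(node.op == 'TRTEngineOp' for node in graph_def.node)
print('TRTEngineOp nodes:', num_engines)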