[Bug] After converting SenseVoice's ONNX model to a TensorRT engine via trtexec, inference reports an error
The SenseVoice ONNX model converts to a TensorRT engine through trtexec without problems, but the trtexec inference benchmark then fails with an error, as shown below:
ONNX Runtime (ORT) runs the same ONNX model successfully with the code below, which suggests the ONNX file itself is fine:
import onnxruntime
import torch
option = onnxruntime.SessionOptions()
option.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
option.intra_op_num_threads = 1
providers = [
    "CUDAExecutionProvider"
    if torch.cuda.is_available() else "CPUExecutionProvider"
]
model = onnxruntime.InferenceSession(
    "model_sensevoice.onnx",
    sess_options=option, providers=providers)
batch_size = 4
feats_length = 256
speech = torch.randn(batch_size, feats_length, 560).cuda()
speech_lengths = torch.tensor([6, 30, 31, feats_length], dtype=torch.int32).cuda()
language = torch.tensor([0, 0, 0, 0], dtype=torch.int32).cuda()
textnorm = torch.tensor([15, 15, 15, 15], dtype=torch.int32).cuda()
ort_inputs = {
    'speech': speech.cpu().numpy(),
    'speech_lengths': speech_lengths.cpu().numpy(),
    'language': language.cpu().numpy(),
    'textnorm': textnorm.cpu().numpy(),
}
output = model.run(None, ort_inputs)[0]
print("output:", output, output.shape)
trtexec conversion command:
trtexec \
--onnx=model_sensevoice.onnx \
--saveEngine=engine_fp16.plan \
--minShapes=speech:1x128x560,speech_lengths:1,language:1,textnorm:1 \
--optShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
--maxShapes=speech:8x512x560,speech_lengths:8,language:8,textnorm:8 \
--fp16 \
--builderOptimizationLevel=3 \
--memPoolSize=workspace:4096 \
--verbose
TRT version: TensorRT-10.7.0.23, ONNX version: 1.17.0
So what is the specific reason? Thank you~
Try the following:
trtexec \
--onnx=model_sensevoice.onnx \
--saveEngine=engine_fp16.plan \
--minShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
--optShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
--maxShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
--fp16 \
--builderOptimizationLevel=3 \
--memPoolSize=workspace:4096 \
--verbose
@lix19937 Thanks for your reply~
Since both the batch-size dimension and feats_length need to be dynamic, is it reasonable to set min, opt, and max to the same shape?
The ONNX model is exported with the following dynamic axes:
def export_dynamic_axes(self):
    return {
        "speech": {0: "batch_size", 1: "feats_length"},
        "speech_lengths": {0: "batch_size"},
        "language": {0: "batch_size"},
        "textnorm": {0: "batch_size"},
        "ctc_logits": {0: "batch_size", 1: "logits_length"},
        "encoder_out_lens": {0: "batch_size"},
    }
So are there any other solutions?
Thanks~
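For reference, the min/opt/max values in the first trtexec command already describe a dynamic range for both axes, while pinning all three to 4x256x560 produces an engine that accepts only that one shape. Below is a minimal sketch of which runtime shapes the original range admits. `PROFILE` and `shape_in_profile` are hypothetical names used for illustration, not TensorRT APIs; the bounds are copied from the --minShapes/--maxShapes flags above.

```python
# Hypothetical helper (not a TensorRT API): check a runtime shape against
# the min/max bounds taken from --minShapes / --maxShapes in the first command.
PROFILE = {
    "speech": ((1, 128, 560), (8, 512, 560)),
    "speech_lengths": ((1,), (8,)),
    "language": ((1,), (8,)),
    "textnorm": ((1,), (8,)),
}


def shape_in_profile(name, shape, profile=PROFILE):
    """Return True if every dimension of `shape` lies within [min, max]."""
    lo, hi = profile[name]
    if len(shape) != len(lo):
        return False
    return all(l <= s <= h for l, s, h in zip(lo, shape, hi))


print(shape_in_profile("speech", (4, 256, 560)))   # True: inside the range
print(shape_in_profile("speech", (16, 256, 560)))  # False: batch 16 > max 8
```

With the fixed-shape variant, min and max would both be (4, 256, 560), so only that exact shape would pass the same check.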
@wjj19950828 does the speech_length input need to have its last dimension be feats_length? If yes, the trtexec command you have provided sets random values to the speech_length input.
Can you try providing the same inputs as ORT using --loadInputs flag in trtexec?
The --loadInputs flag accepts one binary file per input. You can save a NumPy array to a binary file with np_array.tofile("arr.bin").
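A minimal sketch of that suggestion, assuming the same input names, dtypes, and values as the ORT script above. The output directory and the `paths` and `load_inputs` variables are placeholders for illustration. Note that tofile writes raw bytes only, so the dtype and shape have to be supplied again when reading the file back.

```python
import os
import tempfile

import numpy as np

batch_size, feats_length = 4, 256
rng = np.random.default_rng(0)

# Same names and dtypes as the ORT example (values for speech are random).
inputs = {
    "speech": rng.standard_normal((batch_size, feats_length, 560)).astype(np.float32),
    "speech_lengths": np.array([6, 30, 31, feats_length], dtype=np.int32),
    "language": np.zeros(batch_size, dtype=np.int32),
    "textnorm": np.full(batch_size, 15, dtype=np.int32),
}

out_dir = tempfile.mkdtemp()  # assumption: any writable directory works
paths = {}
for name, arr in inputs.items():
    paths[name] = os.path.join(out_dir, name + ".bin")
    arr.tofile(paths[name])  # raw bytes, no shape or dtype header

# Reading back requires the dtype (and a reshape for multi-dimensional data).
restored = np.fromfile(paths["speech_lengths"], dtype=np.int32)
print(restored)

# Build the name:file list in the form trtexec expects for --loadInputs.
load_inputs = ",".join(f"{name}:{path}" for name, path in paths.items())
print("--loadInputs=" + load_inputs)
```

When passing these files to trtexec, the input shapes should also be given explicitly (for example through --shapes) so that they match the saved data.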
@asfiyab-nvidia Thanks for your reply!
After using --loadInputs, the inference benchmark runs without any problems, as shown below:
But when I run the TRT engine with the following script, there is still an error. What is the reason? Thanks~ Script:
import tensorrt as trt
import torch
import numpy as np
TRT_LOGGER = trt.Logger(trt.Logger.ERROR)
trt.init_libnvinfer_plugins(TRT_LOGGER, '')
engine_filepath = 'engine_fp16.plan'
# input
batch_size = 4
feats_length = 256
# speech = torch.randn(batch_size, feats_length, 560, dtype=torch.float32).cuda()
# speech_lengths = torch.tensor([6, 30, 31, feats_length], dtype=torch.int32).cuda()
# language = torch.tensor([0, 0, 0, 0], dtype=torch.int32).cuda()
# textnorm = torch.tensor([15, 15, 15, 15], dtype=torch.int32).cuda()
speech = torch.Tensor(np.fromfile('speech.bin', dtype=np.float32).reshape(batch_size, feats_length, 560)).cuda()
speech_lengths = torch.Tensor(np.fromfile('speech_lengths.bin', dtype=np.int32).reshape(batch_size,)).cuda()
language = torch.Tensor(np.fromfile('language.bin', dtype=np.int32).reshape(batch_size,)).cuda()
textnorm = torch.Tensor(np.fromfile('textnorm.bin', dtype=np.int32).reshape(batch_size,)).cuda()
# output
ctc_logits = torch.empty(batch_size, feats_length + 4, 25055, dtype=torch.float32).cuda()
encoder_out_lens = torch.empty(batch_size, dtype=torch.int32).cuda()
with open(engine_filepath, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
context.set_input_shape('speech', (batch_size, feats_length, 560))
context.set_input_shape('speech_lengths', (batch_size,))
context.set_input_shape('language', (batch_size,))
context.set_input_shape('textnorm', (batch_size,))
bindings = [speech.data_ptr(), speech_lengths.data_ptr(), language.data_ptr(), textnorm.data_ptr(), ctc_logits.data_ptr(), encoder_out_lens.data_ptr()]
for i in range(len(bindings)):
    print("name:", engine.get_tensor_name(i))
    context.set_tensor_address(engine.get_tensor_name(i), bindings[i])
handle = torch.cuda.current_stream().cuda_stream
context.execute_async_v3(stream_handle=handle)
print('all_binding_shapes_specified: ', context.all_binding_shapes_specified)
print('ctc_logits shape: ', context.get_tensor_shape('ctc_logits'))
print('encoder_out_lens', context.get_tensor_shape('encoder_out_lens'))
print('ctc_logits: ', ctc_logits)
print('encoder_out_lens: ', encoder_out_lens)
error:
@wjj19950828
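torch.Tensor(...) always builds a float32 tensor, so the int32 inputs are handed to TensorRT with the wrong dtype, whereas torch.tensor(...) keeps the NumPy dtype. Change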
speech_lengths = torch.Tensor(np.fromfile('speech_lengths.bin', dtype=np.int32).reshape(batch_size,)).cuda()
language = torch.Tensor(np.fromfile('language.bin', dtype=np.int32).reshape(batch_size,)).cuda()
textnorm = torch.Tensor(np.fromfile('textnorm.bin', dtype=np.int32).reshape(batch_size,)).cuda()
to
speech_lengths = torch.tensor(np.fromfile('speech_lengths.bin', dtype=np.int32)).reshape(batch_size,).cuda()
language = torch.tensor(np.fromfile('language.bin', dtype=np.int32)).reshape(batch_size,).cuda()
textnorm = torch.tensor(np.fromfile('textnorm.bin', dtype=np.int32)).reshape(batch_size,).cuda()
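To see why the dtype matters, here is a NumPy-only sketch of what an int32 consumer would read from a buffer that was converted to float32. It assumes the engine reads speech_lengths as int32 (as the int32 .bin files in the script suggest) and uses astype(np.float32) as a stand-in for the conversion that torch.Tensor(...) performs; it does not call TensorRT or PyTorch.

```python
import numpy as np

speech_lengths = np.array([6, 30, 31, 256], dtype=np.int32)

# Stand-in for torch.Tensor(...): the values are converted to float32.
as_float = speech_lengths.astype(np.float32)

# What a consumer expecting int32 sees when handed that float32 buffer:
# the same bytes reinterpreted, not the original integers.
# For example, 6.0 as float32 has bit pattern 0x40C00000,
# which is 1086324736 when read as int32.
misread = as_float.view(np.int32)

print(as_float.dtype, as_float)
print(misread)
```

The reinterpreted lengths are far larger than feats_length, which is the kind of out-of-range input the corrected torch.tensor(...) lines avoid.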
Did you manage to run inference successfully with TensorRT? @wjj19950828