
[Bug] After converting SenseVoice's ONNX to a TRT engine via trtexec, an error is reported

Open wjj19950828 opened this issue 1 year ago • 6 comments

Currently, SenseVoice's TRT engine can be built successfully through trtexec, but when running the benchmark inference, an error is reported (see the attached error screenshot).

ORT can run the corresponding ONNX model successfully with the code below, which indicates the ONNX model itself is fine:

import onnxruntime
import torch

# Session options: full graph optimization, single intra-op thread
option = onnxruntime.SessionOptions()
option.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
option.intra_op_num_threads = 1
providers = [
    "CUDAExecutionProvider" if torch.cuda.is_available() else "CPUExecutionProvider"
]
model = onnxruntime.InferenceSession(
    "model_sensevoice.onnx",
    sess_options=option, providers=providers)

batch_size = 4
feats_length = 256
speech = torch.randn(batch_size, feats_length, 560).cuda()
speech_lengths = torch.tensor([6, 30, 31, feats_length], dtype=torch.int32).cuda()
language = torch.tensor([0, 0, 0, 0], dtype=torch.int32).cuda()
textnorm = torch.tensor([15, 15, 15, 15], dtype=torch.int32).cuda()

# ORT takes numpy inputs, so move the tensors back to host memory
ort_inputs = {
    'speech': speech.cpu().numpy(),
    'speech_lengths': speech_lengths.cpu().numpy(),
    'language': language.cpu().numpy(),
    'textnorm': textnorm.cpu().numpy(),
}
output = model.run(None, ort_inputs)[0]
print("output:", output, output.shape)

trtexec conversion command:

trtexec \
    --onnx=model_sensevoice.onnx \
    --saveEngine=engine_fp16.plan \
    --minShapes=speech:1x128x560,speech_lengths:1,language:1,textnorm:1 \
    --optShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --maxShapes=speech:8x512x560,speech_lengths:8,language:8,textnorm:8 \
    --fp16 \
    --builderOptimizationLevel=3 \
    --memPoolSize=workspace:4096 \
    --verbose

TRT version: TensorRT-10.7.0.23, ONNX version: 1.17.0

So what is the specific reason? Thank you~

wjj19950828 avatar Dec 31 '24 10:12 wjj19950828

Try the following:

trtexec \
    --onnx=model_sensevoice.onnx \
    --saveEngine=engine_fp16.plan \
    --minShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --optShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --maxShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --fp16 \
    --builderOptimizationLevel=3 \
    --memPoolSize=workspace:4096 \
    --verbose

lix19937 avatar Jan 01 '25 14:01 lix19937

(Quoting the fixed-shape trtexec command suggested above.)

@lix19937 Thanks for your reply~

Currently, since both the batch dimension and feats_length need to be dynamic, is it reasonable to pin min/opt/max to the same shape?

The ONNX model is exported with the following dynamic axes:

def export_dynamic_axes(self):
    return {
        "speech": {0: "batch_size", 1: "feats_length"},
        "speech_lengths": {0: "batch_size"},
        "language": {0: "batch_size"},
        "textnorm": {0: "batch_size"},
        "ctc_logits": {0: "batch_size", 1: "logits_length"},
        "encoder_out_lens":  {0: "batch_size"},
    }
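
For reference, a minimal sketch of how this dict is typically consumed by torch.onnx.export; the model object, dummy inputs, and opset here are illustrative assumptions, not the exact export code:

import torch

# Illustrative dummy inputs matching the shapes used elsewhere in this thread;
# `model` stands in for the actual SenseVoice nn.Module (an assumption).
speech = torch.randn(4, 256, 560)
speech_lengths = torch.tensor([6, 30, 31, 256], dtype=torch.int32)
language = torch.zeros(4, dtype=torch.int32)
textnorm = torch.full((4,), 15, dtype=torch.int32)

torch.onnx.export(
    model,
    (speech, speech_lengths, language, textnorm),
    "model_sensevoice.onnx",
    input_names=["speech", "speech_lengths", "language", "textnorm"],
    output_names=["ctc_logits", "encoder_out_lens"],
    dynamic_axes={
        "speech": {0: "batch_size", 1: "feats_length"},
        "speech_lengths": {0: "batch_size"},
        "language": {0: "batch_size"},
        "textnorm": {0: "batch_size"},
        "ctc_logits": {0: "batch_size", 1: "logits_length"},
        "encoder_out_lens": {0: "batch_size"},
    },
    opset_version=17,  # assumption
)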

So are there any other solutions?

Thanks~

wjj19950828 avatar Jan 02 '25 13:01 wjj19950828

@wjj19950828 do the values of the speech_lengths input need to be consistent with feats_length (e.g. bounded by it)? If yes, note that the trtexec command you provided fills speech_lengths with random values. Can you try providing the same inputs as ORT, using the --loadInputs flag in trtexec? The --loadInputs flag accepts a binary file for each input; you can save a numpy array to a binary file with np_array.tofile("arr.bin").
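
For reference, a minimal sketch of dumping the ORT inputs to raw binary files and replaying them through trtexec; the file names are illustrative, while --loadEngine, --shapes, and --loadInputs are standard trtexec flags:

import numpy as np

# Dump the same inputs used in the ORT script as raw binary files.
speech = np.random.randn(4, 256, 560).astype(np.float32)
speech_lengths = np.array([6, 30, 31, 256], dtype=np.int32)
language = np.zeros(4, dtype=np.int32)
textnorm = np.full(4, 15, dtype=np.int32)

for name, arr in [("speech", speech), ("speech_lengths", speech_lengths),
                  ("language", language), ("textnorm", textnorm)]:
    arr.tofile(f"{name}.bin")

Then replay them against the built engine:

trtexec --loadEngine=engine_fp16.plan \
    --shapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --loadInputs=speech:speech.bin,speech_lengths:speech_lengths.bin,language:language.bin,textnorm:textnorm.bin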

asfiyab-nvidia avatar Jan 02 '25 23:01 asfiyab-nvidia

(Quoting asfiyab-nvidia's --loadInputs suggestion above.)

@asfiyab-nvidia Thanks for your reply!

After using --loadInputs, the benchmark inference runs without any problems (see the attached screenshot).

But when I run TRT with the following script, there is still an error. What is the cause? Thanks~ Script:

import tensorrt as trt
import torch
import numpy as np

TRT_LOGGER = trt.Logger(trt.Logger.ERROR)
trt.init_libnvinfer_plugins(TRT_LOGGER, '')

engine_filepath = 'engine_fp16.plan'
# inputs (loaded from the binary files used with trtexec --loadInputs)
batch_size = 4
feats_length = 256
# speech = torch.randn(batch_size, feats_length, 560, dtype=torch.float32).cuda()
# speech_lengths = torch.tensor([6, 30, 31, feats_length], dtype=torch.int32).cuda()
# language = torch.tensor([0, 0, 0, 0], dtype=torch.int32).cuda()
# textnorm = torch.tensor([15, 15, 15, 15], dtype=torch.int32).cuda()
speech = torch.Tensor(np.fromfile('speech.bin', dtype=np.float32).reshape(batch_size, feats_length, 560)).cuda()
speech_lengths = torch.Tensor(np.fromfile('speech_lengths.bin', dtype=np.int32).reshape(batch_size,)).cuda()
language = torch.Tensor(np.fromfile('language.bin', dtype=np.int32).reshape(batch_size,)).cuda()
textnorm = torch.Tensor(np.fromfile('textnorm.bin', dtype=np.int32).reshape(batch_size,)).cuda()
# pre-allocated output buffers
ctc_logits = torch.empty(batch_size, feats_length + 4, 25055, dtype=torch.float32).cuda()
encoder_out_lens = torch.empty(batch_size, dtype=torch.int32).cuda()

with open(engine_filepath, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    context.set_input_shape('speech', (batch_size, feats_length, 560))
    context.set_input_shape('speech_lengths', (batch_size,))
    context.set_input_shape('language', (batch_size,))
    context.set_input_shape('textnorm', (batch_size,))

    bindings = [speech.data_ptr(), speech_lengths.data_ptr(), language.data_ptr(),
                textnorm.data_ptr(), ctc_logits.data_ptr(), encoder_out_lens.data_ptr()]
    for i in range(len(bindings)):
        print("name:", engine.get_tensor_name(i))
        context.set_tensor_address(engine.get_tensor_name(i), bindings[i])
    handle = torch.cuda.current_stream().cuda_stream
    context.execute_async_v3(stream_handle=handle)

    print('all_binding_shapes_specified: ', context.all_binding_shapes_specified)

    print('ctc_logits shape: ', context.get_tensor_shape('ctc_logits'))
    print('encoder_out_lens', context.get_tensor_shape('encoder_out_lens'))
    print('ctc_logits: ', ctc_logits)
    print('encoder_out_lens: ', encoder_out_lens)

Error: (screenshot attached)
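
One way to narrow down this kind of runtime error is to print the dtype and I/O mode the engine expects for each tensor; a minimal sketch against the TensorRT 10 Python API:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.ERROR)
with open("engine_fp16.plan", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        # e.g. DataType.FLOAT vs DataType.INT32, TensorIOMode.INPUT vs OUTPUT
        print(name, engine.get_tensor_dtype(name), engine.get_tensor_mode(name))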

wjj19950828 avatar Jan 04 '25 05:01 wjj19950828

@wjj19950828

modify

speech_lengths = torch.Tensor(np.fromfile('speech_lengths.bin', dtype=np.int32).reshape(batch_size,)).cuda()
language = torch.Tensor(np.fromfile('language.bin', dtype=np.int32).reshape(batch_size,)).cuda()
textnorm = torch.Tensor(np.fromfile('textnorm.bin', dtype=np.int32).reshape(batch_size,)).cuda()

to

speech_lengths = torch.tensor(np.fromfile('speech_lengths.bin', dtype=np.int32)).reshape(batch_size,).cuda()
language = torch.tensor(np.fromfile('language.bin', dtype=np.int32)).reshape(batch_size,).cuda()
textnorm = torch.tensor(np.fromfile('textnorm.bin', dtype=np.int32)).reshape(batch_size,).cuda()
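
For context: torch.Tensor(...) always constructs a tensor with the default float dtype, silently casting the int32 arrays to float32, whereas the engine expects INT32 buffers for speech_lengths, language, and textnorm; torch.tensor(...) (or torch.from_numpy) preserves the numpy dtype. A quick check:

import numpy as np
import torch

a = np.array([6, 30, 31, 256], dtype=np.int32)
print(torch.Tensor(a).dtype)       # torch.float32 -- dtype silently changed
print(torch.tensor(a).dtype)       # torch.int32   -- dtype preserved
print(torch.from_numpy(a).dtype)   # torch.int32   -- zero-copy view, dtype preserved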

lix19937 avatar Jan 09 '25 11:01 lix19937

Did you manage to run inference successfully with TensorRT? @wjj19950828

feifeiwei avatar Jun 16 '25 03:06 feifeiwei