
[Bug] After converting SenseVoice's ONNX to a TRT engine via trtexec, an error is reported

Open wjj19950828 opened this issue 1 year ago • 6 comments

Currently, SenseVoice's TRT engine can be built successfully through trtexec, but when running the benchmark inference, an error is reported (see the attached error screenshot).

ORT can run the corresponding ONNX model successfully with the code below, which indicates the ONNX model itself is fine:

import onnxruntime
import torch

# Session options: full graph optimization, single intra-op thread
option = onnxruntime.SessionOptions()
option.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
option.intra_op_num_threads = 1
providers = [
    "CUDAExecutionProvider" if torch.cuda.is_available() else "CPUExecutionProvider"
]
model = onnxruntime.InferenceSession(
    "model_sensevoice.onnx",
    sess_options=option, providers=providers)

batch_size = 4
feats_length = 256
speech = torch.randn(batch_size, feats_length, 560).cuda()
speech_lengths = torch.tensor([6, 30, 31, feats_length], dtype=torch.int32).cuda()
language = torch.tensor([0, 0, 0, 0], dtype=torch.int32).cuda()
textnorm = torch.tensor([15, 15, 15, 15], dtype=torch.int32).cuda()

# ORT takes numpy inputs, so move the tensors back to host memory
ort_inputs = {
    'speech': speech.cpu().numpy(),
    'speech_lengths': speech_lengths.cpu().numpy(),
    'language': language.cpu().numpy(),
    'textnorm': textnorm.cpu().numpy(),
}
output = model.run(None, ort_inputs)[0]
print("output:", output, output.shape)

trtexec conversion command:

trtexec \
    --onnx=model_sensevoice.onnx \
    --saveEngine=engine_fp16.plan \
    --minShapes=speech:1x128x560,speech_lengths:1,language:1,textnorm:1 \
    --optShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --maxShapes=speech:8x512x560,speech_lengths:8,language:8,textnorm:8 \
    --fp16 \
    --builderOptimizationLevel=3 \
    --memPoolSize=workspace:4096 \
    --verbose

TRT version: TensorRT-10.7.0.23, ONNX version: 1.17.0

So what is the specific reason? Thank you~

wjj19950828 avatar Dec 31 '24 10:12 wjj19950828

Try the following:

trtexec \
    --onnx=model_sensevoice.onnx \
    --saveEngine=engine_fp16.plan \
    --minShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --optShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --maxShapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --fp16 \
    --builderOptimizationLevel=3 \
    --memPoolSize=workspace:4096 \
    --verbose

lix19937 avatar Jan 01 '25 14:01 lix19937

(Quoting the fixed-shape trtexec command suggested above.)

@lix19937 Thanks for your reply~

Currently, since both the batch dimension and feats_length need to be dynamic, is it reasonable to pin min/opt/max to the same shape?

The ONNX model is exported with the following dynamic axes:

def export_dynamic_axes(self):
    return {
        "speech": {0: "batch_size", 1: "feats_length"},
        "speech_lengths": {0: "batch_size"},
        "language": {0: "batch_size"},
        "textnorm": {0: "batch_size"},
        "ctc_logits": {0: "batch_size", 1: "logits_length"},
        "encoder_out_lens":  {0: "batch_size"},
    }
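
For reference, a minimal sketch of how this dict is typically consumed by torch.onnx.export; the model object, dummy inputs, and opset here are illustrative assumptions, not the exact export code:

import torch

# Illustrative dummy inputs matching the shapes used elsewhere in this thread;
# `model` stands in for the actual SenseVoice nn.Module (an assumption).
speech = torch.randn(4, 256, 560)
speech_lengths = torch.tensor([6, 30, 31, 256], dtype=torch.int32)
language = torch.zeros(4, dtype=torch.int32)
textnorm = torch.full((4,), 15, dtype=torch.int32)

torch.onnx.export(
    model,
    (speech, speech_lengths, language, textnorm),
    "model_sensevoice.onnx",
    input_names=["speech", "speech_lengths", "language", "textnorm"],
    output_names=["ctc_logits", "encoder_out_lens"],
    dynamic_axes={
        "speech": {0: "batch_size", 1: "feats_length"},
        "speech_lengths": {0: "batch_size"},
        "language": {0: "batch_size"},
        "textnorm": {0: "batch_size"},
        "ctc_logits": {0: "batch_size", 1: "logits_length"},
        "encoder_out_lens": {0: "batch_size"},
    },
    opset_version=17,  # assumption
)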

So are there any other solutions?

Thanks~

wjj19950828 avatar Jan 02 '25 13:01 wjj19950828

@wjj19950828 do the values of the speech_lengths input need to be consistent with feats_length (e.g. bounded by it)? If yes, note that the trtexec command you provided fills speech_lengths with random values. Can you try providing the same inputs as ORT, using the --loadInputs flag in trtexec? The --loadInputs flag accepts a binary file for each input; you can save a numpy array to a binary file with np_array.tofile("arr.bin").
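
For reference, a minimal sketch of dumping the ORT inputs to raw binary files and replaying them through trtexec; the file names are illustrative, while --loadEngine, --shapes, and --loadInputs are standard trtexec flags:

import numpy as np

# Dump the same inputs used in the ORT script as raw binary files.
speech = np.random.randn(4, 256, 560).astype(np.float32)
speech_lengths = np.array([6, 30, 31, 256], dtype=np.int32)
language = np.zeros(4, dtype=np.int32)
textnorm = np.full(4, 15, dtype=np.int32)

for name, arr in [("speech", speech), ("speech_lengths", speech_lengths),
                  ("language", language), ("textnorm", textnorm)]:
    arr.tofile(f"{name}.bin")

Then replay them against the built engine:

trtexec --loadEngine=engine_fp16.plan \
    --shapes=speech:4x256x560,speech_lengths:4,language:4,textnorm:4 \
    --loadInputs=speech:speech.bin,speech_lengths:speech_lengths.bin,language:language.bin,textnorm:textnorm.bin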

asfiyab-nvidia avatar Jan 02 '25 23:01 asfiyab-nvidia

(Quoting asfiyab-nvidia's --loadInputs suggestion above.)

@asfiyab-nvidia Thanks for your reply!

After using --loadInputs, the benchmark inference runs without any problems (see the attached screenshot).

But when I run TRT with the following script, there is still an error. What is the cause? Thanks~ Script:

import tensorrt as trt
import torch
import numpy as np

TRT_LOGGER = trt.Logger(trt.Logger.ERROR)
trt.init_libnvinfer_plugins(TRT_LOGGER, '')

engine_filepath = 'engine_fp16.plan'
# inputs (loaded from the binary files used with trtexec --loadInputs)
batch_size = 4
feats_length = 256
# speech = torch.randn(batch_size, feats_length, 560, dtype=torch.float32).cuda()
# speech_lengths = torch.tensor([6, 30, 31, feats_length], dtype=torch.int32).cuda()
# language = torch.tensor([0, 0, 0, 0], dtype=torch.int32).cuda()
# textnorm = torch.tensor([15, 15, 15, 15], dtype=torch.int32).cuda()
speech = torch.Tensor(np.fromfile('speech.bin', dtype=np.float32).reshape(batch_size, feats_length, 560)).cuda()
speech_lengths = torch.Tensor(np.fromfile('speech_lengths.bin', dtype=np.int32).reshape(batch_size,)).cuda()
language = torch.Tensor(np.fromfile('language.bin', dtype=np.int32).reshape(batch_size,)).cuda()
textnorm = torch.Tensor(np.fromfile('textnorm.bin', dtype=np.int32).reshape(batch_size,)).cuda()
# pre-allocated output buffers
ctc_logits = torch.empty(batch_size, feats_length + 4, 25055, dtype=torch.float32).cuda()
encoder_out_lens = torch.empty(batch_size, dtype=torch.int32).cuda()

with open(engine_filepath, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    context.set_input_shape('speech', (batch_size, feats_length, 560))
    context.set_input_shape('speech_lengths', (batch_size,))
    context.set_input_shape('language', (batch_size,))
    context.set_input_shape('textnorm', (batch_size,))

    bindings = [speech.data_ptr(), speech_lengths.data_ptr(), language.data_ptr(),
                textnorm.data_ptr(), ctc_logits.data_ptr(), encoder_out_lens.data_ptr()]
    for i in range(len(bindings)):
        print("name:", engine.get_tensor_name(i))
        context.set_tensor_address(engine.get_tensor_name(i), bindings[i])
    handle = torch.cuda.current_stream().cuda_stream
    context.execute_async_v3(stream_handle=handle)

    print('all_binding_shapes_specified: ', context.all_binding_shapes_specified)

    print('ctc_logits shape: ', context.get_tensor_shape('ctc_logits'))
    print('encoder_out_lens', context.get_tensor_shape('encoder_out_lens'))
    print('ctc_logits: ', ctc_logits)
    print('encoder_out_lens: ', encoder_out_lens)

Error: (screenshot attached)
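
One way to narrow down this kind of runtime error is to print the dtype and I/O mode the engine expects for each tensor; a minimal sketch against the TensorRT 10 Python API:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.ERROR)
with open("engine_fp16.plan", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        # e.g. DataType.FLOAT vs DataType.INT32, TensorIOMode.INPUT vs OUTPUT
        print(name, engine.get_tensor_dtype(name), engine.get_tensor_mode(name))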

wjj19950828 avatar Jan 04 '25 05:01 wjj19950828

@wjj19950828

modify

speech_lengths = torch.Tensor(np.fromfile('speech_lengths.bin', dtype=np.int32).reshape(batch_size,)).cuda()
language = torch.Tensor(np.fromfile('language.bin', dtype=np.int32).reshape(batch_size,)).cuda()
textnorm = torch.Tensor(np.fromfile('textnorm.bin', dtype=np.int32).reshape(batch_size,)).cuda()

to

speech_lengths = torch.tensor(np.fromfile('speech_lengths.bin', dtype=np.int32)).reshape(batch_size,).cuda()
language = torch.tensor(np.fromfile('language.bin', dtype=np.int32)).reshape(batch_size,).cuda()
textnorm = torch.tensor(np.fromfile('textnorm.bin', dtype=np.int32)).reshape(batch_size,).cuda()
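
For context: torch.Tensor(...) always constructs a tensor with the default float dtype, silently casting the int32 arrays to float32, whereas the engine expects INT32 buffers for speech_lengths, language, and textnorm; torch.tensor(...) (or torch.from_numpy) preserves the numpy dtype. A quick check:

import numpy as np
import torch

a = np.array([6, 30, 31, 256], dtype=np.int32)
print(torch.Tensor(a).dtype)       # torch.float32 -- dtype silently changed
print(torch.tensor(a).dtype)       # torch.int32   -- dtype preserved
print(torch.from_numpy(a).dtype)   # torch.int32   -- zero-copy view, dtype preserved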

lix19937 avatar Jan 09 '25 11:01 lix19937

Did you manage to run inference successfully with TensorRT? @wjj19950828

feifeiwei avatar Jun 16 '25 03:06 feifeiwei