
paraformer onnx-gpu to tensorrt conversion fails (Could not find any implementation for node)

willnufe opened this issue 1 year ago • 3 comments

1. environment

  • OS (e.g., Linux): Linux
  • FunASR Version (e.g., 1.0.0): 1.1.3
  • ModelScope Version (e.g., 1.11.0): 1.11.0
  • GPU (e.g., V100M32): A100

1.1 pt to onnx (the predictor's CIF module uses cif_v1):

  • onnx-simplifier: 0.4.36
  • PyTorch Version (e.g., 2.0.0): 2.0.1
  • How you installed funasr (pip, source): pip
  • Python version: 3.9.18
  • CUDA/cuDNN version (e.g., cuda11.7): 11.7

1.2 onnx to tensorrt:

  • tensorrt(trtexec): 8.6.1.6
  • CUDA/cuDNN version (e.g., cuda11.7): 11.3

2. problem

Converting the paraformer onnx-gpu model with the command below fails:

trtexec \
    --onnx=/raid/t3cv/wangch/WORK_SAPCE/ASR/models/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model_sim.onnx \
    --saveEngine=/raid/t3cv/wangch/WORK_SAPCE/TEMP/work_space/onnx2tensorrt/models/model.engine \
    --minShapes=speech:1x1000x560,speech_lengths:1 \
    --optShapes=speech:16x1000x560,speech_lengths:16 \
    --maxShapes=speech:16x1000x560,speech_lengths:16 \
    --workspace=24576 \
    --verbose --fp16 --device=7

The main error is:

Error[10]: Could not find any implementation for node 
{ForeignNode[(Unnamed Layer* 6555) [Constant] + (Unnamed Layer* 6556) [Shuffle].../decoder/decoders/decoders.0/self_attn/Transpose + (Unnamed Layer* 7213) [Shuffle]]}.


willnufe avatar Jul 24 '24 07:07 willnufe

@willnufe Some modifications to the code are needed to support this. I don't have time recently, but if you are willing to do it, I can give you some suggestions offline.

yuekaizhang avatar Jul 29 '24 02:07 yuekaizhang

> @willnufe Some modifications to the code are needed to support this. I don't have time recently, but if you are willing to do it, I can give you some suggestions offline.

Thank you very much. I would like to make an attempt. Please give me some suggestions.

willnufe avatar Jul 29 '24 03:07 willnufe

@willnufe To get the maximum throughput, I think we first need to make the ONNX fp16 Paraformer work.

https://github.com/modelscope/FunASR/commit/9a9b474e7de7cc90d2ee124dc8d6c2cfa887c059. This commit used several registered hooks to rescale the TorchScript fp32 model to a TorchScript fp16 model. The first step is to follow it to calibrate the ONNX fp32 model.
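The hook-based calibration idea can be sketched in plain Python. This is a toy stand-in, not FunASR's actual code: the `Layer` class, the `record` hook, and the 0.5 safety margin are all illustrative assumptions; the real commit registers forward hooks on PyTorch modules.

```python
# Toy sketch of hook-based fp16 rescaling calibration (illustrative only;
# the referenced commit uses PyTorch register_forward_hook on FunASR modules).
FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value

class Layer:
    """Minimal stand-in for a torch.nn.Module that supports forward hooks."""
    def __init__(self, name, gain):
        self.name, self.gain, self.hooks = name, gain, []

    def register_forward_hook(self, fn):
        self.hooks.append(fn)

    def __call__(self, x):
        out = x * self.gain          # pretend computation
        for fn in self.hooks:
            fn(self, x, out)         # (module, input, output), as in PyTorch
        return out

def calibrate(layers, samples, margin=0.5):
    """Run calibration data through the layers, record per-layer activation
    peaks via hooks, and return rescale factors for layers that would
    overflow the fp16 safe range."""
    observed = {}

    def record(module, inp, out):
        observed[module.name] = max(observed.get(module.name, 0.0), abs(out))

    for layer in layers:
        layer.register_forward_hook(record)
    for x in samples:
        for layer in layers:
            x = layer(x)
    return {name: FP16_MAX * margin / peak
            for name, peak in observed.items() if peak > FP16_MAX * margin}

layers = [Layer("encoder", 10.0), Layer("decoder", 50.0)]
scales = calibrate(layers, samples=[100.0, 200.0])
print(scales)  # only layers whose peaks exceed the fp16 safe range appear
```

The key design point the commit exploits is that hooks observe activations without touching the model code itself, so the calibration pass can be run on an unmodified fp32 model.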

With ONNX fp16, you could expect about a 50% throughput improvement compared with the ONNX fp32 pipeline. Then let's work on the TensorRT export.
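As a quick illustration of why the rescaling matters: IEEE 754 half precision tops out at 65504, so an unscaled fp32 activation peak can overflow fp16. A standalone demo using Python's `struct` half-precision format code (the activation value and the 2x safety margin are made-up numbers):

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value

def fits_fp16(x):
    """Return True if x can be packed into IEEE 754 half precision."""
    try:
        struct.pack('e', x)   # 'e' is the half-precision format code
        return True
    except (OverflowError, struct.error):
        return False

activation = 3.0e5                      # made-up fp32 activation peak
scale = FP16_MAX * 0.5 / activation     # rescale with a 2x safety margin
print(fits_fp16(activation))            # False: overflows fp16
print(fits_fp16(activation * scale))    # True: fits after rescaling
```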

Would you mind adding my wechat ykzhang2020?

yuekaizhang avatar Jul 29 '24 03:07 yuekaizhang