
setMaxWorkspaceSize failure in TensorRT 8.5 when converting ONNX to TensorRT on an RTX 4060 GPU

brilliant-soilder opened this issue 1 year ago • 6 comments

The GPU memory size passed to setMaxWorkspaceSize affects the conversion from ONNX to TensorRT: with 4096 MB the build sometimes succeeds and sometimes fails, while with 1024 MB it always reports an error. What is the reason for this?

The motherboard is a B760 with a 13th-generation Intel CPU, paired with an RTX 4060 graphics card.
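For reference, a minimal sketch of the build step in question, assuming the engine is built through the TensorRT 8.5 Python API (the file name "model.onnx" is a placeholder, and the 4096 MiB value mirrors the numbers above):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
# In 8.5 the deprecated max_workspace_size setter is superseded by the
# memory-pool limit; the workspace caps scratch memory available to tactics.
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4096 << 20)  # 4096 MiB

engine_bytes = builder.build_serialized_network(network, config)
if engine_bytes is None:
    raise RuntimeError("engine build failed; see the verbose log for details")
```

A too-small workspace can leave the builder with no viable tactic for some layer, which would be consistent with 1024 MB failing every time while 4096 MB fails only intermittently, e.g. when other allocations compete for GPU memory.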

brilliant-soilder avatar May 22 '24 11:05 brilliant-soilder

Hi, can you please share with us verbose logs of the failing cases with trtexec? trtexec --onnx=<your-onnx-file> --verbose

Even better if you can share the failing onnx files and parameters you're passing to build the engine.
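For the workspace experiments specifically, the same limit can be set on the trtexec command line, e.g. trtexec --onnx=<your-onnx-file> --verbose --memPoolSize=workspace:4096 (value in MiB; 8.x also still accepts the deprecated --workspace=4096), and redirecting the output to a file makes it easy to attach here.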

brb-nv avatar May 22 '24 21:05 brb-nv


And before that, the path seemed to affect the conversion process: on the 4060 card under Linux, a model converted from ONNX to TensorRT under "/usr/local/" produces the error shown in the image below when it is used for inference on the same computer. Why is the compute capability 8.9 during conversion but 7.5 during inference when it is the same device? A model converted under "/home/work/" does not report the error. Is this due to path permissions or something else?

Uploading 1.png…

brilliant-soilder avatar May 24 '24 08:05 brilliant-soilder

Sorry, I'm a bit confused about what exactly the problem is.

  • If it's the "malloc_consolidate(): invalid chunk size" error, it seems to come from a heap memory-management issue [reference], which is most likely an issue in the user's own code.

  • The second issue happens when the engine profile (stored in your plan file) doesn't match your current device's profile. Please refer to the hardware compatibility section and the compatibility checks section. Specifically,

TensorRT records in a plan the major, minor, patch and build versions of the library used to create the plan. If these do not match the version of the runtime used to deserialize the plan, it will fail to deserialize.
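To rule out a device mismatch, one way to check which GPU each process actually sees is to enumerate the CUDA devices and their compute capabilities: an RTX 4060 reports SM 8.9, while SM 7.5 is a Turing-class part, so a mismatch usually means the engine was built and deserialized on different GPUs, or with a different CUDA_VISIBLE_DEVICES setting. A sketch, assuming pycuda is installed (torch.cuda.get_device_capability works similarly):

```python
import pycuda.driver as cuda

cuda.init()
for i in range(cuda.Device.count()):
    dev = cuda.Device(i)
    # COMPUTE_CAPABILITY_MAJOR/MINOR give the SM version that TensorRT
    # compares against the one recorded in the plan file at build time.
    major = dev.get_attribute(cuda.device_attribute.COMPUTE_CAPABILITY_MAJOR)
    minor = dev.get_attribute(cuda.device_attribute.COMPUTE_CAPABILITY_MINOR)
    print(f"device {i}: {dev.name()} (SM {major}.{minor})")
```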

brb-nv avatar May 24 '24 18:05 brb-nv