Unstable inference time
Hi!
I ran YOLO11 model inference 1000 times on a Tesla T4, but the time cost was very unstable. From the cached records, most of the inference times were even and looked normal, but the normal ones were always interleaved with a few abnormal ones. For example, of the 1000 inference records, most cost about 2 ms per image, but a few cost 70 ms per image.
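Not from the original post, but a minimal sketch of how the per-iteration timings can be collected and the spikes flagged. Plain Python with a dummy workload standing in for the real inference call (`run_inference` is a placeholder, not the actual YOLO11/TensorRT code):

```python
import statistics
import time


def run_inference():
    # Placeholder for the real inference call (e.g. enqueueV3 + stream sync).
    time.sleep(0.002)


def collect_latencies(n_iters=100, warmup=10):
    """Time each iteration in milliseconds, discarding warm-up runs."""
    for _ in range(warmup):
        run_inference()
    latencies_ms = []
    for _ in range(n_iters):
        t0 = time.perf_counter()
        run_inference()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    return latencies_ms


def find_outliers(latencies_ms, factor=5.0):
    """Flag iterations far above the median, e.g. 70 ms spikes against a 2 ms median."""
    median = statistics.median(latencies_ms)
    return [(i, t) for i, t in enumerate(latencies_ms) if t > factor * median]


if __name__ == "__main__":
    lats = collect_latencies()
    print(f"median: {statistics.median(lats):.2f} ms, outliers: {find_outliers(lats)}")
```

Logging the outlier indices as well as the values can show whether the spikes are periodic (which would point at a recurring background event) or random.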
I tried locking the card to a fixed clock frequency, but it didn't seem to help.
So can you help me with that? Thanks!
Nobody here?
Are you using the same inputs or different inputs? Also, you can use trtexec to profile it.
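For reference, a trtexec run along these lines reports per-iteration and per-layer timings so the spikes can be compared against the custom inference loop (`model.engine` is a placeholder for the serialized engine file):

```shell
# Time many iterations after a warm-up period, then dump per-layer profiling.
trtexec --loadEngine=model.engine \
        --iterations=1000 \
        --warmUp=500 \
        --dumpProfile \
        --separateProfileRun
```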
The same inputs. The behaviour of my inference code differs from the trtexec profile, and the difference lies only in the few abnormally long inference times; the rest are the same.
Set a warm-up phase, exclude the data-copy time from the latency measurement, and lock the GPU frequency.
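For anyone trying the clock-locking step, this is one common way to do it with nvidia-smi (the 1590 MHz value is an assumption based on the T4's maximum graphics clock; check SUPPORTED_CLOCKS on your own card):

```shell
# Enable persistence mode so clock settings survive between processes.
sudo nvidia-smi -pm 1
# List the clock rates the card actually supports.
nvidia-smi -q -d SUPPORTED_CLOCKS
# Lock the GPU core clock to a fixed min,max rate (adjust per the list above).
sudo nvidia-smi -lgc 1590,1590
# Verify the current clocks during the run.
nvidia-smi -q -d CLOCK
```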
The warm-up phase is already integrated into the process, the GPU frequency is locked at its maximum level, and data-copy latency is excluded from the measurement. Only the time of enqueueV3 is recorded, yet these times are still unstable.
What OS are you using and what's the vram usage during inference? Is it running too close to the vram limit?
Thanks for your reply.
No, nowhere near the limit: even testing a single simple model shows the same unstable inference time. My OS is Ubuntu 22.04.
Does the behavior still exist in the latest version of TRT? If so, please feel free to share the model and your inference statistics. Also, does it happen only on the Tesla T4?