tensorrt-cpp-api
FP16 and INT8 inference speed
Hi, thanks for your excellent work, it really helps me in my job. I have a question, though. I modified your code to run inference on my model in both FP16 and INT8 precision, and both modes produce correct results. However, the FP16 and INT8 modes run at almost the same speed. Why doesn't INT8 mode infer faster than FP16 mode? Any suggestions would be much appreciated, thanks.
PS: I used the realesrgan-x4v3 model and converted it to ONNX format.
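
One way to investigate the identical timings is to check whether the INT8 engine is actually running INT8 kernels, since TensorRT may silently fall back to FP16/FP32 tactics per layer when those are faster or when INT8 implementations are unavailable. Below is a minimal sketch, assuming TensorRT 8.2+ and the standard `nvinfer1` API; the function names `configurePrecision` and `printLayerInfo` are illustrative and not part of this repo's code.

```cpp
#include <NvInfer.h>
#include <iostream>
#include <memory>

// At build time: allow both FP16 and INT8 kernels and request detailed
// profiling so the engine inspector can later report the precision each
// layer actually uses. INT8 additionally needs a calibrator or explicit
// Q/DQ nodes in the ONNX graph.
void configurePrecision(nvinfer1::IBuilderConfig& config) {
    config.setFlag(nvinfer1::BuilderFlag::kFP16);
    config.setFlag(nvinfer1::BuilderFlag::kINT8);
    config.setProfilingVerbosity(nvinfer1::ProfilingVerbosity::kDETAILED);
}

// After building or deserializing the engine: dump per-layer information.
// If most layers still show FP16/FP32 precision here, TensorRT decided the
// INT8 tactics were not faster (or not available), which would explain the
// near-identical FP16 and INT8 timings.
void printLayerInfo(nvinfer1::ICudaEngine& engine) {
    // Objects from TensorRT factory methods can be released with delete in
    // TensorRT 8+, so a plain unique_ptr is sufficient here.
    std::unique_ptr<nvinfer1::IEngineInspector> inspector(engine.createEngineInspector());
    std::cout << inspector->getEngineInformation(nvinfer1::LayerInformationFormat::kJSON)
              << std::endl;
}
```

If the per-layer report confirms the layers are genuinely INT8, the remaining gap is likely bandwidth-bound layers (common in super-resolution models like realesrgan-x4v3, where large activations dominate), in which case lower-precision math alone gives little speedup.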