nameli0722
I did the quantization through tiny-tensorrt.
I don't understand what you mean. I used INT8 quantization with a calibration set, and the inference result is correct, but the GPU memory usage is larger than...
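(For readers following along: the "calibration set" here is a small batch of representative inputs that TensorRT runs through the network to choose INT8 scales. Below is a minimal sketch of such a calibrator against the stock TensorRT 8 C++ API, not tiny-tensorrt's own wrapper; the zero-filled batch loader, the batch count, and the `calib.cache` path are hypothetical placeholders.)

```cpp
// Minimal INT8 calibrator sketch (assumes TensorRT 8's C++ API).
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

class EntropyCalibrator : public nvinfer1::IInt8EntropyCalibrator2 {
public:
    EntropyCalibrator(int batchSize, size_t inputVolume)
        : batchSize_(batchSize), inputVolume_(inputVolume) {
        cudaMalloc(&deviceInput_, batchSize_ * inputVolume_ * sizeof(float));
    }
    ~EntropyCalibrator() override { cudaFree(deviceInput_); }

    int getBatchSize() const noexcept override { return batchSize_; }

    // Called repeatedly during the calibration pass: copy one preprocessed
    // batch to the GPU and point the input binding at it.
    bool getBatch(void* bindings[], const char* [], int) noexcept override {
        std::vector<float> host;
        if (!loadNextBatch(host)) return false;  // returning false ends calibration
        cudaMemcpy(deviceInput_, host.data(), host.size() * sizeof(float),
                   cudaMemcpyHostToDevice);
        bindings[0] = deviceInput_;
        return true;
    }

    // Persist the computed scales so later builds can skip the calibration pass.
    const void* readCalibrationCache(size_t& length) noexcept override {
        std::ifstream in("calib.cache", std::ios::binary);  // hypothetical path
        cache_.assign(std::istreambuf_iterator<char>(in),
                      std::istreambuf_iterator<char>());
        length = cache_.size();
        return cache_.empty() ? nullptr : cache_.data();
    }
    void writeCalibrationCache(const void* data, size_t length) noexcept override {
        std::ofstream("calib.cache", std::ios::binary)
            .write(static_cast<const char*>(data), length);
    }

private:
    // Hypothetical loader: a real one would read preprocessed inputs from the
    // calibration set; this stub just serves a few zero-filled batches.
    bool loadNextBatch(std::vector<float>& host) {
        if (batchesServed_++ >= 8) return false;
        host.assign(batchSize_ * inputVolume_, 0.f);
        return true;
    }
    int batchSize_;
    size_t inputVolume_;
    void* deviceInput_{nullptr};
    std::string cache_;
    int batchesServed_{0};
};
```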
@zerollzeng Thank you very much!
> Hello, could you please provide the GPU usage and inference speed with INT8 and FP16? Thank you!

Original PyTorch (.pt) model: GPU usage 5099 MB, inference time 1.7 s; ...
> How about building the engine first and then loading it? I think that can save some memory. Anyway, I'll try to improve this.

`./tinyexec --onnx...`
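For reference, "build first, then load" with the stock TensorRT C++ API looks roughly like the sketch below (assuming TensorRT 8; `model.onnx` and `model.engine` are placeholder paths, and this is not tiny-tensorrt's own wrapper). The expensive builder only runs in the offline step, so the inference process never pays its memory cost:

```cpp
// Minimal sketch of "build once offline, deserialize at runtime"
// (assumes TensorRT 8's C++ API; paths are placeholders).
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <memory>
#include <vector>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << "\n";
    }
} gLogger;

// Offline step: parse the ONNX model, build, and serialize the engine to disk.
void buildAndSave(const char* onnxPath, const char* enginePath) {
    auto builder = std::unique_ptr<nvinfer1::IBuilder>(
        nvinfer1::createInferBuilder(gLogger));
    auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(
        builder->createNetworkV2(1U << static_cast<uint32_t>(
            nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)));
    auto parser = std::unique_ptr<nvonnxparser::IParser>(
        nvonnxparser::createParser(*network, gLogger));
    parser->parseFromFile(onnxPath,
        static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(
        builder->createBuilderConfig());
    // An INT8 build would also set BuilderFlag::kINT8 and attach a calibrator.

    auto blob = std::unique_ptr<nvinfer1::IHostMemory>(
        builder->buildSerializedNetwork(*network, *config));
    std::ofstream out(enginePath, std::ios::binary);
    out.write(static_cast<const char*>(blob->data()), blob->size());
}

int main() {
    buildAndSave("model.onnx", "model.engine");  // run once, or in a separate process

    // Runtime step: only deserialize. The builder and its workspace never
    // exist in this phase, which is where the memory saving comes from.
    std::ifstream in("model.engine", std::ios::binary);
    std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    auto runtime = std::unique_ptr<nvinfer1::IRuntime>(
        nvinfer1::createInferRuntime(gLogger));
    auto engine = std::unique_ptr<nvinfer1::ICudaEngine>(
        runtime->deserializeCudaEngine(bytes.data(), bytes.size()));
    auto context = std::unique_ptr<nvinfer1::IExecutionContext>(
        engine->createExecutionContext());
    // ... enqueue inference with `context` ...
}
```

In practice the two phases usually live in separate processes: a one-off tool writes `model.engine`, and the serving process only ever runs the deserialize path.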