nameli0722
I did the quantization through tiny-tensorrt.
I don't understand what you mean. I used INT8 quantization with a calibration set, and the inference result is correct, but the GPU memory usage is larger than...
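(For readers following along: the "calibration set" here is a small batch of representative inputs that TensorRT runs through the network to choose INT8 scales. Below is a minimal sketch of such a calibrator against the stock TensorRT 8 C++ API, not tiny-tensorrt's own wrapper; the zero-filled batch loader, the batch count, and the `calib.cache` path are hypothetical placeholders.)

```cpp
// Minimal INT8 calibrator sketch (assumes TensorRT 8's C++ API).
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

class EntropyCalibrator : public nvinfer1::IInt8EntropyCalibrator2 {
public:
    EntropyCalibrator(int batchSize, size_t inputVolume)
        : batchSize_(batchSize), inputVolume_(inputVolume) {
        cudaMalloc(&deviceInput_, batchSize_ * inputVolume_ * sizeof(float));
    }
    ~EntropyCalibrator() override { cudaFree(deviceInput_); }

    int getBatchSize() const noexcept override { return batchSize_; }

    // Called repeatedly during the calibration pass: copy one preprocessed
    // batch to the GPU and point the input binding at it.
    bool getBatch(void* bindings[], const char* [], int) noexcept override {
        std::vector<float> host;
        if (!loadNextBatch(host)) return false;  // returning false ends calibration
        cudaMemcpy(deviceInput_, host.data(), host.size() * sizeof(float),
                   cudaMemcpyHostToDevice);
        bindings[0] = deviceInput_;
        return true;
    }

    // Persist the computed scales so later builds can skip the calibration pass.
    const void* readCalibrationCache(size_t& length) noexcept override {
        std::ifstream in("calib.cache", std::ios::binary);  // hypothetical path
        cache_.assign(std::istreambuf_iterator<char>(in),
                      std::istreambuf_iterator<char>());
        length = cache_.size();
        return cache_.empty() ? nullptr : cache_.data();
    }
    void writeCalibrationCache(const void* data, size_t length) noexcept override {
        std::ofstream("calib.cache", std::ios::binary)
            .write(static_cast<const char*>(data), length);
    }

private:
    // Hypothetical loader: a real one would read preprocessed inputs from the
    // calibration set; this stub just serves a few zero-filled batches.
    bool loadNextBatch(std::vector<float>& host) {
        if (batchesServed_++ >= 8) return false;
        host.assign(batchSize_ * inputVolume_, 0.f);
        return true;
    }
    int batchSize_;
    size_t inputVolume_;
    void* deviceInput_{nullptr};
    std::string cache_;
    int batchesServed_{0};
};
```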
@zerollzeng Thank you very much!
> Hello, could you please provide the GPU usage and inference speed with INT8 and FP16? Thank you!

Original PyTorch (.pt) model: GPU usage 5099 MB, inference time 1.7 s; ...
> How about building the engine first and then loading it? I think that can save some memory. Anyway, I'll try to improve this.

`./tinyexec --onnx...`
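For reference, "build first, then load" with the stock TensorRT C++ API looks roughly like the sketch below (assuming TensorRT 8; `model.onnx` and `model.engine` are placeholder paths, and this is not tiny-tensorrt's own wrapper). The expensive builder only runs in the offline step, so the inference process never pays its memory cost:

```cpp
// Minimal sketch of "build once offline, deserialize at runtime"
// (assumes TensorRT 8's C++ API; paths are placeholders).
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <memory>
#include <vector>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::cerr << msg << "\n";
    }
} gLogger;

// Offline step: parse the ONNX model, build, and serialize the engine to disk.
void buildAndSave(const char* onnxPath, const char* enginePath) {
    auto builder = std::unique_ptr<nvinfer1::IBuilder>(
        nvinfer1::createInferBuilder(gLogger));
    auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(
        builder->createNetworkV2(1U << static_cast<uint32_t>(
            nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)));
    auto parser = std::unique_ptr<nvonnxparser::IParser>(
        nvonnxparser::createParser(*network, gLogger));
    parser->parseFromFile(onnxPath,
        static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(
        builder->createBuilderConfig());
    // An INT8 build would also set BuilderFlag::kINT8 and attach a calibrator.

    auto blob = std::unique_ptr<nvinfer1::IHostMemory>(
        builder->buildSerializedNetwork(*network, *config));
    std::ofstream out(enginePath, std::ios::binary);
    out.write(static_cast<const char*>(blob->data()), blob->size());
}

int main() {
    buildAndSave("model.onnx", "model.engine");  // run once, or in a separate process

    // Runtime step: only deserialize. The builder and its workspace never
    // exist in this phase, which is where the memory saving comes from.
    std::ifstream in("model.engine", std::ios::binary);
    std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    auto runtime = std::unique_ptr<nvinfer1::IRuntime>(
        nvinfer1::createInferRuntime(gLogger));
    auto engine = std::unique_ptr<nvinfer1::ICudaEngine>(
        runtime->deserializeCudaEngine(bytes.data(), bytes.size()));
    auto context = std::unique_ptr<nvinfer1::IExecutionContext>(
        engine->createExecutionContext());
    // ... enqueue inference with `context` ...
}
```

In practice the two phases usually live in separate processes: a one-off tool writes `model.engine`, and the serving process only ever runs the deserialize path.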