Torch-TensorRT Integration
🚀 The feature
pytorch-tensorrt hit release 1.0 (it is actually at 1.1 right now), but most of the available models are not convertible to it out of the box.
Motivation, pitch
See the feature description above.
Alternatives
An alternative would be to convert to ONNX and then convert to TensorRT, which is exactly what torch-trt tries to avoid. This would also require more work, because somebody would have to make sure that the models are both ONNX-compatible and torch-trt-compatible, with the latter being a TorchScript-supporting runtime.
Additional context
AFAIK, from some quick tests like:
```python
import torch
import torch_tensorrt as torch_trt
from torchvision.models.detection import fcos_resnet50_fpn

# eval() is needed: detection models expect targets in training mode
module = fcos_resnet50_fpn().eval()
# GeneralizedRCNNTransform resizes inputs; max_size bounds the largest side
generalized_rcnn_transform_max_size = module.transform.max_size

inputs = torch_trt.Input(
    min_shape=[1, 3, 224, 224],
    opt_shape=[1, 3, 1080, 1920],
    max_shape=[
        1,
        3,
        generalized_rcnn_transform_max_size,
        generalized_rcnn_transform_max_size,
    ],
)
precisions = {torch.half}  # doesn't really matter
trt_module = torch_trt.compile(
    module=module, inputs=[inputs], enabled_precisions=precisions
)
```
And looking into some blocking issues from TensorRT: ...the first thing to do would be to remove self/state mutations (`self.a = b`) in the forward method, and... almost everything will work out of the box?
For reference, from a quick attempt to port the FCOS model, I found 2 places where this happens. I am not sure if removing them is strictly required.
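For context, the problematic pattern looks roughly like this (a minimal illustrative sketch, loosely modeled on the warn-once flag; `MutatingModule` and `PureModule` are made-up names, not torchvision classes):

```python
from torch import nn, Tensor

class MutatingModule(nn.Module):
    """Writes to self inside forward() -- the pattern that blocks conversion."""

    def __init__(self) -> None:
        super().__init__()
        self._has_warned = False

    def forward(self, x: Tensor) -> Tensor:
        if not self._has_warned:
            # Side effect: module state changes during inference
            self._has_warned = True
        return x * 2

class PureModule(nn.Module):
    """Side-effect-free variant: forward() is a pure function of its inputs."""

    def forward(self, x: Tensor) -> Tensor:
        return x * 2
```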
@ntakouris Thanks a lot for the feedback.
I would be happy if we can make the models TensorRT-compatible, as long as it doesn't involve adding extra hacks and workarounds. An example of such a workaround is at https://github.com/pytorch/vision/pull/6120
TorchVision's codebase often has to apply numerous workarounds (most of them nasty) to ensure our models are JIT-scriptable and FX-traceable. Add ONNX to the mix (for a limited set of models) and you get a code-base that is very hard to maintain. Often these workarounds conflict with one another, requiring even more hacks to make things work. So as long as we don't introduce more of them, I'm happy to investigate making the models compatible with you. Note though that I have extremely limited knowledge of TensorRT, so if we were to make things compatible, the effort needs to be driven by someone who has good knowledge of it. In addition, if we are to go down that path, it should be understood that the chance of not merging the PR is high due to the aforementioned caveats. So if a contributor is comfortable with this, we could plan to provide some support on the reviewing side.
Concerning the points on the mutations, I think they are valid. To the best of my knowledge, there was an alternative API we could use to warn once, but it didn't work properly, so we had to do this workaround. If there is a better solution, we can consider adopting it. Concerning the anchor utils, the situation is more complex, because unfortunately the method set_cell_anchors() is public, so removing it would be a BC-breaking change. Thankfully the whole Detection area is Beta, which gives us some flexibility. So if we had a PR that rewrites the anchor utils to avoid side-effects (which I agree are a bad practice), then we can discuss finding ways to merge the changes. Perhaps we should take the step to deprecate it now so that we can change it in the next release. cc @NicolasHug to give his input here.
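As a rough sketch of what a side-effect-free rewrite could look like (hypothetical names and simplified anchor math, not torchvision's actual AnchorGenerator): compute the cell anchors as a pure function and return them, instead of writing them to self via set_cell_anchors().

```python
from typing import List

import torch
from torch import nn, Tensor

class SideEffectFreeAnchorGenerator(nn.Module):
    def __init__(self, sizes: List[int], aspect_ratios: List[float]) -> None:
        super().__init__()
        self.sizes = sizes
        self.aspect_ratios = aspect_ratios

    def cell_anchors(self, dtype: torch.dtype, device: torch.device) -> Tensor:
        # Pure computation: nothing is written back to self, so tracing
        # and TorchScript-based converters see a stateless graph.
        scales = torch.as_tensor(self.sizes, dtype=dtype, device=device)
        ratios = torch.as_tensor(self.aspect_ratios, dtype=dtype, device=device)
        hs = (scales[None, :] * torch.sqrt(ratios)[:, None]).reshape(-1)
        ws = (scales[None, :] / torch.sqrt(ratios)[:, None]).reshape(-1)
        # (x1, y1, x2, y2) anchors centered at the origin
        return torch.stack([-ws, -hs, ws, hs], dim=1) / 2
```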
As far as I understand, the next few major releases of Torch-TRT plan on catching up with the latest PyTorch features. They keep adding ops and broader support for basically everything TorchScript can express. In practice, it should be easier to start with converting models without "specific object detection operations" and "complex inference logic", like DETR.
In the near future, this should be seamless, because most (if not all) torchvision models already work with TorchScript out of the box. There might be a need, however, to port some specific parts to TRT (similarly to ONNX), for example using a specific NMS implementation that produces the same results but runs faster on CUDA (efficientdet-onnx did that).
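If such a swap were made, a parity check against the reference op would be the natural safeguard (a minimal sketch; `candidate_nms` is a placeholder for the hypothetical backend-optimized replacement):

```python
import torch
from torchvision.ops import nms

def check_nms_parity(candidate_nms, num_boxes: int = 200, iou: float = 0.5) -> bool:
    # Random but valid boxes: (x1, y1, x2, y2) with x2 > x1 and y2 > y1
    boxes = torch.rand(num_boxes, 4) * 100
    boxes[:, 2:] += boxes[:, :2]
    scores = torch.rand(num_boxes)
    kept_ref = nms(boxes, scores, iou)
    kept_candidate = candidate_nms(boxes, scores, iou)
    return torch.equal(kept_ref, kept_candidate)
```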
@NicolasHug any input on this?
Not much to add to @datumbox's answer, with which I fully agree. My main concern is maintenance cost: supporting the cross-product of TorchScript, FX and ONNX already induces a very high maintenance cost for us, and adding TensorRT support would complicate this even further. This is something we might need to discuss a bit more, for us to be certain that the benefits are worth the additional cost.
There should also be an fx2trt package coming soon. This could make it possible to directly export a big part of the torchvision models straight away. It's also supposed to support int8 quantization for TRT when it gets open-sourced. I think the wise thing to do would be to wait a bit until the aforementioned technologies support a bigger subset of TorchScript.
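Since fx2trt would consume an FX graph rather than TorchScript, symbolic traceability is the gating step. A quick way to check it (a minimal sketch; classification models like resnet18 generally trace, while detection models often fail on data-dependent control flow):

```python
import torch.fx as fx
from torchvision.models import resnet18

model = resnet18().eval()
# If tracing succeeds, we get the op-level graph that an fx2trt-style
# backend could lower to TensorRT.
graph_module = fx.symbolic_trace(model)
print(graph_module.graph)
```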
I keep track of those changes and updates in my spare time, and will keep this thread updated if any breakthrough happens.
- Tuples, Dicts and Lists are now supported.
- There are still the 2 issues mentioned in the original post, but those are easy to solve, or can be annotated semi-automatically to be ignored.
- There's also the issue I mentioned above.
Getting there!