✨[Feature] Does Torch-TensorRT plan to support runtime subgraph optimization like TFTRT?
Currently, Torch-TensorRT cannot optimize two-stage models such as Faster R-CNN, which first generate detection boxes and then predict a label for each box. The number of detection boxes is only known at runtime, but Torch-TensorRT needs shape information ahead of time (AOT) to optimize the classification subgraph.
Torch-TensorRT's current fallback mode cannot solve this either, because it also relies on AOT optimization.
So, does Torch-TensorRT plan to support runtime subgraph optimization like TFTRT? That is:
- Extract Torch-TensorRT-supported subgraphs at the AOT stage;
- Create TensorRT engine plans at runtime, once the runtime shape information is available;
- To improve efficiency, cache engines keyed by shape information;
- Support fallback execution as well.
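The shape-keyed engine cache from the list above could be sketched as follows. This is a hypothetical illustration, not Torch-TensorRT code: `ShapeKeyedEngineCache` and `fake_compile` are made-up names, and `fake_compile` stands in for building a TensorRT engine from an AOT-extracted subgraph once a concrete shape is seen.

```python
# Hypothetical sketch: cache compiled engines keyed by runtime input shape.
from typing import Callable, Dict, Tuple


class ShapeKeyedEngineCache:
    """Map concrete input shapes to compiled engines, building lazily."""

    def __init__(self, compile_fn: Callable[[Tuple[int, ...]], Callable]):
        self._compile_fn = compile_fn
        self._engines: Dict[Tuple[int, ...], Callable] = {}

    def get(self, shape: Tuple[int, ...]) -> Callable:
        # Compile once per unseen shape, then reuse the cached engine.
        if shape not in self._engines:
            self._engines[shape] = self._compile_fn(shape)
        return self._engines[shape]


# Stub "compiler": returns a function that reports the shape it was built for.
def fake_compile(shape):
    return lambda: f"engine for {shape}"


cache = ShapeKeyedEngineCache(fake_compile)
e1 = cache.get((8, 3, 224, 224))
e2 = cache.get((8, 3, 224, 224))  # cache hit: same engine object is reused
assert e1 is e2
```

In a real implementation the key would likely cover all dynamic input shapes (and possibly dtypes) of the subgraph, not a single tuple.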
Additional info:
- TorchScript provides custom FusionGroup utilities in torch/csrc/jit/passes/graph_fuser.cpp, so subgraph extraction may not be too hard;
- Fallback execution could be implemented by referring to prim::FusionGroup.
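The fallback idea above amounts to: run the optimized engine when one exists for the inputs, and otherwise execute the original graph (analogous to falling back to the TorchScript interpreter). A minimal sketch, with `run_with_fallback` as a hypothetical helper name:

```python
# Hypothetical sketch of fallback execution: prefer the compiled engine,
# fall back to the original callable if no engine exists or it fails.
def run_with_fallback(engine, original_fn, *args):
    if engine is not None:
        try:
            return engine(*args)
        except RuntimeError:
            pass  # e.g. shape unsupported by the engine at runtime
    return original_fn(*args)


# No engine available: the original function runs.
assert run_with_fallback(None, lambda x: x + 1, 41) == 42
```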
@borisfom Is this related to that multi-engine usecase you were talking about?
This issue has not seen activity for 90 days, Remove stale label or comment or this will be closed in 10 days
This is a constraint around data-dependent shapes (DDS), currently slated for v1.4 at the end of the year.
DDS without fallback is supported in Torch-TRT v1.3. Please try v1.3 if your model is supported end to end. If not, DDS with fallback is tentatively planned for v1.4 in Q1 '23.