
❓ [Question] torch_tensorrt.dynamo.compile hangs indefinitely mid-compilation?

Open · Antonyesk601 opened this issue · 8 comments

❓ Question

torch_tensorrt.dynamo.compile hangs indefinitely mid-compilation. CPU usage is through the roof, and running with debug=True shows the step at which it stalls.

What you have already tried

I tried compiling with TorchScript and it works well enough, but I wanted to test the dynamo backend.

Environment

Python 3.9.2, torch 2.2+cu118, torch_tensorrt 2.2+cu118, tensorrt 8.6

Build information about Torch-TensorRT can be found by turning on debug messages

  • PyTorch Version (e.g., 1.0): 2.2
  • CPU Architecture: x86_64
  • OS (e.g., Linux): debian 11
  • How you installed PyTorch (conda, pip, libtorch, source): pip install torch torchvision torch_tensorrt --index-url https://download.pytorch.org/whl/cu118
  • Build command you used (if compiling from source):
import torch
import torch_tensorrt
import time
from gfpgan.archs.gfpganv1_clean_arch import GFPGANv1Clean

gfpgan = GFPGANv1Clean(
    out_size=512,
    num_style_feat=512,
    channel_multiplier=2,
    decoder_load_path=None,
    fix_decoder=False,
    num_mlp=8,
    input_is_latent=True,
    different_w=True,
    narrow=1,
    sft_half=True,
)

# Load pretrained weights, preferring the EMA parameters when present.
model_path = "./experiments/pretrained_models/GFPGANv1.3.pth"
loadnet = torch.load(model_path)
keyname = 'params_ema' if 'params_ema' in loadnet else 'params'
gfpgan.load_state_dict(loadnet[keyname], strict=True)
gfpgan = gfpgan.eval()

inputs = [torch.randn([8, 3, 512, 512], dtype=torch.float32).cuda()]

if torch.cuda.is_available():
    gfpgan = gfpgan.cuda().eval()
    torch.set_float32_matmul_precision('high')
    compiled = torch.compile(
        gfpgan,
        backend="aot_torch_tensorrt_aten",
        options={
            "truncate_long_and_double": True,
            "debug": True,
        },
    )
    print("EXPORTING")
    start = time.time()
    print(compiled(*inputs))  # compilation is triggered on this first call
    print(time.time() - start)
    # Never reached: compilation hangs above, and torch.compile artifacts
    # are not serializable with torch.save (see discussion below).
    torch.save(compiled, "compiled.ts")
  • Are you using local sources or building from archives:
  • Python version: 3.9.2
  • CUDA version: 11.8 (12.3 installed on the OS)
  • GPU models and configuration: NVIDIA A100 80GB and NVIDIA L4; both show the same behavior
  • Any other relevant information: private fork based on https://github.com/TencentARC/GFPGAN

Additional context
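
For reference, the direct dynamo API named in the title can also be invoked without going through torch.compile. The following is a minimal sketch assuming Torch-TensorRT 2.2, where torch_tensorrt.dynamo.compile consumes a torch.export program; exact signatures may differ between releases:

# Sketch only: the direct dynamo path. Assumes the gfpgan module and
# inputs list defined in the reproduction script above.
exported = torch.export.export(gfpgan, (inputs[0],))
trt_model = torch_tensorrt.dynamo.compile(
    exported,
    inputs=inputs,
    truncate_long_and_double=True,
    debug=True,
)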

Antonyesk601 · Feb 21 '24 16:02

Hello - thanks for the report. Could you share the output of debug mode or the line in Torch-TRT where the error occurs? Additionally, I noticed the torch.save(compiled, "compiled.ts") call, but torch.compile models are not serializable at this time. Could you also try backend="eager" to verify compilation in regular torch.compile?
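
A minimal sketch of that sanity check, reusing the names from the reproduction script above:

# The eager backend exercises the torch.compile machinery without
# TensorRT, which isolates whether the hang is in Torch-TRT or in tracing.
compiled_eager = torch.compile(gfpgan, backend="eager")
print(compiled_eager(*inputs))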

gs-olive · Feb 22 '24 03:02

Yeah, for sure. It never reached the torch.save line, so thanks for telling me! Is there something I can do to save these compiled models? It is going to take a long time to compile in production (one possible workaround is sketched at the end of this comment). The "error" happens here:

    compiled = torch.compile(
        gfpgan,
        backend="aot_torch_tensorrt_aten",
        options={
            "truncate_long_and_double": True,
            "debug": True,
        },
    )

It just never finishes; I left it running for an hour, and the last thing printed to the terminal is:

[02/21/2024-15:55:46] [TRT] [V] >>>>>>>>>>>>>>> Chose Runner Type: Scale Tactic: 0x0000000000000000
[02/21/2024-15:55:46] [TRT] [V] *************** Autotuning format combination: Float(786432,1,1536,3) -> Float(786432,1,1536,3) ***************
[02/21/2024-15:55:46] [TRT] [V] --------------- Timing Runner: add_291_rhs + [ELEMENTWISE]-[aten_ops.add.Tensor]-[add_291] (Scale[0x80000007])
[02/21/2024-15:55:46] [TRT] [V] Scale has no valid tactics for this config, skipping
[02/21/2024-15:55:46] [TRT] [V] *************** Autotuning format combination: Float(262144,1:4,512,1) -> Float(262144,1:4,512,1) ***************
[02/21/2024-15:55:46] [TRT] [V] --------------- Timing Runner: add_291_rhs + [ELEMENTWISE]-[aten_ops.add.Tensor]-[add_291] (Scale[0x80000007])
[02/21/2024-15:55:46] [TRT] [V] Scale has no valid tactics for this config, skipping
[02/21/2024-15:55:46] [TRT] [V] =============== Computing costs for {ForeignNode[index_116_index_sum_intermediate...[ELEMENTWISE]-[aten_ops.add.Tensor]-[add_297]]}
[02/21/2024-15:55:46] [TRT] [V] *************** Autotuning format combination: Float(384,16,4,1), Float(1536,64,8,1), Float(6144,256,16,1), Float(24576,1024,32,1), Float(98304,4096,64,1), Float(393216,16384,128,1), Float(1572864,65536,256,1), Float(786432,262144,512,1) -> Float(786432,262144,512,1) ***************
[02/21/2024-15:55:46] [TRT] [V] --------------- Timing Runner: {ForeignNode[index_116_index_sum_intermediate...[ELEMENTWISE]-[aten_ops.add.Tensor]-[add_297]]} (Myelin[0x80000023])
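
One possible workaround for saving, given that the TorchScript frontend already compiles this model: a minimal sketch, assuming torch_tensorrt 2.2 and the input shape from the script; argument names may differ between releases.

# Sketch: the TorchScript frontend returns a ScriptModule, which can be
# saved with torch.jit.save, unlike torch.compile artifacts.
trt_ts = torch_tensorrt.compile(
    gfpgan,
    ir="torchscript",
    inputs=[torch_tensorrt.Input(shape=[8, 3, 512, 512], dtype=torch.float32)],
    truncate_long_and_double=True,
)
torch.jit.save(trt_ts, "gfpgan_trt.ts")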

Antonyesk601 · Feb 22 '24 11:02

Full log attached: log.txt

Antonyesk601 · Feb 22 '24 11:02

Can confirm it works with the eager backend.

Antonyesk601 · Feb 22 '24 11:02

For messages like [02/21/2024-15:55:46] [TRT] [V] Scale has no valid tactics for this config, skipping, increasing the workspace size can help.
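
For example, via the options dict used in the reproduction script; workspace_size is specified in bytes, and the value here is only illustrative:

compiled = torch.compile(
    gfpgan,
    backend="aot_torch_tensorrt_aten",
    options={
        "truncate_long_and_double": True,
        "debug": True,
        "workspace_size": 10 << 30,  # illustrative: 10 GiB, in bytes
    },
)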

narendasan · Feb 23 '24 02:02

I leave it to be decided by PyTorch. What value do you recommend I set it to?

Antonyesk601 · Feb 24 '24 12:02

This depends on your specific GPU and how much memory it has. You could try starting with 10GB and moving up or down depending on whether that resolves it.

narendasan · Feb 26 '24 17:02

I am on an L4, which has 24GB of VRAM, and I set it to 23<<30. I didn't notice whether it fixed the "no valid tactics" messages, but my issue is still the same: it freezes later at

[02/21/2024-15:55:46] [TRT] [V] --------------- Timing Runner: {ForeignNode[index_116_index_sum_intermediate...[ELEMENTWISE]-[aten_ops.add.Tensor]-[add_297]]} (Myelin[0x80000023])

Antonyesk601 · Feb 26 '24 18:02