❓ [Question] torch_tensorrt.dynamo.compile hangs indefinitely mid compilation?
❓ Question
torch_tensorrt.dynamo.compile hangs indefinitely mid-compilation. CPU usage is through the roof, and setting debug=True shows that there's a step where it fails.
What you have already tried
I tried compiling with TorchScript and it works well enough, but I wanted to test the dynamo backend.
Environment
Python 3.9.2 torch 2.2+cu118 torch_tensorrt 2.2+cu118 tensorrt 8.6
Build information about Torch-TensorRT can be found by turning on debug messages
- PyTorch Version (e.g., 1.0): 2.2
- CPU Architecture: x86_64
- OS (e.g., Linux): debian 11
- How you installed PyTorch (conda, pip, libtorch, source): pip install torch torchvision torch_tensorrt --index-url https://download.pytorch.org/whl/cu118
- Build command you used (if compiling from source):
import torch
import torch_tensorrt
from gfpgan.archs.gfpganv1_clean_arch import GFPGANv1Clean

gfpgan = GFPGANv1Clean(
    out_size=512,
    num_style_feat=512,
    channel_multiplier=2,
    decoder_load_path=None,
    fix_decoder=False,
    num_mlp=8,
    input_is_latent=True,
    different_w=True,
    narrow=1,
    sft_half=True)

model_path = "./experiments/pretrained_models/GFPGANv1.3.pth"
loadnet = torch.load(model_path)
if 'params_ema' in loadnet:
    keyname = 'params_ema'
else:
    keyname = 'params'
gfpgan.load_state_dict(loadnet[keyname], strict=True)
gfpgan = gfpgan.eval()

inputs = [torch.randn([8, 3, 512, 512], dtype=torch.float32).cuda()]
if torch.cuda.is_available():
    gfpgan = gfpgan.cuda().eval()
torch.set_float32_matmul_precision('high')

compiled = torch.compile(gfpgan,
                         backend="aot_torch_tensorrt_aten",
                         options={
                             "truncate_long_and_double": True,
                             "debug": True
                         })
print("EXPORTING")
import time
start = time.time()
print(compiled(*inputs))
print(time.time() - start)
torch.save(compiled, "compiled.ts")
- Are you using local sources or building from archives:
- Python version: 3.9.2
- CUDA version: 11.8 (12.3 installed on OS)
- GPU models and configuration: NVIDIA A100 80GB and NVIDIA L4; both show the same behavior
- Any other relevant information: private fork based on https://github.com/TencentARC/GFPGAN
Additional context
Hello - thanks for the report. Could you share the output of debug mode or the line in Torch-TRT where the error occurs? Additionally, I noticed the torch.save(compiled, "compiled.ts") call, but torch.compile models are not serializable at this time. Could you also try backend="eager" to verify compilation in regular torch.compile?
Yeah, for sure. It never reached the torch.save line, so thanks for telling me! Is there something I can do to save these compiled models? Compiling is going to take a long time in production. The "error" happens here:
compiled = torch.compile(gfpgan,
                         backend="aot_torch_tensorrt_aten",
                         options={
                             "truncate_long_and_double": True,
                             "debug": True
                         })
It just never finishes. I left it running for an hour, and the last thing printed to the terminal is this:
[02/21/2024-15:55:46] [TRT] [V] >>>>>>>>>>>>>>> Chose Runner Type: Scale Tactic: 0x0000000000000000
[02/21/2024-15:55:46] [TRT] [V] *************** Autotuning format combination: Float(786432,1,1536,3) -> Float(786432,1,1536,3) ***************
[02/21/2024-15:55:46] [TRT] [V] --------------- Timing Runner: add_291_rhs + [ELEMENTWISE]-[aten_ops.add.Tensor]-[add_291] (Scale[0x80000007])
[02/21/2024-15:55:46] [TRT] [V] Scale has no valid tactics for this config, skipping
[02/21/2024-15:55:46] [TRT] [V] *************** Autotuning format combination: Float(262144,1:4,512,1) -> Float(262144,1:4,512,1) ***************
[02/21/2024-15:55:46] [TRT] [V] --------------- Timing Runner: add_291_rhs + [ELEMENTWISE]-[aten_ops.add.Tensor]-[add_291] (Scale[0x80000007])
[02/21/2024-15:55:46] [TRT] [V] Scale has no valid tactics for this config, skipping
[02/21/2024-15:55:46] [TRT] [V] =============== Computing costs for {ForeignNode[index_116_index_sum_intermediate...[ELEMENTWISE]-[aten_ops.add.Tensor]-[add_297]]}
[02/21/2024-15:55:46] [TRT] [V] *************** Autotuning format combination: Float(384,16,4,1), Float(1536,64,8,1), Float(6144,256,16,1), Float(24576,1024,32,1), Float(98304,4096,64,1), Float(393216,16384,128,1), Float(1572864,65536,256,1), Float(786432,262144,512,1) -> Float(786432,262144,512,1) ***************
[02/21/2024-15:55:46] [TRT] [V] --------------- Timing Runner: {ForeignNode[index_116_index_sum_intermediate...[ELEMENTWISE]-[aten_ops.add.Tensor]-[add_297]]} (Myelin[0x80000023])
Full log here: log.txt
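When the build stalls, the last "Timing Runner" entry in the verbose log names the layer and runner TensorRT was timing at that moment. A minimal sketch for pulling that entry out of a saved log, assuming the line layout matches the excerpt above (the helper name `last_timing_runner` is made up for illustration):

```python
import re
from typing import Optional, Tuple

# Matches lines such as:
# [..] [TRT] [V] --------------- Timing Runner: <layer> (<runner>[0x...])
TIMING_RUNNER = re.compile(
    r"Timing Runner:\s+(?P<layer>.+)\s+\((?P<runner>\w+)\[0x[0-9a-fA-F]+\]\)\s*$"
)


def last_timing_runner(log_text: str) -> Optional[Tuple[str, str]]:
    """Return (layer, runner) from the last 'Timing Runner' line, or None."""
    last = None
    for line in log_text.splitlines():
        match = TIMING_RUNNER.search(line)
        if match:
            last = (match.group("layer"), match.group("runner"))
    return last


# Three lines copied from the log excerpt in this thread.
sample = """\
[02/21/2024-15:55:46] [TRT] [V] --------------- Timing Runner: add_291_rhs + [ELEMENTWISE]-[aten_ops.add.Tensor]-[add_291] (Scale[0x80000007])
[02/21/2024-15:55:46] [TRT] [V] Scale has no valid tactics for this config, skipping
[02/21/2024-15:55:46] [TRT] [V] --------------- Timing Runner: {ForeignNode[index_116_index_sum_intermediate...[ELEMENTWISE]-[aten_ops.add.Tensor]-[add_297]]} (Myelin[0x80000023])
"""

layer, runner = last_timing_runner(sample)
print(runner)  # Myelin
print(layer)
```

For the excerpt above this reports the Myelin runner on the ForeignNode that ends at add_297, which is the same line the terminal stopped on.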
Can confirm it works with eager backend
For something like "[02/21/2024-15:55:46] [TRT] [V] Scale has no valid tactics for this config, skipping", increasing the workspace size can help.
I left it for PyTorch to decide. What value do you recommend I set it to?
This depends on your specific GPU and how much memory it has. You could try starting with 10GB and moving up or down depending on whether this gets resolved.
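As a rough sketch of what that could look like: the workspace limit is given in bytes, so a value in GiB has to be converted first. This assumes the setting is exposed under the `workspace_size` key of the options dict and that "10GB" is meant as 10 GiB; the helper `workspace_bytes` is made up for illustration.

```python
GIB = 1 << 30  # bytes in one gibibyte


def workspace_bytes(gib: int) -> int:
    """Convert a whole number of GiB to bytes."""
    return gib * GIB


# Same options as in the original script, plus a workspace limit.
# Assumption: `workspace_size` is the Torch-TensorRT setting name.
options = {
    "truncate_long_and_double": True,
    "debug": True,
    "workspace_size": workspace_bytes(10),
}

print(options["workspace_size"])  # 10737418240

# Usage (needs torch and torch_tensorrt, with `gfpgan` from the original script):
# compiled = torch.compile(gfpgan, backend="aot_torch_tensorrt_aten", options=options)
```

Writing the value as `gib * (1 << 30)` is equivalent to the `N << 30` shorthand and keeps the unit visible when the number is tuned up or down.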
I am on an L4, which has 24GB of VRAM, and I set it to 23<<30. I didn't notice whether it fixed the "no valid tactics" message, but my issue is still the same: it freezes later at
[02/21/2024-15:55:46] [TRT] [V] --------------- Timing Runner: {ForeignNode[index_116_index_sum_intermediate...[ELEMENTWISE]-[aten_ops.add.Tensor]-[add_297]]} (Myelin[0x80000023])
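Since the build stalls rather than raising an error, one way to keep an experiment from blocking a terminal for hours is to run the compile script in a child process with a deadline. This is only a sketch using the standard library; the helper name `run_with_deadline` and the script name in the usage comment are hypothetical.

```python
import subprocess
import sys
from typing import Sequence


def run_with_deadline(cmd: Sequence[str], timeout_s: float) -> str:
    """Run cmd in a child process; return 'ok', 'failed', or 'timeout'."""
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return "timeout"
    return "ok" if completed.returncode == 0 else "failed"


# Stand-in for a build that never finishes.
hang = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_deadline(hang, timeout_s=1.0))  # timeout

# Usage with a hypothetical script that holds the compile call:
# run_with_deadline([sys.executable, "compile_gfpgan.py"], timeout_s=3600)
```

This does not fix the hang itself. It only turns "never returns" into a result that can be logged while trying different settings.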