
❓[Question] The only valid use of a module is looking up an attribute but found...

Open edmuthiah opened this issue 2 years ago • 5 comments

❓ Question

Hello, I have a TorchScript model that I am trying to compile with Torch-TensorRT:

import cv2
import numpy as np
import torch
from torchvision.transforms import ToTensor
import torch_tensorrt

if __name__ == "__main__":
    # Load the pre-trained model
    model = torch.jit.load('model.jit')

    # Define sample points and bounding box labels
    pts_sampled = np.array([[100, 100], [800, 800]])
    bbox = torch.reshape(torch.tensor(pts_sampled), [1, 1, 2, 2])
    bbox_labels = torch.reshape(torch.tensor([2, 3]), [1, 1, 2])

    # Read and preprocess the image
    image = cv2.imread('image.jpg')
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    img_tensor = ToTensor()(image)

    # Compile the model with TensorRT
    with torch_tensorrt.logging.debug():
        trt_model = torch_tensorrt.compile(model, 
            inputs=[img_tensor[None, ...].cuda(),
                    bbox.cuda(),
                    bbox_labels.cuda()],
            enabled_precisions={torch.float32},
            workspace_size=2000000000,
            truncate_long_and_double=True
        )

This returns the following debug information and error:

INFO: [Torch-TensorRT] - ir was set to default, using TorchScript as ir
DEBUG: [Torch-TensorRT] - TensorRT Compile Spec: {
    "Inputs": [
        Input(shape=(1,3,1080,1920,), dtype=Float, format=Contiguous/Linear/NCHW, tensor_domain=[0, 2))
        Input(shape=(1,1,2,2,), dtype=Long, format=Contiguous/Linear/NCHW, tensor_domain=[0, 2))
        Input(shape=(1,1,2,), dtype=Long, format=Contiguous/Linear/NCHW, tensor_domain=[0, 2))
    ]
    "Enabled Precision": [Float, ]
    "TF32 Disabled": 0
    "Sparsity": 0
    "Refit": 0
    "Debug": 0
    "Device":  {
        "device_type": GPU
        "allow_gpu_fallback": False
        "gpu_id": 0
        "dla_core": -1
    }

    "Engine Capability": Default
    "Num Avg Timing Iters": 1
    "Workspace Size": 2000000000
    "DLA SRAM Size": 1048576
    "DLA Local DRAM Size": 1073741824
    "DLA Global DRAM Size": 536870912
    "Truncate long and double": 1
    "Allow Shape tensors": 0
    "Torch Fallback":  {
        "enabled": True
        "min_block_size": 3
        "forced_fallback_operators": [
        ]
        "forced_fallback_modules": [
        ]
    }
}
DEBUG: [Torch-TensorRT] - init_compile_spec with input vector
DEBUG: [Torch-TensorRT] - Settings requested for Lowering:
    torch_executed_modules: [
    ]
Traceback (most recent call last):
  File "/home/jupyter/main.py", line 79, in <module>
    trt_model = torch_tensorrt.compile(model, 
  File "/home/jupyter/venv/lib/python3.9/site-packages/torch_tensorrt/_compile.py", line 133, in compile
    return torch_tensorrt.ts.compile(
  File "/home/jupyter/venv/lib/python3.9/site-packages/torch_tensorrt/ts/_compiler.py", line 139, in compile
    compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
RuntimeError: 
temporary: the only valid use of a module is looking up an attribute but found  = prim::SetAttr[name="W"](%self.1, %345)

I'm looking to understand what my options are and what I can change to compile successfully.

Environment

PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 11 (bullseye) (x86_64)
GCC version: (Debian 10.2.1-6) 10.2.1 20210110
Clang version: Could not collect
CMake version: version 3.27.9
Libc version: glibc-2.31

Python version: 3.9.2 (default, Feb 28 2021, 17:03:44)  [GCC 10.2.1 20210110] (64-bit runtime)
Python platform: Linux-5.10.0-26-cloud-amd64-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA L4
Nvidia driver version: 525.105.17
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      46 bits physical, 48 bits virtual
CPU(s):                             8
On-line CPU(s) list:                0-7
Thread(s) per core:                 2
Core(s) per socket:                 4
Socket(s):                          1
NUMA node(s):                       1
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              85
Model name:                         Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping:                           7
CPU MHz:                            2200.222
BogoMIPS:                           4400.44
Hypervisor vendor:                  KVM
Virtualization type:                full
L1d cache:                          128 KiB
L1i cache:                          128 KiB
L2 cache:                           4 MiB
L3 cache:                           38.5 MiB
NUMA node0 CPU(s):                  0-7
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed:             Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Mitigation; Clear CPU buffers; SMT Host state unknown
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities

Versions of relevant libraries:
[pip3] numpy==1.26.2
[pip3] torch==2.0.1
[pip3] torch-tensorrt==1.4.0
[pip3] torchvision==0.15.2
[pip3] triton==2.0.0

edmuthiah avatar Dec 08 '23 23:12 edmuthiah

Hi - it looks like the model code itself is setting attributes of the nn.Module, which causes issues for some of our TorchScript lowering passes. If this is a detection-style model, attribute-setting has been a challenge for the TorchScript path in the past. The new torch_tensorrt torch.compile backend can help with many of these issues. It takes an nn.Module as input, and we recommend the latest nightly versions of torch and torch_tensorrt, installed via: pip install --pre torch torch_tensorrt --extra-index-url https://download.pytorch.org/whl/nightly/cu118.
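
For reference, a minimal sketch of that path is below (assuming the eager nn.Module is available rather than the torch.jit.load()'d module; the options shown are illustrative, not required):

import torch
import torch_tensorrt  # importing registers the "torch_tensorrt" backend

# model must be the eager nn.Module for this path
compiled_model = torch.compile(
    model,
    backend="torch_tensorrt",
    options={"enabled_precisions": {torch.float32}},
)
# Compilation is deferred until the first call with real inputs
out = compiled_model(img_tensor[None, ...].cuda(), bbox.cuda(), bbox_labels.cuda())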

Regarding the TorchScript path, if you could run the compilation inside a with torch_tensorrt.logging.graphs(): context and share the output, that would help determine which lowering pass produced the error.
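
Concretely, that just swaps the logging context in your original script (same arguments as before):

with torch_tensorrt.logging.graphs():
    trt_model = torch_tensorrt.compile(model,
        inputs=[img_tensor[None, ...].cuda(),
                bbox.cuda(),
                bbox_labels.cuda()],
        enabled_precisions={torch.float32},
        workspace_size=2000000000,
        truncate_long_and_double=True
    )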

gs-olive avatar Dec 12 '23 20:12 gs-olive

Thanks for your support with this @gs-olive. I've updated to the nightly version and it produces the following outputs. Please see env_log.txt and compile_log.txt.

compile_log.txt env_log.txt

Also note a new error around CUPTI initialization.

edmuthiah avatar Dec 13 '23 06:12 edmuthiah

Thanks for the follow-up. The SetAttr error appears to be thrown here, in the torch::jit::LowerGraph function from PyTorch: https://github.com/pytorch/TensorRT/blob/20264a3c03065fce089cf284b5e172c50cc3bc14/core/lowering/lowering.cpp#L181 From the model code, it appears that an integer attribute self.W is being updated, as here:

  %1309 : int = aten::__getitem__(%1303, %self.mask_decoder.num_multimask_outputs)
  %1310 : Tensor = aten::tensor(%1309, %self.image_encoder.neck.0.bias, %self.image_encoder.neck.0.bias, %self.mask_decoder.transformer.layers.0.self_attn.q_dropout.training)
  %1311 : Scalar = aten::item(%1310)
  %1312 : int = aten::Int(%1311)
   = prim::SetAttr[name="W"](%self.1, %1312)
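
In eager Python, the pattern that scripts to prim::SetAttr looks roughly like this (an illustrative reconstruction, not the actual model code):

import torch

class CachesShape(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.W = 0  # plain int attribute mutated during forward

    def forward(self, x):
        # Assigning to self inside forward() becomes prim::SetAttr when
        # scripted, and torch::jit::LowerGraph cannot inline that away.
        self.W = int(x.shape[-1])
        return x

scripted = torch.jit.script(CachesShape())  # graph contains prim::SetAttr[name="W"]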

The Torch JIT lower_graph function likely does not support this pattern. If you have access to the nn.Module version of the model, it may be worth trying torch_tensorrt.compile(..., ir="dynamo", ...), which does not use TorchScript for tracing/lowering but still allows serialization via TorchScript.
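
That would look something like the following (a sketch, reusing the inputs from your original script):

trt_model = torch_tensorrt.compile(
    model,  # the eager nn.Module, not the torch.jit.load()'d module
    ir="dynamo",
    inputs=[img_tensor[None, ...].cuda(), bbox.cuda(), bbox_labels.cuda()],
    enabled_precisions={torch.float32},
)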

gs-olive avatar Dec 15 '23 22:12 gs-olive

Appreciate your continued support with this. It feels like I'm making a little more progress now. For context, this is the repo that I'm working with: https://github.com/yformer/EfficientSAM

After setting ir="dynamo", the error I'm getting now is:

  File "/home/jupyter/.venv/lib/python3.9/site-packages/torch/_dynamo/exc.py", line 193, in unimplemented
    raise Unsupported(msg)
torch._dynamo.exc.Unsupported: boolean masking setitem backwards, see https://github.com/pytorch/pytorch/issues/114123

from user code:
   File "/home/jupyter/EfficientSAM/efficient_sam/efficient_sam.py", line 212, in forward
    return self.predict_masks(
  File "/home/jupyter/EfficientSAM/efficient_sam/efficient_sam.py", line 104, in predict_masks
    sparse_embeddings = self.prompt_encoder(
  File "/home/jupyter/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jupyter/EfficientSAM/efficient_sam/efficient_sam_decoder.py", line 89, in forward
    return self._embed_points(coords, labels)
  File "/home/jupyter/EfficientSAM/efficient_sam/efficient_sam_decoder.py", line 65, in _embed_points
    point_embedding[labels == -1] += self.invalid_points.weight

As per these issues: https://github.com/pytorch/pytorch/issues/114123, https://github.com/pytorch/pytorch/issues/114220, https://github.com/pytorch/pytorch/issues/102841, I've now experimented with parameters like capture_dynamic_output_shape_ops and dynamic, with no luck. A rewrite I may try next is sketched below.
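
One option would be replacing the boolean-mask in-place update with torch.where to avoid the data-dependent setitem (an untested sketch, not a verified fix for EfficientSAM):

# Original line in efficient_sam_decoder.py:
#   point_embedding[labels == -1] += self.invalid_points.weight
# Possible rewrite -- broadcast the mask and select per element instead:
mask = (labels == -1).unsqueeze(-1)  # [..., num_points, 1]
point_embedding = torch.where(
    mask,
    point_embedding + self.invalid_points.weight,
    point_embedding,
)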

My code to compile with the previously mentioned nightly build is:

"""
export_to_tensorrt.py
"""
from efficient_sam.build_efficient_sam import build_efficient_sam_vits
from PIL import Image
from torchvision import transforms
import torch
import numpy as np
import torch_tensorrt

if __name__ == "__main__":

    model = build_efficient_sam_vits()
    model.to(torch.device("cuda"))
    
    sample_image_np = np.array(Image.open("figs/examples/dogs.jpg"))
    sample_image_tensor = transforms.ToTensor()(sample_image_np).to(torch.device("cuda"))

    input_points = torch.tensor([[[[580, 350], [650, 350]]]]).to(torch.device("cuda"))
    input_labels = torch.tensor([[[1, 1]]]).to(torch.device("cuda"))

    
    print('Running single inference for testing')
    predicted_logits, predicted_iou = model(
        sample_image_tensor[None, ...],
        input_points,
        input_labels,
    )
    mask = torch.ge(predicted_logits[0, 0, 0, :, :], 0).cpu().detach().numpy()
    masked_image_np = sample_image_np.copy().astype(np.uint8) * mask[:, :, None]
    Image.fromarray(masked_image_np).save("figs/examples/dogs_mask.png")
    print('Test image saved successfully')
    
    print('Compiling model...')
    with torch_tensorrt.logging.graphs():
        trt_model = torch_tensorrt.compile(
            model,
            inputs=[sample_image_tensor[None, ...],
                    input_points,
                    input_labels],
            enabled_precisions={torch.float32},
            workspace_size=2000000000,
            capture_dynamic_output_shape_ops=True,
            ir="dynamo",
        )  # also tried dynamic=False
        print('Successfully compiled model', trt_model)

edmuthiah avatar Dec 16 '23 02:12 edmuthiah

Thank you for testing this out - it looks like the issue is encountered while tracing the model in the Dynamo path. Could you also check ir="torch_compile" and see if there are any reported errors?
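
That would just change the ir argument in your script (a sketch reusing the same inputs):

trt_model = torch_tensorrt.compile(
    model,
    ir="torch_compile",  # routes through the torch.compile backend instead of Dynamo export
    inputs=[sample_image_tensor[None, ...], input_points, input_labels],
    enabled_precisions={torch.float32},
    workspace_size=2000000000,
)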

gs-olive avatar Dec 26 '23 18:12 gs-olive