❓[Question] The only valid use of a module is looking up an attribute but found...
Hello, I have a TorchScript model that I am trying to compile with Torch-TensorRT:
import cv2
import numpy as np
import torch
from torchvision.transforms import ToTensor
import torch_tensorrt

if __name__ == "__main__":
    # Load the pre-trained TorchScript model
    model = torch.jit.load('model.jit')

    # Define sample points and bounding box labels
    pts_sampled = np.array([[100, 100], [800, 800]])
    bbox = torch.reshape(torch.tensor(pts_sampled), [1, 1, 2, 2])
    bbox_labels = torch.reshape(torch.tensor([2, 3]), [1, 1, 2])

    # Read and preprocess the image
    image = cv2.imread('image.jpg')
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    img_tensor = ToTensor()(image)

    # Compile the model with Torch-TensorRT
    with torch_tensorrt.logging.debug():
        trt_model = torch_tensorrt.compile(
            model,
            inputs=[img_tensor[None, ...].cuda(),
                    bbox.cuda(),
                    bbox_labels.cuda()],
            enabled_precisions={torch.float32},
            workspace_size=2000000000,
            truncate_long_and_double=True,
        )
This returns the following debug information and error:
INFO: [Torch-TensorRT] - ir was set to default, using TorchScript as ir
DEBUG: [Torch-TensorRT] - TensorRT Compile Spec: {
    "Inputs": [
        Input(shape=(1,3,1080,1920,), dtype=Float, format=Contiguous/Linear/NCHW, tensor_domain=[0, 2))
        Input(shape=(1,1,2,2,), dtype=Long, format=Contiguous/Linear/NCHW, tensor_domain=[0, 2))
        Input(shape=(1,1,2,), dtype=Long, format=Contiguous/Linear/NCHW, tensor_domain=[0, 2))
    ]
    "Enabled Precision": [Float, ]
    "TF32 Disabled": 0
    "Sparsity": 0
    "Refit": 0
    "Debug": 0
    "Device": {
        "device_type": GPU
        "allow_gpu_fallback": False
        "gpu_id": 0
        "dla_core": -1
    }
    "Engine Capability": Default
    "Num Avg Timing Iters": 1
    "Workspace Size": 2000000000
    "DLA SRAM Size": 1048576
    "DLA Local DRAM Size": 1073741824
    "DLA Global DRAM Size": 536870912
    "Truncate long and double": 1
    "Allow Shape tensors": 0
    "Torch Fallback": {
        "enabled": True
        "min_block_size": 3
        "forced_fallback_operators": [
        ]
        "forced_fallback_modules": [
        ]
    }
}
DEBUG: [Torch-TensorRT] - init_compile_spec with input vector
DEBUG: [Torch-TensorRT] - Settings requested for Lowering:
torch_executed_modules: [
]
Traceback (most recent call last):
  File "/home/jupyter/main.py", line 79, in <module>
    trt_model = torch_tensorrt.compile(model,
  File "/home/jupyter/venv/lib/python3.9/site-packages/torch_tensorrt/_compile.py", line 133, in compile
    return torch_tensorrt.ts.compile(
  File "/home/jupyter/venv/lib/python3.9/site-packages/torch_tensorrt/ts/_compiler.py", line 139, in compile
    compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
RuntimeError:
temporary: the only valid use of a module is looking up an attribute but found = prim::SetAttr[name="W"](%self.1, %345)
I'm looking to understand what my options are and what I can change to compile this model successfully.
Environment
PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Debian GNU/Linux 11 (bullseye) (x86_64)
GCC version: (Debian 10.2.1-6) 10.2.1 20210110
Clang version: Could not collect
CMake version: version 3.27.9
Libc version: glibc-2.31
Python version: 3.9.2 (default, Feb 28 2021, 17:03:44) [GCC 10.2.1 20210110] (64-bit runtime)
Python platform: Linux-5.10.0-26-cloud-amd64-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA L4
Nvidia driver version: 525.105.17
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping: 7
CPU MHz: 2200.222
BogoMIPS: 4400.44
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 4 MiB
L3 cache: 38.5 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT Host state unknown
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities
Versions of relevant libraries:
[pip3] numpy==1.26.2
[pip3] torch==2.0.1
[pip3] torch-tensorrt==1.4.0
[pip3] torchvision==0.15.2
[pip3] triton==2.0.0
Hi - it looks like the model code itself is setting attributes of the nn.Module, which causes issues for some of our TorchScript lowering passes. If this is a detection-style model, attribute-setting has been a challenge for the TorchScript path in the past. The new torch_tensorrt torch.compile backend can help with many of these issues. It takes an nn.Module as input, and we recommend using the latest nightly versions of torch and torch_tensorrt, as in: pip install --pre torch torch_tensorrt --extra-index-url https://download.pytorch.org/whl/nightly/cu118.
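For reference, a minimal sketch of that path, assuming a nightly build that registers "torch_tensorrt" as a torch.compile backend (the exact option names below may differ between nightlies):

import torch
import torch_tensorrt  # importing registers the "torch_tensorrt" backend

# Sketch only: `module` is the original nn.Module (not the scripted model),
# and `example_inputs` are CUDA tensors matching your three inputs.
compiled = torch.compile(
    module,
    backend="torch_tensorrt",
    options={
        "enabled_precisions": {torch.float32},
        "truncate_long_and_double": True,
    },
)
out = compiled(*example_inputs)  # compilation is triggered on the first call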
Regarding the TorchScript path, if you could wrap the compilation in with torch_tensorrt.logging.graphs(): and share the output, it would help determine which lowering pass produced the error.
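For instance, the same call as in your first snippet, just wrapped in the graphs logging context (a sketch reusing your variable names):

import torch_tensorrt

# Graph-level logging prints the graph after each lowering pass; the pass
# after which prim::SetAttr still appears is the one of interest.
with torch_tensorrt.logging.graphs():
    trt_model = torch_tensorrt.compile(
        model,
        inputs=[img_tensor[None, ...].cuda(), bbox.cuda(), bbox_labels.cuda()],
        enabled_precisions={torch.float32},
        workspace_size=2000000000,
        truncate_long_and_double=True,
    )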
Thanks for your support with this @gs-olive. I've updated to the nightly version and it produces the following outputs. Please see env_log.txt and compile_log.txt.
Also note a new error around CUPTI initialization.
Thanks for the follow-up. The SetAttr issue appears to be thrown here, in the torch::jit::LowerGraph function from PyTorch.
https://github.com/pytorch/TensorRT/blob/20264a3c03065fce089cf284b5e172c50cc3bc14/core/lowering/lowering.cpp#L181
From the model code, it appears an integer attribute self.W is being updated, as in:
%1309 : int = aten::__getitem__(%1303, %self.mask_decoder.num_multimask_outputs)
%1310 : Tensor = aten::tensor(%1309, %self.image_encoder.neck.0.bias, %self.image_encoder.neck.0.bias, %self.mask_decoder.transformer.layers.0.self_attn.q_dropout.training)
%1311 : Scalar = aten::item(%1310)
%1312 : int = aten::Int(%1311)
= prim::SetAttr[name="W"](%self.1, %1312)
The Torch JIT lower_graph function likely does not support this pattern. If you have access to the nn.Module version of the model, it may be worth trying torch_tensorrt.compile(..., ir="dynamo", ...), which does not use TorchScript for tracing/lowering but still enables serialization via TorchScript.
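For illustration, here is a minimal hypothetical module (not the actual EfficientSAM code) that produces the same kind of prim::SetAttr node once scripted:

import torch
import torch.nn as nn

class SetsAttr(nn.Module):
    def __init__(self):
        super().__init__()
        self.W = 0  # plain int attribute, mutable under TorchScript

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assigning to self.W inside forward() is recorded by TorchScript
        # as prim::SetAttr[name="W"], which torch::jit::LowerGraph rejects.
        self.W = x.shape[-1]
        return x * self.W

print(torch.jit.script(SetsAttr()).graph)  # the graph contains prim::SetAttr

Computing such values as local variables instead of assigning them to self avoids the SetAttr node.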
Appreciate your continued support with this. It feels like I'm making a little more progress now. For context, this is the repo I'm working with: https://github.com/yformer/EfficientSAM
After setting ir="dynamo", the most recent error I'm getting is:
File "/home/jupyter/.venv/lib/python3.9/site-packages/torch/_dynamo/exc.py", line 193, in unimplemented
raise Unsupported(msg)
torch._dynamo.exc.Unsupported: boolean masking setitem backwards, see https://github.com/pytorch/pytorch/issues/114123
from user code:
File "/home/jupyter/EfficientSAM/efficient_sam/efficient_sam.py", line 212, in forward
return self.predict_masks(
File "/home/jupyter/EfficientSAM/efficient_sam/efficient_sam.py", line 104, in predict_masks
sparse_embeddings = self.prompt_encoder(
File "/home/jupyter/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/jupyter/EfficientSAM/efficient_sam/efficient_sam_decoder.py", line 89, in forward
return self._embed_points(coords, labels)
File "/home/jupyter/EfficientSAM/efficient_sam/efficient_sam_decoder.py", line 65, in _embed_points
point_embedding[labels == -1] += self.invalid_points.weight
As per these issues: https://github.com/pytorch/pytorch/issues/114123, https://github.com/pytorch/pytorch/issues/114220, and https://github.com/pytorch/pytorch/issues/102841, I've now experimented with parameters like capture_dynamic_output_shape_ops and dynamic, with no luck.
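One variant I tried was setting the Dynamo flag globally before compiling (a sketch; it did not change the error):

import torch._dynamo

# Enable capture of ops with dynamic output shapes before calling
# torch_tensorrt.compile (the same thing is also attempted via the
# compile parameter in the script below).
torch._dynamo.config.capture_dynamic_output_shape_ops = True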
My code to compile with the previously mentioned nightly build is:
"""
export_to_tensorrt.py
"""
from efficient_sam.build_efficient_sam import build_efficient_sam_vits
from PIL import Image
from torchvision import transforms
import torch
import numpy as np
import torch_tensorrt
if __name__ == "__main__":
model = build_efficient_sam_vits()
model.to(torch.device("cuda"))
sample_image_np = np.array(Image.open("figs/examples/dogs.jpg"))
sample_image_tensor = transforms.ToTensor()(sample_image_np).to(torch.device("cuda"))
input_points = torch.tensor([[[[580, 350], [650, 350]]]]).to(torch.device("cuda"))
input_labels = torch.tensor([[[1, 1]]]).to(torch.device("cuda"))
print('Running single inference for testing')
predicted_logits, predicted_iou = model(
sample_image_tensor[None, ...],
input_points,
input_labels,
)
mask = torch.ge(predicted_logits[0, 0, 0, :, :], 0).cpu().detach().numpy()
masked_image_np = sample_image_np.copy().astype(np.uint8) * mask[:,:,None]
Image.fromarray(masked_image_np).save(f"figs/examples/dogs_mask.png")
print('Test image saved successfully')
print('Compiling model...')
with torch_tensorrt.logging.graphs():
trt_model = torch_tensorrt.compile(model,
inputs= [sample_image_tensor[None, ...],
input_points,
input_labels],
enabled_precisions= {torch.float32},
workspace_size=2000000000,
capture_dynamic_output_shape_ops=True,
ir="dynamo",
) #dynamic=False
print('Successfully compiled model', trt_model)
Thank you for testing this out - it looks like an issue is encountered when tracing the model in the Dynamo path. Could you also check ir="torch_compile" and see if there are any reported errors?
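A minimal sketch of that check, reusing model and the inputs from your script above (with ir="torch_compile", compilation is deferred to the first forward call via torch.compile, and unsupported operations can fall back to eager PyTorch rather than failing the whole compilation):

import torch
import torch_tensorrt

# Sketch only: `model`, `sample_image_tensor`, `input_points`, and
# `input_labels` are the same objects as in export_to_tensorrt.py above.
trt_model = torch_tensorrt.compile(
    model,
    ir="torch_compile",
    inputs=[sample_image_tensor[None, ...], input_points, input_labels],
    enabled_precisions={torch.float32},
)
# Compilation is triggered lazily on the first inference.
predicted_logits, predicted_iou = trt_model(
    sample_image_tensor[None, ...], input_points, input_labels
)

Separately, if the boolean-mask setitem remains the blocker, one possible (untested) workaround is to rewrite the offending line in efficient_sam_decoder.py using torch.where, which avoids the in-place masked assignment:

import torch

def add_invalid_point_embedding(point_embedding: torch.Tensor,
                                labels: torch.Tensor,
                                invalid_weight: torch.Tensor) -> torch.Tensor:
    # Hypothetical rewrite of:
    #     point_embedding[labels == -1] += self.invalid_points.weight
    # torch.where computes the same result without the boolean-mask setitem
    # that Dynamo reports as unsupported.
    mask = (labels == -1).unsqueeze(-1)
    return torch.where(mask, point_embedding + invalid_weight, point_embedding)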