🐛 [Bug] Encountered bug when using Torch-TensorRT
Bug Description
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper__index_select)
RuntimeError Traceback (most recent call last)
Input In [11], in <cell line: 21>()
18 new_level = torch_tensorrt.logging.Level.Error
19 torch_tensorrt.logging.set_reportable_log_level(new_level)
---> 21 trt_model = torch_tensorrt.compile(traced_mlm_model,
22 inputs= [torch_tensorrt.Input(shape=[batch_size, 47, 24], dtype=torch.int32), # input_ids
23 torch_tensorrt.Input(shape=[batch_size, 47, 24], dtype=torch.int32), # token_type_ids
24 torch_tensorrt.Input(shape=[batch_size, 47, 24], dtype=torch.int32)], # attention_mask
25 enabled_precisions= {torch.float32}, # Run with 32-bit precision
26 workspace_size=1000000000,
27 truncate_long_and_double=True
28 )
File /opt/conda/lib/python3.8/site-packages/torch_tensorrt/_compile.py:115, in compile(module, ir, inputs, enabled_precisions, **kwargs)
110 logging.log(
111 logging.Level.Info,
112 "Module was provided as a torch.nn.Module, trying to script the module with torch.jit.script. In the event of a failure please preconvert your module to TorchScript"
113 )
114 ts_mod = torch.jit.script(module)
--> 115 return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
116 elif target_ir == _IRType.fx:
117 raise RuntimeError("fx is currently not supported")
File /opt/conda/lib/python3.8/site-packages/torch_tensorrt/ts/_compiler.py:113, in compile(module, inputs, device, disable_tf32, sparse_weights, enabled_precisions, refit, debug, capability, num_min_timing_iters, num_avg_timing_iters, workspace_size, calibrator, truncate_long_and_double, require_full_compilation, min_block_size, torch_executed_ops, torch_executed_modules)
87 raise ValueError(
88 "require_full_compilation is enabled however the list of modules and ops to run in torch is not empty. Found: torch_executed_ops: "
89 + torch_executed_ops + ", torch_executed_modules: " + torch_executed_modules)
91 spec = {
92 "inputs": inputs,
93 "device": device,
(...)
110 }
111 }
--> 113 compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
114 compiled_module = torch.jit._recursive.wrap_cpp_module(compiled_cpp_mod)
115 return compiled_module
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py(2148): embedding
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/sparse.py(158): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1117): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1129): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py(235): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1117): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1129): _call_impl
/opt/conda/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py(1010): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1117): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1129): _call_impl
/opt/pytorch/torch_tensorrt/notebooks/segment_model/segment.py(481): forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1117): _slow_forward
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py(1129): _call_impl
/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py(958): trace_module
/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py(741): trace
/tmp/ipykernel_2946/2700711854.py(16): <cell line: 16>
/opt/conda/lib/python3.8/site-packages/IPython/core/interactiveshell.py(3397): run_code
/opt/conda/lib/python3.8/site-packages/IPython/core/interactiveshell.py(3337): run_ast_nodes
/opt/conda/lib/python3.8/site-packages/IPython/core/interactiveshell.py(3134): run_cell_async
/opt/conda/lib/python3.8/site-packages/IPython/core/async_helpers.py(129): _pseudo_sync_runner
/opt/conda/lib/python3.8/site-packages/IPython/core/interactiveshell.py(2935): _run_cell
/opt/conda/lib/python3.8/site-packages/IPython/core/interactiveshell.py(2880): run_cell
/opt/conda/lib/python3.8/site-packages/ipykernel/zmqshell.py(528): run_cell
/opt/conda/lib/python3.8/site-packages/ipykernel/ipkernel.py(383): do_execute
/opt/conda/lib/python3.8/site-packages/ipykernel/kernelbase.py(724): execute_request
/opt/conda/lib/python3.8/site-packages/ipykernel/kernelbase.py(400): dispatch_shell
/opt/conda/lib/python3.8/site-packages/ipykernel/kernelbase.py(493): process_one
/opt/conda/lib/python3.8/site-packages/ipykernel/kernelbase.py(504): dispatch_queue
/opt/conda/lib/python3.8/asyncio/events.py(81): _run
/opt/conda/lib/python3.8/asyncio/base_events.py(1859): _run_once
/opt/conda/lib/python3.8/asyncio/base_events.py(570): run_forever
/opt/conda/lib/python3.8/site-packages/tornado/platform/asyncio.py(199): start
/opt/conda/lib/python3.8/site-packages/ipykernel/kernelapp.py(712): start
/opt/conda/lib/python3.8/site-packages/traitlets/config/application.py(846): launch_instance
/opt/conda/lib/python3.8/site-packages/ipykernel_launcher.py(17): <module>
/opt/conda/lib/python3.8/runpy.py(87): _run_code
/opt/conda/lib/python3.8/runpy.py(194): _run_module_as_main
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper__index_select)
Code
When I follow the steps in this tutorial to process my model, I encounter the bug above. I did not move any data or the model to the GPU, so I wonder why CUDA shows up during compilation. Here is my code:
import torch
import timeit
import numpy as np
import torch_tensorrt
import torch.backends.cudnn as cudnn
from transformers import BertConfig, BertTokenizerFast
from segment_model.segment import SegmentationFontEndBertModel
batch_size = 4
batched_input_ids = torch.tensor([[[1]*24 for _ in range(47)]]*batch_size)
batched_attention_masks = torch.tensor([[[0]*24 for _ in range(47)]]*batch_size)
batched_token_type_ids = torch.tensor([[[1]*24 for _ in range(47)]]*batch_size)
# print(batched_input_ids)
mlm_model_ts = SegmentationFontEndBertModel.from_pretrained("20220601_1860", torchscript=True)
traced_mlm_model = torch.jit.trace(mlm_model_ts, [batched_input_ids, batched_attention_masks, batched_token_type_ids], strict=False)
new_level = torch_tensorrt.logging.Level.Error
torch_tensorrt.logging.set_reportable_log_level(new_level)
trt_model = torch_tensorrt.compile(traced_mlm_model,
    inputs=[torch_tensorrt.Input(shape=[batch_size, 47, 24], dtype=torch.int32),   # input_ids
            torch_tensorrt.Input(shape=[batch_size, 47, 24], dtype=torch.int32),   # token_type_ids
            torch_tensorrt.Input(shape=[batch_size, 47, 24], dtype=torch.int32)],  # attention_mask
    enabled_precisions={torch.float32},  # Run with 32-bit precision
    workspace_size=1000000000,
    truncate_long_and_double=True
)
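(Editorial aside, not part of the original report: since the error complains about tensors on cpu and cuda:0 at once, a quick diagnostic sketch is to print which devices the traced module's parameters and the sample inputs actually live on before compiling.)

# Diagnostic sketch: report the devices of the traced module's parameters
# and of one sample input, to see where the cpu/cuda:0 mix comes from.
param_devices = {p.device for p in traced_mlm_model.parameters()}
print("parameter devices:", param_devices)
print("input device:", batched_input_ids.device)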
I run this code in the NGC PyTorch Docker container nvcr.io/nvidia/pytorch:22.05-py3, which contains PyTorch and Torch-TensorRT.
Here is my model structure:
SegmentationFontEndBertModel(
  original_name=SegmentationFontEndBertModel
  (bert): BertModel(
    original_name=BertModel
    (embeddings): BertEmbeddings(
      original_name=BertEmbeddings
      (word_embeddings): Embedding(original_name=Embedding)
      (position_embeddings): Embedding(original_name=Embedding)
      (token_type_embeddings): Embedding(original_name=Embedding)
      (LayerNorm): LayerNorm(original_name=LayerNorm)
      (dropout): Dropout(original_name=Dropout)
    )
    (encoder): BertEncoder(
      original_name=BertEncoder
      (layer): ModuleList(
        original_name=ModuleList
        (0): BertLayer(
          original_name=BertLayer
          (attention): BertAttention(
            original_name=BertAttention
            (self): BertSelfAttention(
              original_name=BertSelfAttention
              (query): Linear(original_name=Linear)
              (key): Linear(original_name=Linear)
              (value): Linear(original_name=Linear)
              (dropout): Dropout(original_name=Dropout)
            )
            (output): BertSelfOutput(
              original_name=BertSelfOutput
              (dense): Linear(original_name=Linear)
              (LayerNorm): LayerNorm(original_name=LayerNorm)
              (dropout): Dropout(original_name=Dropout)
            )
          )
          (intermediate): BertIntermediate(
            original_name=BertIntermediate
            (dense): Linear(original_name=Linear)
            (intermediate_act_fn): GELUActivation(original_name=GELUActivation)
          )
          (output): BertOutput(
            original_name=BertOutput
            (dense): Linear(original_name=Linear)
            (LayerNorm): LayerNorm(original_name=LayerNorm)
            (dropout): Dropout(original_name=Dropout)
          )
        )
        (1): BertLayer(
          original_name=BertLayer
          (attention): BertAttention(
            original_name=BertAttention
            (self): BertSelfAttention(
              original_name=BertSelfAttention
              (query): Linear(original_name=Linear)
              (key): Linear(original_name=Linear)
              (value): Linear(original_name=Linear)
              (dropout): Dropout(original_name=Dropout)
            )
            (output): BertSelfOutput(
              original_name=BertSelfOutput
              (dense): Linear(original_name=Linear)
              (LayerNorm): LayerNorm(original_name=LayerNorm)
              (dropout): Dropout(original_name=Dropout)
            )
          )
          (intermediate): BertIntermediate(
            original_name=BertIntermediate
            (dense): Linear(original_name=Linear)
            (intermediate_act_fn): GELUActivation(original_name=GELUActivation)
          )
          (output): BertOutput(
            original_name=BertOutput
            (dense): Linear(original_name=Linear)
            (LayerNorm): LayerNorm(original_name=LayerNorm)
            (dropout): Dropout(original_name=Dropout)
          )
        )
        (2): BertLayer(
          original_name=BertLayer
          (attention): BertAttention(
            original_name=BertAttention
            (self): BertSelfAttention(
              original_name=BertSelfAttention
              (query): Linear(original_name=Linear)
              (key): Linear(original_name=Linear)
              (value): Linear(original_name=Linear)
              (dropout): Dropout(original_name=Dropout)
            )
            (output): BertSelfOutput(
              original_name=BertSelfOutput
              (dense): Linear(original_name=Linear)
              (LayerNorm): LayerNorm(original_name=LayerNorm)
              (dropout): Dropout(original_name=Dropout)
            )
          )
          (intermediate): BertIntermediate(
            original_name=BertIntermediate
            (dense): Linear(original_name=Linear)
            (intermediate_act_fn): GELUActivation(original_name=GELUActivation)
          )
          (output): BertOutput(
            original_name=BertOutput
            (dense): Linear(original_name=Linear)
            (LayerNorm): LayerNorm(original_name=LayerNorm)
            (dropout): Dropout(original_name=Dropout)
          )
        )
        (3): BertLayer(
          original_name=BertLayer
          (attention): BertAttention(
            original_name=BertAttention
            (self): BertSelfAttention(
              original_name=BertSelfAttention
              (query): Linear(original_name=Linear)
              (key): Linear(original_name=Linear)
              (value): Linear(original_name=Linear)
              (dropout): Dropout(original_name=Dropout)
            )
            (output): BertSelfOutput(
              original_name=BertSelfOutput
              (dense): Linear(original_name=Linear)
              (LayerNorm): LayerNorm(original_name=LayerNorm)
              (dropout): Dropout(original_name=Dropout)
            )
          )
          (intermediate): BertIntermediate(
            original_name=BertIntermediate
            (dense): Linear(original_name=Linear)
            (intermediate_act_fn): GELUActivation(original_name=GELUActivation)
          )
          (output): BertOutput(
            original_name=BertOutput
            (dense): Linear(original_name=Linear)
            (LayerNorm): LayerNorm(original_name=LayerNorm)
            (dropout): Dropout(original_name=Dropout)
          )
        )
      )
    )
  )
)
Steps to reproduce the behavior:
Expected behavior
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- Torch-TensorRT Version (e.g. 1.0.0):
- PyTorch Version (e.g. 1.0):
- CPU Architecture:
- OS (e.g., Linux):
- How you installed PyTorch (
conda,pip,libtorch, source): - Build command you used (if compiling from source):
- Are you using local sources or building from archives:
- Python version:
- CUDA version:
- GPU models and configuration:
- Any other relevant information:
Additional context
Can you try making sure that the model is on GPU before compiling?
In the example, the model and inputs are on the CPU during compilation; I did it the same way but got this error. However, I also tried moving the inputs and the model to the GPU before compiling, and compilation got stuck there with no response. By the way, are the LSTM module and the max operation supported now?
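(For concreteness, a minimal sketch of what "moving the model and inputs to GPU before compiling" looks like in terms of the snippet above; this is an illustration assuming a single cuda:0 device, not the reporter's exact code.)

device = torch.device("cuda:0")
mlm_model_ts = mlm_model_ts.to(device).eval()
gpu_inputs = [t.to(device) for t in (batched_input_ids,
                                     batched_attention_masks,
                                     batched_token_type_ids)]
# Re-trace on GPU so the graph's constants land on cuda:0 as well,
# then call torch_tensorrt.compile(traced_mlm_model, ...) as before.
traced_mlm_model = torch.jit.trace(mlm_model_ts, gpu_inputs, strict=False)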
LSTMCell and Max are supported but other variants of LSTM may not have a native implementation yet
I think the torch::max operator is not supported; I was able to convert normally using torch::topk instead.
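(For readers hitting the same gap, a short sketch of the substitution described: torch.topk with k=1 along a dimension yields the same values and indices as torch.max over that dimension. Illustrative code, not the poster's original.)

import torch

x = torch.randn(4, 47, 24)
# Instead of: values, indices = torch.max(x, dim=-1)
values, indices = torch.topk(x, k=1, dim=-1)   # shapes (4, 47, 1)
values, indices = values.squeeze(-1), indices.squeeze(-1)  # shapes (4, 47)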
So GRU is not supported? I got the following error for GRU:
ERROR: [Torch-TensorRT] - Unsupported operator: aten::gru.input(Tensor input, Tensor hx, Tensor[] params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional, bool batch_first) -> (Tensor, Tensor)
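(One possible workaround when a converter is missing is to let that op fall back to PyTorch via the torch_executed_ops parameter visible in the compile signature in the traceback above. A sketch; traced_model and the input spec are placeholders, and whether this helps for GRU is untested.)

trt_model = torch_tensorrt.compile(
    traced_model,  # placeholder: a traced module containing the GRU
    inputs=[torch_tensorrt.Input(shape=[4, 47, 24], dtype=torch.float32)],
    # Keep the unsupported op in PyTorch and convert the rest;
    # the op name is taken from the error message above.
    torch_executed_ops=["aten::gru"],
)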
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
cc: @gs-olive
Hello - the same error described in the original bug was fixed in #1416, which is included in the most recent Torch-TRT release (1.3.0). I do not have access to the model used in the provided code snippet (from segment_model.segment import SegmentationFontEndBertModel), so I can't reproduce it exactly, but I have tested the regular BERT model with 3 input tensors, and compilation/inference succeed with the 1.3.0 release of Torch-TRT. Please let me know whether the new release resolves the issue on your end!
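(A sketch of the kind of check described, assuming the stock bert-base-uncased from transformers rather than the reporter's private model; shapes and token values are arbitrary.)

import torch
import torch_tensorrt
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased", torchscript=True).eval()
batch_size, seq_len = 4, 128
ids = torch.randint(0, 1000, (batch_size, seq_len), dtype=torch.int32)
mask = torch.ones(batch_size, seq_len, dtype=torch.int32)
types = torch.zeros(batch_size, seq_len, dtype=torch.int32)
traced = torch.jit.trace(model, [ids, mask, types], strict=False)

trt_model = torch_tensorrt.compile(
    traced,
    inputs=[torch_tensorrt.Input(shape=[batch_size, seq_len], dtype=torch.int32)
            for _ in range(3)],
    enabled_precisions={torch.float32},
    truncate_long_and_double=True,
)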
I have torch-tensorrt 1.4 and I still get this issue. Compilation is very long (>5 minutes on a high-end machine for a very simple network, a MobileNetV3).
The bigger issue is that we don't have any indication that it's doing something, as there is no progress bar and no log output in the terminal. Also, the process does not respond to a keyboard interrupt, which leads one to think that the program has frozen.
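(Since the original snippet sets the log level to Error, which suppresses progress messages, one partial mitigation for the silent wait is to raise the verbosity with the same logging API shown earlier in the thread. A sketch; the scoped context manager may depend on your Torch-TRT version.)

import torch_tensorrt

# Show informational compiler messages instead of errors only.
torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Info)

# If available in your version, the scoped form limits verbosity to one call:
# with torch_tensorrt.logging.info():
#     trt_model = torch_tensorrt.compile(...)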