System Info
- 2x H100 80GB on docker container (nvidia/cuda:12.4.1-devel-ubuntu22.04)
- Latest version of the library (TensorRT-LLM 0.12.0.dev2024080600, per the log below)
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- Installed tensorrt_llm as described in https://nvidia.github.io/TensorRT-LLM/installation/linux.html
- Downloaded the llava-v1.6-34b-hf model as described in https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/multimodal/README.md
- Ran the script: `python3 build_visual_engine.py --model_path tmp/hf_models/${MODEL_NAME} --model_type llava_next --max_batch_size 5`
Expected behavior
Conversion of the visual encoder to the .engine format.
Actual behavior
Received an error:
```
[TensorRT-LLM] TensorRT-LLM version: 0.12.0.dev2024080600
Loading checkpoint shards: 100%|█████████████████████████████████████████| 15/15 [00:01<00:00, 11.09it/s]
[08/08/2024-10:27:28] [TRT] [I] Exporting onnx to tmp/trt_engines/llava-v1.6-34b-hf/vision_encoder/onnx/model.onnx
Traceback (most recent call last):
  File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 817, in <module>
    builder.build()
  File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 84, in build
    build_llava_engine(args)
  File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 374, in build_llava_engine
    export_onnx(wrapper, image, f'{args.output_dir}/onnx')
  File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 118, in export_onnx
    torch.onnx.export(model,
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 516, in export
    _export(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1612, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1138, in _model_to_graph
    graph = _optimize_graph(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 677, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1956, in _run_symbolic_function
    return symbolic_fn(graph_context, *inputs, **attrs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/symbolic_helper.py", line 306, in wrapper
    return fn(g, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/symbolic_opset14.py", line 176, in scaled_dot_product_attention
    query_scaled = g.op("Mul", query, g.op("Sqrt", scale))
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 87, in op
    return _add_op(self, opname, *raw_args, outputs=outputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 238, in _add_op
    inputs = [_const_if_tensor(graph_context, arg) for arg in args]
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 238, in <listcomp>
    inputs = [_const_if_tensor(graph_context, arg) for arg in args]
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 269, in _const_if_tensor
    return _add_op(graph_context, "onnx::Constant", value_z=arg)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 246, in _add_op
    node = _create_node(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 305, in _create_node
    _add_attribute(node, key, value, aten=aten)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 356, in _add_attribute
    return getattr(node, f"{kind}_")(name, value)
TypeError: z_(): incompatible function arguments. The following argument types are supported:
    1. (self: torch._C.Node, arg0: str, arg1: torch.Tensor) -> torch._C.Node

Invoked with: %482 : Tensor = onnx::Constant(), scope: __main__.build_llava_engine.<locals>.LlavaNextVisionWrapper::/transformers.models.clip.modeling_clip.CLIPVisionTransformer::vision_tower/transformers.models.clip.modeling_clip.CLIPEncoder::encoder/transformers.models.clip.modeling_clip.CLIPEncoderLayer::layers.0/transformers.models.clip.modeling_clip.CLIPSdpaAttention::self_attn
, 'value', 0.125
(Occurred when translating scaled_dot_product_attention).
```
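For context, the TypeError arises because torch's ONNX symbolic for scaled_dot_product_attention hands the raw Python float attention scale (0.125 in the log) to a graph-attribute setter that only accepts torch.Tensor values; newer torch releases avoid this by wrapping the scalar in a constant tensor first. Below is a minimal, torch-free sketch of the failing pattern — `Tensor` and `add_attribute` here are hypothetical stand-ins mirroring the names in the traceback, not the real torch internals:

```python
# Torch-free sketch of the failure mode seen in the traceback above.
# Tensor and add_attribute are stand-ins, not the real torch API.

class Tensor:
    """Stand-in for torch.Tensor."""
    def __init__(self, value):
        self.value = value

def add_attribute(node, name, value):
    # Mirrors the node attribute setter in the traceback, which only
    # accepts Tensor-valued attributes for the tensor attribute kind.
    if not isinstance(value, Tensor):
        raise TypeError(
            "z_(): incompatible function arguments: expected Tensor, "
            f"got {type(value).__name__}")
    node[name] = value
    return node

node = {}
try:
    # The SDPA symbolic passes the raw Python float scale straight
    # through, reproducing the TypeError:
    add_attribute(node, "value", 0.125)
except TypeError as exc:
    print(exc)

# Wrapping the scalar in a tensor first (what newer torch does for the
# constant scale) makes the same call succeed:
add_attribute(node, "value", Tensor(0.125))
```

This is only an illustration of the type mismatch; the actual fix lives inside torch's `symbolic_opset14.py`, which is why upgrading the bundled torch/tensorrt_llm packages resolves it.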
Additional notes
The environment was set up exactly as described in the documentation, using a Docker container with Ubuntu (https://nvidia.github.io/TensorRT-LLM/installation/linux.html).
The build of the LLM part completed successfully without any issues.
I suspect the problem may be due to a mismatch of the libraries installed by the TensorRT-LLM installation procedure.
I have no idea how to resolve this issue.
Is there anyone who could provide some guidance on where to start?
Thank you.
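One way to start narrowing down a suspected library mismatch is to dump the versions of the packages involved in this export path. A small sketch using only the standard library — the package list is an assumption based on a typical pip-installed TensorRT-LLM environment, so adjust it to yours:

```python
# Print the installed versions of the packages involved in the ONNX
# export path, so they can be compared against a known-good environment.
from importlib.metadata import version, PackageNotFoundError

# Assumed package names for a pip-based install; edit as needed.
packages = ("tensorrt_llm", "tensorrt", "torch", "onnx", "transformers")

for pkg in packages:
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```

Pasting this output into the issue would also make it easier for maintainers to spot an incompatible combination.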
I'm facing the same issue. Any updates on this, @alexemme? Thank you.
@pathorn @ttim @Superjomn can somebody help me please?
No updates
@alexemme Can you try this with the latest preview package?
Tested this on the latest package, and the engine build runs without error for both llava-v1.6-mistral-7b-hf and llava-v1.6-34b-hf.
An earlier version might have had a package mismatch.
@alexemme If you have no further questions, we will close it in a week.