TensorRT-LLM Error while building Finetuned Mistral model using trt-llm

System Info

GPU - A100 Python version: 3.10.13 library versions: tensorrt==9.2.0.post12.dev5 tensorrt-bindings==9.2.0.post12.dev5 tensorrt-libs==9.2.0.post12.dev5 tensorrt-llm==0.9.0.dev2024020600

Who can help?

No response

Information

[X] The official example scripts
[ ] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[X] My own task or dataset (give details below)

Reproduction

I have a peft finetuned mistral model (base model: mistralai/Mistral-7B-Instruct-v0.2) and I have merged the adapter with the base model using merge_and_unload() function from huggingface
I am trying to follow the instructions of using mistral with tensorrt from here: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#mistral-v01 Below are the outputs for the commands command1: python convert_checkpoint.py --model_dir merged_model --output_dir trt_model_bf16 --dtype bfloat16 output:

[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev20240206000.9.0.dev2024020600
You are using a model of type mistral to instantiate a model of type llama. This is not supported for all configurations of models and can yield errors.
You are using a model of type mistral to instantiate a model of type llama. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00,  1.40s/it]
Weights loaded. Total time: 00:00:00
Total time of converting checkpoints: 00:00:11

above command executed successfully and output directory contents are config.json and rank0.safetensors

command2: trtllm-build --checkpoint_dir trt_model_bf16 --output_dir trt_engine_bf16 --gemm_plugin bfloat16 --max_input_len 32256 Output with error:

[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600[02/12/2024-12:40:16] [TRT-LLM] [I] Set bert_attention_plugin to float16.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set gpt_attention_plugin to float16.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set gemm_plugin to float16.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set lookup_plugin to None.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set lora_plugin to None.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set context_fmha to True.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set context_fmha_fp32_acc to False.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set paged_kv_cache to True.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set remove_input_padding to True.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set use_custom_all_reduce to True.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set multi_block_mode to False.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set enable_xqa to True.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set attention_qk_half_accumulation to False.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set tokens_per_block to 128.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set use_context_fmha_for_generation to False.
[02/12/2024-12:40:16] [TRT-LLM] [W] remove_input_padding is enabled, while max_num_tokens is not set, setting to max_batch_size*max_input_len. 
It may not be optimal to set max_num_tokens=max_batch_size*max_input_len when remove_input_padding is enabled, because the number of packed input tokens are very likely to be smaller, we strongly recommend to set max_num_tokens according to your workloads.
[02/12/2024-12:40:17] [TRT] [I] [MemUsageChange] Init CUDA: CPU +16, GPU +0, now: CPU 636, GPU 973 (MiB)
[02/12/2024-12:40:18] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1973, GPU +350, now: CPU 2745, GPU 1323 (MiB)
[02/12/2024-12:40:18] [TRT-LLM] [I] Set nccl_plugin to None.
[02/12/2024-12:40:18] [TRT-LLM] [I] Set use_custom_all_reduce to True.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/vocab_embedding/GATHER_0_output_0 and LLaMAForCausalLM/transformer/layers/0/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/0/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/0/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/0/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/0/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/0/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/0/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/0/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/1/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/1/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/1/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/1/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/1/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/1/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/1/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/1/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/2/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/2/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/2/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/2/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/2/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/2/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/2/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/2/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/3/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/3/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/3/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/3/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/3/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/3/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/3/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/3/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/4/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/4/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/4/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/4/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/4/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/4/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/4/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/4/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/5/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/5/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/5/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/5/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/5/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/5/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/5/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/5/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/6/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/6/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/6/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/6/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/6/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/6/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/6/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/6/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/7/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/7/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/7/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/7/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/7/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/7/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/7/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/7/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/8/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/8/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/8/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/8/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/8/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/8/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/8/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/8/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/9/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/9/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/9/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/9/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/9/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/9/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/9/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/9/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/10/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/10/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/10/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/10/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/10/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/10/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/10/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/10/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/11/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/11/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/11/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/11/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/11/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/11/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/11/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/11/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/12/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/12/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/12/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/12/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/12/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/12/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/12/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/12/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/13/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/13/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/13/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/13/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/13/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/13/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/13/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/13/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/14/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/14/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/14/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/14/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/14/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/14/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/14/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/14/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/15/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/15/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/15/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/15/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/15/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/15/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/15/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/15/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/16/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/16/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/16/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/16/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/16/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/16/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/16/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/16/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/17/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/17/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/17/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/17/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/17/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/17/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/17/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/17/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/18/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/18/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/18/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/18/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/18/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/18/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/18/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/18/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/19/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/19/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/19/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/19/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/19/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/19/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/19/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/19/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/20/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/20/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/20/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/20/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/20/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/20/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/20/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/20/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/21/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/21/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/21/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/21/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/21/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/21/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/21/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/21/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/22/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/22/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/22/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/22/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/22/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/22/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/22/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/22/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/23/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/23/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/23/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/23/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/23/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/23/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/23/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/23/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/24/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/24/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/24/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/24/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/24/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/24/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/24/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/24/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/25/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/25/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/25/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/25/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/25/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/25/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/25/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/25/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/26/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/26/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/26/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/26/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/26/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/26/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/26/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/26/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/27/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/27/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/27/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/27/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/27/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/27/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/27/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/27/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/28/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/28/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/28/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/28/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/28/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/28/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/28/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/28/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/29/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/29/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/29/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/29/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/29/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/29/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/29/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/29/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/30/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/30/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/30/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/30/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/30/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/30/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/30/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/30/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/31/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/31/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/31/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/31/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/31/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/31/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/31/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/31/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/ln_f/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/ln_f/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/ln_f/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT-LLM] [I] Build TensorRT engine Unnamed Network 0
[02/12/2024-12:40:19] [TRT] [W] Unused Input: position_ids
[02/12/2024-12:40:19] [TRT] [W] [RemoveDeadLayers] Input Tensor position_ids is unused or used only at compile-time, but is not being removed.
[02/12/2024-12:40:19] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2785, GPU 1349 (MiB)
[02/12/2024-12:40:19] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 2787, GPU 1359 (MiB)
[02/12/2024-12:40:19] [TRT] [W] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2
[02/12/2024-12:40:19] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[02/12/2024-12:40:19] [TRT] [E] 9: LLaMAForCausalLM/transformer/layers/0/attention/qkv/PLUGIN_V2_Gemm_0: could not find any supported formats consistent with input/output data types
[02/12/2024-12:40:19] [TRT] [E] 9: [pluginV2Builder.cpp::reportPluginError::24] Error Code 9: Internal Error (LLaMAForCausalLM/transformer/layers/0/attention/qkv/PLUGIN_V2_Gemm_0: could not find any supported formats consistent with input/output data types)
[02/12/2024-12:40:19] [TRT-LLM] [E] Engine building failed, please check the error log.
[02/12/2024-12:40:20] [TRT-LLM] [I] Serializing engine to trt_engine_bf16_01/rank0.engine...
Traceback (most recent call last):
  File "/home/azureuser/.conda/envs/tensorrt-venv/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/home/azureuser/.conda/envs/tensorrt-venv/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 489, in main
    parallel_build(source, build_config, args.output_dir, workers,
  File "/home/azureuser/.conda/envs/tensorrt-venv/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 413, in parallel_build
    passed = build_and_save(rank, rank % workers, ckpt_dir,
  File "/home/azureuser/.conda/envs/tensorrt-venv/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 392, in build_and_save
    engine.save(output_dir)
  File "/home/azureuser/.conda/envs/tensorrt-venv/lib/python3.10/site-packages/tensorrt_llm/runtime/engine.py", line 60, in save
    serialize_engine(
  File "/home/azureuser/.conda/envs/tensorrt-venv/lib/python3.10/site-packages/tensorrt_llm/runtime/engine.py", line 18, in serialize_engine
    f.write(engine)
TypeError: a bytes-like object is required, not 'NoneType'

Expected behavior

Model building should be successful

actual behavior

Model building failed with an exception

additional notes

I am not using any docker. Just created a python venv and installed tersorrt, tensorrt-llm python packages from pypi. All the installations were successful without any errors

Feb 13 '24 08:02 vishnula-kore

Hi, could you try the following command to build the engine? trtllm-build --checkpoint_dir trt_model_bf16 --output_dir trt_engine_bf16 --gemm_plugin bfloat16 --max_input_len 32256 --gpt_attention_plugin bfloat16

Feb 17 '24 01:02 weiqisunnvidia

Hi @vishnula-kore, I am also facing the exact same issue. Wanted to know if you were able to solve it

Feb 20 '24 15:02 mayankagrawal-quantiphi

Hi, could you try the following command to build the engine? trtllm-build --checkpoint_dir trt_model_bf16 --output_dir trt_engine_bf16 --gemm_plugin bfloat16 --max_input_len 32256 --gpt_attention_plugin bfloat16

I tried adding the gpt_attention_plugin option to my runs and it still failed: ` File "/usr/local/bin/trtllm-build", line 8, in sys.exit(main()) File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 489, in main parallel_build(source, build_config, args.output_dir, workers, File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 413, in parallel_build passed = build_and_save(rank, rank % workers, ckpt_dir, File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 385, in build_and_save engine = build(build_config, File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 276, in build return build_model(model, build_config) File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 193, in build_model model(**inputs) File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 40, in call output = self.forward(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 498, in forward hidden_states = self.transformer.forward(**kwargs) File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 201, in forward hidden_states = self.layers.forward( File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 255, in forward hidden_states = layer( File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 40, in call output = self.forward(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 115, in forward attention_output = self.attention( File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/module.py", line 40, in call output = self.forward(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/attention.py", line 741, in forward context, past_key_value = gpt_attention( File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/graph_rewriting.py", line 561, in wrapper outs = f(*args, **kwargs) File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/functional.py", line 3813, in gpt_attention layer = default_trtnet().add_plugin_v2(plug_inputs, attn_plug) TypeError: add_plugin_v2(): incompatible function arguments. The following argument types are supported: 1. (self: tensorrt_bindings.tensorrt.INetworkDefinition, inputs: List[tensorrt_bindings.tensorrt.ITensor], plugin: tensorrt_bindings.tensorrt.IPluginV2) -> tensorrt_bindings.tensorrt.IPluginV2Layer

Invoked with: <tensorrt_bindings.tensorrt.INetworkDefinition object at 0x7f8db4198630>, [<tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564edc30>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564ce030>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564ce530>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564cec70>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564d3b70>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564ce770>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564d3d70>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564ce270>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564c02b0>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564c0570>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564cea30>], None`

It works when i just used the same convert to checkpoint using normal float16 and the normal build with float16. I'm just using vanilla mistral from HF

maybe this chunk is also useful... [02/21/2024-01:16:50] [TRT-LLM] [I] Set gpt_attention_plugin to bfloat16. [02/21/2024-01:16:50] [TRT-LLM] [I] Set gemm_plugin to bfloat16. [02/21/2024-01:16:50] [TRT-LLM] [I] Set lookup_plugin to None. [02/21/2024-01:16:50] [TRT-LLM] [I] Set lora_plugin to None. [02/21/2024-01:16:50] [TRT-LLM] [I] Set context_fmha to True. [02/21/2024-01:16:50] [TRT-LLM] [I] Set context_fmha_fp32_acc to False. [02/21/2024-01:16:50] [TRT-LLM] [I] Set paged_kv_cache to True. [02/21/2024-01:16:50] [TRT-LLM] [I] Set remove_input_padding to True. [02/21/2024-01:16:50] [TRT-LLM] [I] Set use_custom_all_reduce to True. [02/21/2024-01:16:50] [TRT-LLM] [I] Set multi_block_mode to False. [02/21/2024-01:16:50] [TRT-LLM] [I] Set enable_xqa to True. [02/21/2024-01:16:50] [TRT-LLM] [I] Set attention_qk_half_accumulation to False. [02/21/2024-01:16:50] [TRT-LLM] [I] Set tokens_per_block to 128. [02/21/2024-01:16:50] [TRT-LLM] [I] Set use_paged_context_fmha to False. [02/21/2024-01:16:50] [TRT-LLM] [I] Set use_context_fmha_for_generation to False.

good luck. thanks for the python package.

Feb 21 '24 01:02 joe-schwartz-certara

Hi, could you try the following command to build the engine? trtllm-build --checkpoint_dir trt_model_bf16 --output_dir trt_engine_bf16 --gemm_plugin bfloat16 --max_input_len 32256 --gpt_attention_plugin bfloat16

Thanks @weiqisunnvidia this command worked.

Feb 22 '24 10:02 vishnula-kore

--gemm_plugin bfloat16 --gpt_attention_plugin bfloat16 --strongly_typed is not help : https://github.com/NVIDIA/TensorRT-LLM/issues/1273

Mar 12 '24 08:03 plt12138

Got the same issue when try to build engine for llama3 model

Here's the command I use:

trtllm-build --checkpoint_dir ./tmp/trtllm-Llama-3-8B-Instruct-1gpu-bf16 \
            --output_dir ./tmp/trt_engines/llama3/8B/bf16/1-gpu \
            --gpt_attention_plugin bfloat16 \
            --gemm_plugin bfloat16

And here's the output (truncated):

[05/21/2024-07:17:36] [TRT] [I] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 155, GPU 269 (MiB)
[05/21/2024-07:17:37] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1810, GPU +312, now: CPU 2101, GPU 581 (MiB)
[05/21/2024-07:17:37] [TRT-LLM] [I] Set nccl_plugin to None.
[05/21/2024-07:17:37] [TRT-LLM] [I] Set use_custom_all_reduce to True.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/vocab_embedding/GATHER_0_output_0 and LLaMAForCausalLM/transformer/layers/0/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/0/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/0/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/0/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/0/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/0/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/0/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/0/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/1/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/1/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/1/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/1/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/1/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/1/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/1/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/1/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/2/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/2/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/2/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/2/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/2/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/2/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/2/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/2/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/3/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/3/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/3/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/3/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/3/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/3/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/3/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/3/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/4/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/4/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/4/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/4/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/4/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/4/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/4/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/4/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/5/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/5/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/5/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/5/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/5/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/5/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/5/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/5/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/6/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/6/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/6/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/6/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/6/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/6/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/6/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/6/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/7/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/7/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/7/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/7/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/7/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/7/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/7/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/7/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/8/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/8/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/8/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/8/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/8/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/8/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/8/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/8/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/9/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/9/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/9/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/9/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/9/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.

May 21 '24 07:05 michaelnny

@plt12138 I am not pretty sure what problem do you encounter. You post another issue, but the issue use Mixtral instead of Mistral. Do you also encounter issue on Mistral? If so, please share your reproduced steps and the full log. Otherwise, let's discuss in another issue.

@michaelnny From the log you post, I don't see the error. Could you post your converting script and the full log?

May 23 '24 07:05 byshiue

@byshiue My issue was resolved after re-install v0.9.0 in a clean container environment.

May 24 '24 00:05 michaelnny