Error while building Finetuned Mistral model using trt-llm
System Info
- GPU: A100
- Python version: 3.10.13
- Library versions:
  - tensorrt==9.2.0.post12.dev5
  - tensorrt-bindings==9.2.0.post12.dev5
  - tensorrt-libs==9.2.0.post12.dev5
  - tensorrt-llm==0.9.0.dev2024020600
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
- I have a PEFT fine-tuned Mistral model (base model: mistralai/Mistral-7B-Instruct-v0.2), and I merged the adapter into the base model using the merge_and_unload() function from Hugging Face.
- I am trying to follow the instructions for using Mistral with TensorRT-LLM from here: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#mistral-v01
Below are the commands and their outputs.
command1:
python convert_checkpoint.py --model_dir merged_model --output_dir trt_model_bf16 --dtype bfloat16
output:
[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600
0.9.0.dev2024020600
You are using a model of type mistral to instantiate a model of type llama. This is not supported for all configurations of models and can yield errors.
You are using a model of type mistral to instantiate a model of type llama. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.40s/it]
Weights loaded. Total time: 00:00:00
Total time of converting checkpoints: 00:00:11
The above command executed successfully; the output directory contains config.json and rank0.safetensors.
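For context, the merge step mentioned in the first reproduction bullet could look roughly like the sketch below. This is not the exact script that was run: the function name `merge_adapter`, the adapter path argument, loading the base model in bfloat16, and saving the tokenizer next to the weights are all assumptions made for illustration. Only the base model ID and the use of merge_and_unload() come from the report.

```python
def merge_adapter(adapter_dir: str, output_dir: str,
                  base_model: str = "mistralai/Mistral-7B-Instruct-v0.2") -> None:
    """Fold a PEFT adapter into its base model and save a plain Hugging Face checkpoint.

    Hypothetical helper for illustration; adapter_dir and output_dir are placeholders.
    """
    # Imported lazily so this module can be loaded without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: load the base weights in bfloat16 to match the --dtype used for conversion.
    base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

    # Attach the adapter, then merge its weights into the base layers and drop the PEFT wrappers.
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

    # Save the merged model as safetensors, plus the tokenizer, into one directory.
    merged.save_pretrained(output_dir, safe_serialization=True)
    AutoTokenizer.from_pretrained(base_model).save_pretrained(output_dir)
```

Under these assumptions, calling `merge_adapter("path/to/adapter", "merged_model")` would produce the directory that is passed as --model_dir to convert_checkpoint.py above.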
command2:
trtllm-build --checkpoint_dir trt_model_bf16 --output_dir trt_engine_bf16 --gemm_plugin bfloat16 --max_input_len 32256
Output with error:
[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024020600
[02/12/2024-12:40:16] [TRT-LLM] [I] Set bert_attention_plugin to float16.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set gpt_attention_plugin to float16.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set gemm_plugin to float16.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set lookup_plugin to None.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set lora_plugin to None.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set context_fmha to True.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set context_fmha_fp32_acc to False.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set paged_kv_cache to True.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set remove_input_padding to True.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set use_custom_all_reduce to True.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set multi_block_mode to False.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set enable_xqa to True.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set attention_qk_half_accumulation to False.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set tokens_per_block to 128.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[02/12/2024-12:40:16] [TRT-LLM] [I] Set use_context_fmha_for_generation to False.
[02/12/2024-12:40:16] [TRT-LLM] [W] remove_input_padding is enabled, while max_num_tokens is not set, setting to max_batch_size*max_input_len.
It may not be optimal to set max_num_tokens=max_batch_size*max_input_len when remove_input_padding is enabled, because the number of packed input tokens are very likely to be smaller, we strongly recommend to set max_num_tokens according to your workloads.
[02/12/2024-12:40:17] [TRT] [I] [MemUsageChange] Init CUDA: CPU +16, GPU +0, now: CPU 636, GPU 973 (MiB)
[02/12/2024-12:40:18] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1973, GPU +350, now: CPU 2745, GPU 1323 (MiB)
[02/12/2024-12:40:18] [TRT-LLM] [I] Set nccl_plugin to None.
[02/12/2024-12:40:18] [TRT-LLM] [I] Set use_custom_all_reduce to True.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/vocab_embedding/GATHER_0_output_0 and LLaMAForCausalLM/transformer/layers/0/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/0/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/0/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/0/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/0/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/0/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/0/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/0/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/1/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/1/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/1/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/1/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/1/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/1/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/1/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/1/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/2/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/2/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/2/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/2/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/2/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/2/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/2/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/2/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/3/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/3/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/3/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/3/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/3/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/3/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/3/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/3/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/4/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/4/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/4/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/4/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/4/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/4/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/4/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/4/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/5/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:18] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/5/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/5/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/5/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/5/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/5/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/5/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/5/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/6/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/6/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/6/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/6/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/6/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/6/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/6/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/6/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/7/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/7/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/7/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/7/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/7/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/7/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/7/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/7/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/8/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/8/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/8/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/8/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/8/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/8/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/8/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/8/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/9/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/9/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/9/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/9/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/9/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/9/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/9/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/9/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/10/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/10/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/10/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/10/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/10/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/10/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/10/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/10/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/11/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/11/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/11/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/11/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/11/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/11/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/11/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/11/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/12/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/12/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/12/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/12/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/12/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/12/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/12/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/12/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/13/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/13/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/13/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/13/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/13/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/13/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/13/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/13/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/14/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/14/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/14/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/14/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/14/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/14/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/14/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/14/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/15/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/15/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/15/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/15/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/15/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/15/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/15/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/15/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/16/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/16/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/16/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/16/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/16/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/16/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/16/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/16/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/17/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/17/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/17/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/17/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/17/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/17/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/17/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/17/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/18/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/18/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/18/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/18/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/18/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/18/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/18/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/18/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/19/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/19/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/19/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/19/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/19/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/19/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/19/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/19/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/20/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/20/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/20/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/20/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/20/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/20/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/20/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/20/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/21/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/21/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/21/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/21/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/21/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/21/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/21/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/21/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/22/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/22/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/22/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/22/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/22/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/22/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/22/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/22/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/23/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/23/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/23/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/23/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/23/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/23/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/23/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/23/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/24/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/24/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/24/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/24/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/24/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/24/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/24/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/24/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/25/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/25/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/25/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/25/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/25/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/25/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/25/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/25/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/26/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/26/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/26/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/26/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/26/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/26/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/26/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/26/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/27/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/27/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/27/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/27/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/27/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/27/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/27/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/27/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/28/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/28/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/28/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/28/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/28/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/28/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/28/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/28/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/29/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/29/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/29/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/29/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/29/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/29/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/29/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/29/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/30/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/30/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/30/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/30/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/30/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/30/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/30/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/30/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/31/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/31/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/31/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/31/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/31/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/31/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/31/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/31/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/ln_f/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/ln_f/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/ln_f/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[02/12/2024-12:40:19] [TRT-LLM] [I] Build TensorRT engine Unnamed Network 0
[02/12/2024-12:40:19] [TRT] [W] Unused Input: position_ids
[02/12/2024-12:40:19] [TRT] [W] [RemoveDeadLayers] Input Tensor position_ids is unused or used only at compile-time, but is not being removed.
[02/12/2024-12:40:19] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2785, GPU 1349 (MiB)
[02/12/2024-12:40:19] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 2787, GPU 1359 (MiB)
[02/12/2024-12:40:19] [TRT] [W] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2
[02/12/2024-12:40:19] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[02/12/2024-12:40:19] [TRT] [E] 9: LLaMAForCausalLM/transformer/layers/0/attention/qkv/PLUGIN_V2_Gemm_0: could not find any supported formats consistent with input/output data types
[02/12/2024-12:40:19] [TRT] [E] 9: [pluginV2Builder.cpp::reportPluginError::24] Error Code 9: Internal Error (LLaMAForCausalLM/transformer/layers/0/attention/qkv/PLUGIN_V2_Gemm_0: could not find any supported formats consistent with input/output data types)
[02/12/2024-12:40:19] [TRT-LLM] [E] Engine building failed, please check the error log.
[02/12/2024-12:40:20] [TRT-LLM] [I] Serializing engine to trt_engine_bf16_01/rank0.engine...
Traceback (most recent call last):
File "/home/azureuser/.conda/envs/tensorrt-venv/bin/trtllm-build", line 8, in <module>
sys.exit(main())
File "/home/azureuser/.conda/envs/tensorrt-venv/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 489, in main
parallel_build(source, build_config, args.output_dir, workers,
File "/home/azureuser/.conda/envs/tensorrt-venv/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 413, in parallel_build
passed = build_and_save(rank, rank % workers, ckpt_dir,
File "/home/azureuser/.conda/envs/tensorrt-venv/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 392, in build_and_save
engine.save(output_dir)
File "/home/azureuser/.conda/envs/tensorrt-venv/lib/python3.10/site-packages/tensorrt_llm/runtime/engine.py", line 60, in save
serialize_engine(
File "/home/azureuser/.conda/envs/tensorrt-venv/lib/python3.10/site-packages/tensorrt_llm/runtime/engine.py", line 18, in serialize_engine
f.write(engine)
TypeError: a bytes-like object is required, not 'NoneType'
Expected behavior
Model building should be successful
actual behavior
Model building failed with an exception
additional notes
I am not using Docker. I just created a Python venv and installed the tensorrt and tensorrt-llm Python packages from PyPI. All installations completed without errors.
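A note on the traceback above: the final TypeError looks like a secondary symptom rather than the root cause. The log reports "Engine building failed" just before serialization, so the value handed to `f.write` is presumably `None`. Below is a minimal Python sketch of that failure mode plus a guard that would surface the real problem earlier. `save_engine` and `write_error_message` are hypothetical names used for illustration only; this is not the tensorrt_llm implementation.

```python
import io


def save_engine(engine, f):
    """Write a serialized engine to a binary file object.

    Hypothetical helper for illustration; not the tensorrt_llm code.
    """
    if engine is None:
        # Point at the build failure instead of the misleading TypeError.
        raise RuntimeError(
            "engine build failed; check the [TRT] [E] lines earlier in the log"
        )
    f.write(engine)


def write_error_message(value):
    """Return the TypeError message raised when writing `value`, or None."""
    try:
        io.BytesIO().write(value)
    except TypeError as exc:
        return str(exc)
    return None
```

In other words, the line to debug is the `[TRT] [E] 9: ... PLUGIN_V2_Gemm_0: could not find any supported formats` error, not the `f.write(engine)` call.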
Hi, could you try the following command to build the engine?
trtllm-build --checkpoint_dir trt_model_bf16 --output_dir trt_engine_bf16 --gemm_plugin bfloat16 --max_input_len 32256 --gpt_attention_plugin bfloat16
Hi @vishnula-kore, I am also facing the exact same issue. Were you able to solve it?
> Hi, could you try the following command to build the engine?
> trtllm-build --checkpoint_dir trt_model_bf16 --output_dir trt_engine_bf16 --gemm_plugin bfloat16 --max_input_len 32256 --gpt_attention_plugin bfloat16
I tried adding the gpt_attention_plugin option to my runs and it still failed:
` File "/usr/local/bin/trtllm-build", line 8, in
Invoked with: <tensorrt_bindings.tensorrt.INetworkDefinition object at 0x7f8db4198630>, [<tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564edc30>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564ce030>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564ce530>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564cec70>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564d3b70>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564ce770>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564d3d70>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564ce270>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564c02b0>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564c0570>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f8f564cea30>], None`
It works when I run the same convert_checkpoint step and the normal build with plain float16. I'm just using vanilla Mistral from HF.
maybe this chunk is also useful...
[02/21/2024-01:16:50] [TRT-LLM] [I] Set gpt_attention_plugin to bfloat16.
[02/21/2024-01:16:50] [TRT-LLM] [I] Set gemm_plugin to bfloat16.
[02/21/2024-01:16:50] [TRT-LLM] [I] Set lookup_plugin to None.
[02/21/2024-01:16:50] [TRT-LLM] [I] Set lora_plugin to None.
[02/21/2024-01:16:50] [TRT-LLM] [I] Set context_fmha to True.
[02/21/2024-01:16:50] [TRT-LLM] [I] Set context_fmha_fp32_acc to False.
[02/21/2024-01:16:50] [TRT-LLM] [I] Set paged_kv_cache to True.
[02/21/2024-01:16:50] [TRT-LLM] [I] Set remove_input_padding to True.
[02/21/2024-01:16:50] [TRT-LLM] [I] Set use_custom_all_reduce to True.
[02/21/2024-01:16:50] [TRT-LLM] [I] Set multi_block_mode to False.
[02/21/2024-01:16:50] [TRT-LLM] [I] Set enable_xqa to True.
[02/21/2024-01:16:50] [TRT-LLM] [I] Set attention_qk_half_accumulation to False.
[02/21/2024-01:16:50] [TRT-LLM] [I] Set tokens_per_block to 128.
[02/21/2024-01:16:50] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[02/21/2024-01:16:50] [TRT-LLM] [I] Set use_context_fmha_for_generation to False.
Good luck, and thanks for the Python package.
> Hi, could you try the following command to build the engine?
> trtllm-build --checkpoint_dir trt_model_bf16 --output_dir trt_engine_bf16 --gemm_plugin bfloat16 --max_input_len 32256 --gpt_attention_plugin bfloat16
Thanks @weiqisunnvidia this command worked.
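For anyone else landing here: a plausible reading of why this helps is that the original build log shows `Set gpt_attention_plugin to float16.` while the checkpoint and the GEMM plugin were bfloat16. The sketch below scans a trtllm-build log for `Set <name> to <value>.` lines and lists the plugins whose dtype differs from the checkpoint dtype. The function names, and the choice of which plugins to compare, are assumptions made for illustration; this is not part of TensorRT-LLM.

```python
import re

# Matches log fragments such as "Set gemm_plugin to bfloat16."
SET_LINE = re.compile(r"Set (\w+) to (\w+)\.")

# Assumption: these are the plugins whose dtype should match the checkpoint.
DTYPE_PLUGINS = ("gpt_attention_plugin", "gemm_plugin")


def parse_plugin_settings(log_text):
    """Collect every 'Set <name> to <value>.' pair found in the log text."""
    return {name: value for name, value in SET_LINE.findall(log_text)}


def mismatched_plugins(log_text, checkpoint_dtype, plugins=DTYPE_PLUGINS):
    """Return the sorted names of plugins set to a dtype other than the checkpoint's."""
    settings = parse_plugin_settings(log_text)
    return sorted(
        name
        for name in plugins
        if name in settings and settings[name] != checkpoint_dtype
    )
```

Calling `mismatched_plugins(log_text, "bfloat16")` on the first log in this thread would be expected to flag `gpt_attention_plugin`, which is the flag the suggested command sets explicitly.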
Adding --gemm_plugin bfloat16 --gpt_attention_plugin bfloat16 --strongly_typed does not help; see:
https://github.com/NVIDIA/TensorRT-LLM/issues/1273
I got the same issue when trying to build an engine for a Llama 3 model.
Here's the command I use:
trtllm-build --checkpoint_dir ./tmp/trtllm-Llama-3-8B-Instruct-1gpu-bf16 \
--output_dir ./tmp/trt_engines/llama3/8B/bf16/1-gpu \
--gpt_attention_plugin bfloat16 \
--gemm_plugin bfloat16
And here's the output (truncated):
[05/21/2024-07:17:36] [TRT] [I] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 155, GPU 269 (MiB)
[05/21/2024-07:17:37] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1810, GPU +312, now: CPU 2101, GPU 581 (MiB)
[05/21/2024-07:17:37] [TRT-LLM] [I] Set nccl_plugin to None.
[05/21/2024-07:17:37] [TRT-LLM] [I] Set use_custom_all_reduce to True.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/vocab_embedding/GATHER_0_output_0 and LLaMAForCausalLM/transformer/layers/0/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/0/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/0/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/0/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/0/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/0/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/0/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/0/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/1/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/1/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/1/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/1/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/1/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/1/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/1/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/1/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/2/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/2/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/2/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/2/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/2/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/2/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/2/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/2/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/3/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/3/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/3/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/3/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/3/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/3/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/3/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/3/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/4/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/4/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/4/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/4/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/4/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/4/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/4/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/4/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/5/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/5/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/5/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/5/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/5/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/5/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/5/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/5/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/6/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/6/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/6/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/6/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/6/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/6/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/6/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/6/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/7/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/7/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/7/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/7/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/7/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/7/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/7/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/7/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/8/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/8/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/8/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/8/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/8/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/8/post_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/8/post_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/8/ELEMENTWISE_SUM_1_output_0 and LLaMAForCausalLM/transformer/layers/9/input_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/9/input_layernorm/REDUCE_AVG_0_output_0 and LLaMAForCausalLM/transformer/layers/9/input_layernorm/SHUFFLE_1_output_0: first input has type BFloat16 but second input has type Float.
[05/21/2024-07:17:37] [TRT] [W] IElementWiseLayer with inputs LLaMAForCausalLM/transformer/layers/9/ELEMENTWISE_SUM_0_output_0 and LLaMAForCausalLM/transformer/layers/9/post_layernorm/SHUFFLE_0_output_0: first input has type BFloat16 but second input has type Float.
@plt12138 I am not quite sure what problem you are encountering. You posted another issue, but that issue uses Mixtral instead of Mistral. Do you also encounter the issue with Mistral? If so, please share your reproduction steps and the full log. Otherwise, let's discuss it in the other issue.
@michaelnny I don't see the error in the log you posted. Could you post your conversion script and the full log?
@byshiue My issue was resolved after reinstalling v0.9.0 in a clean container environment.
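Since a clean reinstall resolved it here, and the original log also warned that TensorRT was linked against a different cuDNN than the one loaded, it may help to record the installed package versions when reporting this kind of build failure. A small sketch using only the standard library follows; the package list is taken from the System Info section of the original post, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Package names as listed in the System Info section of this issue.
PACKAGES = ("tensorrt", "tensorrt-bindings", "tensorrt-libs", "tensorrt-llm")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing this output between a failing environment and a clean container is one way to spot mixed or partially upgraded installs.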