The LLaVA model's batch inference results differ from those with batch_size=1
System info
- GPU: A100
- TensorRT: 9.3.0.post12.dev1
- TensorRT-LLM: 0.9.0
- torch: 2.2.2
Reproduction
export MODEL_NAME="llava-1.5-7b-hf"
git clone https://huggingface.co/llava-hf/${MODEL_NAME} tmp/hf_models/${MODEL_NAME}
python ../llama/convert_checkpoint.py \
--model_dir tmp/hf_models/${MODEL_NAME} \
--output_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu \
--dtype float16
trtllm-build \
--checkpoint_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu \
--output_dir trt_engines/${MODEL_NAME}/fp16/1-gpu \
--gemm_plugin float16 \
--use_fused_mlp \
--max_batch_size 16 \
--max_input_len 2048 \
--max_output_len 512 \
--max_multimodal_len 9216 # 16 (max_batch_size) * 576 (num_visual_features)
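As a quick sanity check on that flag (the names below simply mirror the comment above, not any TRT-LLM API), the multimodal length budget should be the batch size times the number of visual features per image:

```python
# --max_multimodal_len must cover the visual tokens of a full batch.
# LLaVA-1.5 contributes 576 visual features per image.
max_batch_size = 16
num_visual_features = 576
max_multimodal_len = max_batch_size * num_visual_features
print(max_multimodal_len)  # 9216
```

If the engine were built with the comment's stated 1 * 576 = 576 instead, a batch of 16 images would exceed the multimodal budget.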
python build_visual_engine.py --model_path tmp/hf_models/${MODEL_NAME} --model_type llava # or "--model_type vila" for VILA
python run.py \
--max_new_tokens 20 \
--hf_model_dir tmp/hf_models/${MODEL_NAME} \
--visual_engine_dir visual_engines/${MODEL_NAME} \
--llm_engine_dir trt_engines/${MODEL_NAME}/fp16/1-gpu \
--decoder_llm \
--input_text "Question: which city is this? Answer:" \
--batch_size 16
If I use the same data to form a batch, the result looks like this:
And if I use two different prompts to form a batch, the result looks like this:
The image used is: https://storage.googleapis.com/sfr-vision-language-research/LAVIS/assets/merlion.png
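To make the reported mismatch concrete, here is a minimal, hypothetical consistency check (outputs_match is not part of run.py) that compares the token ids a batched run produces against per-prompt batch_size=1 runs:

```python
def outputs_match(batched_ids, single_ids):
    """Return True if every sample's token ids from the batched run
    equal the ids produced by running that sample at batch_size=1."""
    return all(b == s for b, s in zip(batched_ids, single_ids))

# With the same prompt repeated across the batch, all rows should agree
# (dummy token ids for illustration):
same_prompt_batch = [[101, 2009, 2003], [101, 2009, 2003]]
per_sample_runs = [[101, 2009, 2003], [101, 2009, 2003]]
print(outputs_match(same_prompt_batch, per_sample_runs))  # True
```

The bug described above is exactly this check failing: rows of a batch diverge from what each prompt produces on its own.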
I saw similar results with Llama 3. Mine was resolved when I disabled 'use_custom_all_reduce' at engine build time.
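For reference, a rebuild sketch with custom all-reduce disabled; the --use_custom_all_reduce flag and its accepted values may differ between TRT-LLM versions, so verify against trtllm-build --help before using it:

```shell
# Rebuild the LLM engine with the custom all-reduce kernel disabled.
# Flag availability varies by TRT-LLM version (check trtllm-build --help).
trtllm-build \
    --checkpoint_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu \
    --output_dir trt_engines/${MODEL_NAME}/fp16/1-gpu \
    --gemm_plugin float16 \
    --use_custom_all_reduce disable \
    --max_batch_size 16 \
    --max_input_len 2048 \
    --max_output_len 512 \
    --max_multimodal_len 9216
```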
Could you try the latest version, TRT-LLM 0.11+? https://nvidia.github.io/TensorRT-LLM/installation/linux.html
I tried trt_llm 0.10 and 0.11; the problem appears when I import tensorrt_llm.
If I install TRT_LLM 0.11, this occurs: ModuleNotFoundError: No module named 'tensorrt_llm.bindings.BuildInfo'. Do you know how to solve it?
Does 'use_custom_all_reduce' affect a single GPU? I don't use TP or PP.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.