
The LLaVA model's batch inference results differ from batch=1

Open · lss15151161 opened this issue 1 year ago · 5 comments

System info

GPU: A100
tensorrt: 9.3.0.post12.dev1
tensorrt-llm: 0.9.0
torch: 2.2.2
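The installed versions can be confirmed against the above with:

# Lists the tensorrt, tensorrt-llm, and torch packages and their versions.
pip3 list | grep -E "tensorrt|torch"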

Reproduction

export MODEL_NAME="llava-1.5-7b-hf"
git clone https://huggingface.co/llava-hf/${MODEL_NAME} tmp/hf_models/${MODEL_NAME}
python ../llama/convert_checkpoint.py \
    --model_dir tmp/hf_models/${MODEL_NAME} \
    --output_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu \
    --dtype float16

trtllm-build \
    --checkpoint_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu \
    --output_dir trt_engines/${MODEL_NAME}/fp16/1-gpu \
    --gemm_plugin float16 \
    --use_fused_mlp \
    --max_batch_size 16 \
    --max_input_len 2048 \
    --max_output_len 512 \
    --max_multimodal_len 9216 # 16 (max_batch_size) * 576 (num_visual_features)

python build_visual_engine.py --model_path tmp/hf_models/${MODEL_NAME} --model_type llava # or "--model_type vila" for VILA

python run.py \
    --max_new_tokens 20 \
    --hf_model_dir tmp/hf_models/${MODEL_NAME} \
    --visual_engine_dir visual_engines/${MODEL_NAME} \
    --llm_engine_dir trt_engines/${MODEL_NAME}/fp16/1-gpu \
    --decoder_llm \
    --input_text "Question: which city is this? Answer:" \
    --batch_size 16
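To make the comparison concrete, here is a minimal sketch that runs the same prompt at batch sizes 1 and 16 and diffs the generated text. It assumes run.py prints the generations to stdout; with batch size 16, each of the 16 generations should match the single batch-size-1 generation:

# Run the identical prompt at both batch sizes and capture the output.
for BS in 1 16; do
    python run.py \
        --max_new_tokens 20 \
        --hf_model_dir tmp/hf_models/${MODEL_NAME} \
        --visual_engine_dir visual_engines/${MODEL_NAME} \
        --llm_engine_dir trt_engines/${MODEL_NAME}/fp16/1-gpu \
        --decoder_llm \
        --input_text "Question: which city is this? Answer:" \
        --batch_size ${BS} > out_bs${BS}.txt
done
# Any divergence in the generated text shows up here.
diff out_bs1.txt out_bs16.txt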

If I use the same data to form a batch, the result looks like this: [screenshot]

And if I use two different prompts to form a batch, the result looks like this: [two screenshots]

The image used is : https://storage.googleapis.com/sfr-vision-language-research/LAVIS/assets/merlion.png
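For a local copy of the test image:

wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/assets/merlion.png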

lss15151161 · Jun 26 '24

I saw similar results with Llama 3. Mine was resolved when I disabled 'use_custom_all_reduce' at engine build time.
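For example, if your trtllm-build exposes the flag with enable/disable values (check trtllm-build --help for the exact spelling in your version), the build step above would become:

trtllm-build \
    --checkpoint_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu \
    --output_dir trt_engines/${MODEL_NAME}/fp16/1-gpu \
    --gemm_plugin float16 \
    --use_fused_mlp \
    --use_custom_all_reduce disable \
    --max_batch_size 16 \
    --max_input_len 2048 \
    --max_output_len 512 \
    --max_multimodal_len 9216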

TheCodeWrangler · Jun 26 '24

Could you try the latest version, TRT-LLM 0.11+? https://nvidia.github.io/TensorRT-LLM/installation/linux.html
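The linked page boils down to roughly:

# Linux install per the page above; pulls the latest release from NVIDIA's index.
pip3 install tensorrt_llm -U --extra-index-url https://pypi.nvidia.com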

hijkzzz · Jun 27 '24

> Could you try the latest version, TRT-LLM 0.11+? https://nvidia.github.io/TensorRT-LLM/installation/linux.html

I tried trt_llm 0.10 and 0.11. When I import tensorrt_llm,

> Could you try the latest version, TRT-LLM 0.11+? https://nvidia.github.io/TensorRT-LLM/installation/linux.html

If I install TRT-LLM 0.11, this error occurs: ModuleNotFoundError: No module named 'tensorrt_llm.bindings.BuildInfo'. Do you know how to solve it?
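For reference, the failure can be isolated from the rest of the pipeline with these two imports:

# If the compiled bindings do not match the Python package, one of these
# imports raises the ModuleNotFoundError above.
python -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
python -c "import tensorrt_llm.bindings"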

lss15151161 · Jul 02 '24

> use_custom_all_reduce

Does this parameter affect a single GPU? I don't use TP or PP.

lss15151161 · Jul 03 '24

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] · Aug 17 '24