TensorRT-LLM Qwen-VL inference errors

Hello, I deployed the model following examples/qwenvl/README.md, but running run.py produces an incorrect inference result. What is the problem?

Input: "[{'image': './pics/demo.jpeg'}, {'text': 'Describe the picture'}]"

Output: "On the beach, a dog and its owner sit side by side, with the owner holding a cup of coffee."

May 15 '24 04:05 jdmdj1999

I have this problem too. I changed the input text to "Analyze style of the picture", and the answer is irrelevant to the image.

CUDA_VISIBLE_DEVICES=1 python3 run.py --tokenizer_dir=/home/Qwen-VL-Chat --qwen_engine_dir=./trt_engines/Qwen-VL-7B-Chat --vit_engine_dir=./plan --images_path='{"image": "./pics/demo.jpeg"}' --input_dir='{"image": "image.pt"}'

Input: "[{'image': './pics/demo.jpeg'}, {'text': 'Analyze style of the picture.'}]"

Output: "The style of the picture is quite unique and striking. It features a close-up shot of a person's feet on a sandy beach, with the sea in the background. The person is wearing flip flops, which adds to the casual and relaxed atmosphere of the scene. The composition of the picture is simple yet effective, focusing on the interplay between the person's feet and the natural surroundings. The use of negative space also adds to the visual impact of the image, making it more engaging and memorable. Overall, the style of the picture is both artistic and evocative, capturing a sense of leisure and freedom in a beautiful coastal setting."

In addition, when I tried some other input texts, the model answered several times that "no images are given", which is very confusing.

May 22 '24 03:05 chiquitita-101

Hi @jdmdj1999 @chiquitita-101, which TRT version are you using, and which quantization method are you using for Qwen?

May 23 '24 08:05 sunnyqgg

Hi @sunnyqgg, I am using tensorrt-llm 0.9.0.dev2024040200 with Qwen-VL-Chat, deployed according to examples/qwenvl/README.md.

May 23 '24 08:05 jdmdj1999

I found that the error was caused by running deployment and inference on different types of GPU.
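
For anyone hitting the same thing: by default, TensorRT engines are tied to the GPU model they were built on, so one way to catch this kind of mismatch early is to record the GPU name at build time and compare it before inference. The following is only a minimal sketch under that assumption. The sidecar file name build_gpu.json and the helper functions are hypothetical and are not part of TensorRT-LLM; the only external tool used is nvidia-smi.

```python
import json
import subprocess
from pathlib import Path

SIDECAR = "build_gpu.json"  # hypothetical file name, not part of TensorRT-LLM


def query_gpu_name(index=0):
    """Return the GPU name reported by nvidia-smi, or None if it cannot be queried."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "-i", str(index),
             "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None


def record_build_gpu(engine_dir, gpu_name):
    """Store the build-time GPU name in a sidecar file inside the engine directory."""
    path = Path(engine_dir) / SIDECAR
    path.write_text(json.dumps({"gpu_name": gpu_name}))
    return path


def runtime_gpu_matches(engine_dir, gpu_name):
    """Return True if gpu_name equals the GPU name recorded at build time."""
    recorded = json.loads((Path(engine_dir) / SIDECAR).read_text())["gpu_name"]
    return recorded == gpu_name
```

With these helpers, one would call record_build_gpu(engine_dir, query_gpu_name(1)) right after building the engines, and check runtime_gpu_matches(engine_dir, query_gpu_name(1)) before launching run.py. Note that nvidia-smi does not honor CUDA_VISIBLE_DEVICES, which is why the index is passed explicitly.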

May 25 '24 03:05 jdmdj1999