
[Bug] internvl2-4B inference is abnormal on V100: output text is always empty

Open · qism opened this issue 1 year ago · 1 comment

Checklist

  • [ ] 1. I have searched related issues but cannot get the expected help.
  • [ ] 2. The bug has not been fixed in the latest version.
  • [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

InternVL2-4B inference is abnormal on a V100: the output text is always empty. The same setup works fine on a T4, where inference is normal. Phi-3-vision-128k-instruct shows the same problem. I also tested InternVL2-2B, which infers normally on the V100, and loading InternVL2-4B directly with huggingface transformers also works on the V100.
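
Since the report hinges on a V100-vs-T4 comparison, a small probe (a sketch, not part of the original report; assumes torch is available in the container) can record which GPU and dtype support are in play:

import torch

print(torch.cuda.get_device_name(0))        # e.g. 'Tesla V100-SXM2-16GB' or 'Tesla T4'
print(torch.cuda.get_device_capability(0))  # V100 -> (7, 0), T4 -> (7, 5)
print(torch.cuda.is_bf16_supported())       # bf16 support differs across GPU generations and torch versions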

Reproduction

from lmdeploy import pipeline, PytorchEngineConfig, ChatTemplateConfig
from lmdeploy.vl import load_image

model = 'InternVL2-4B'
system_prompt = '我是书生·万象,英文名是InternVL,是由上海人工智能实验室及多家合作单位联合开发的多模态大语言模型。'
image = load_image('test.jpg')
chat_template_config = ChatTemplateConfig('internvl-phi3')
chat_template_config.meta_instruction = system_prompt
pipe = pipeline(model, chat_template_config=chat_template_config, log_level='INFO')
response = pipe(('describe this image', image))
print(response.text)

Environment

docker: openmmlab/lmdeploy:v0.5.2.post1
pip install timm

Error traceback

>>> pipe = pipeline(model, chat_template_config=chat_template_config, log_level='INFO')
2024-08-08 13:34:29,241 - lmdeploy - WARNING - Fallback to pytorch engine because `InternVL2-4B` not supported by turbomind engine.
2024-08-08 13:34:29,241 - lmdeploy - INFO - Using pytorch engine
2024-08-08 13:34:29,422 - lmdeploy - INFO - matching vision model: InternVLVisionModel
FlashAttention is not installed.
`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
Current `flash-attenton` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
Warning: Flash Attention is not available, use_flash_attn is set to False.
2024-08-08 13:34:34,065 - lmdeploy - INFO - using InternVL-Chat-V1-5 vision preprocess
2024-08-08 13:34:34,076 - lmdeploy - INFO - input backend=pytorch, backend_config=PytorchEngineConfig(model_name='', tp=1, session_len=None, max_batch_size=128, cache_max_entry_count=0.8, eviction_type='recompute', prefill_interval=16, block_size=64, num_cpu_blocks=0, num_gpu_blocks=0, adapters=None, max_prefill_token_num=4096, thread_safe=False, enable_prefix_caching=False, device_type='cuda', download_dir=None, revision=None)
2024-08-08 13:34:34,076 - lmdeploy - INFO - input chat_template_config=None
2024-08-08 13:34:34,083 - lmdeploy - INFO - updated chat_template_onfig=ChatTemplateConfig(model_name='internvl2-phi3', system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability=None, stop_words=None)
2024-08-08 13:34:34,145 - lmdeploy - INFO - Checking environment for PyTorch Engine.
2024-08-08 13:34:35,383 - lmdeploy - INFO - Checking model.
2024-08-08 13:34:35,384 - lmdeploy - WARNING - LMDeploy requires transformers version: [4.33.0 ~ 4.41.2], but found version: 4.42.3
Loading checkpoint shards: 100%|█████████████████████████████████████| 2/2 [00:04<00:00,  2.10s/it]
2024-08-08 13:34:39,845 - lmdeploy - INFO - Patching model.
2024-08-08 13:34:40,458 - lmdeploy - INFO - build CacheEngine with config:CacheConfig(block_size=64, num_cpu_blocks=170, num_gpu_blocks=755, window_size=262144, cache_max_entry_count=0.8, max_prefill_token_num=4096, enable_prefix_caching=False)
2024-08-08 13:34:42,248 - lmdeploy - INFO - updated backend_config=PytorchEngineConfig(model_name='', tp=1, session_len=None, max_batch_size=128, cache_max_entry_count=0.8, eviction_type='recompute', prefill_interval=16, block_size=64, num_cpu_blocks=0, num_gpu_blocks=0, adapters=None, max_prefill_token_num=4096, thread_safe=False, enable_prefix_caching=False, device_type='cuda', download_dir=None, revision=None)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
>>> image = load_image('test.jpg')
>>> response = pipe(('describe this image', image))
2024-08-08 13:34:42,608 - lmdeploy - INFO - start ImageEncoder._forward_loop
2024-08-08 13:34:42,608 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-08-08 13:34:42,608 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
2024-08-08 13:34:43,105 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 0.497s
2024-08-08 13:34:43,106 - lmdeploy - INFO - ImageEncoder done 1 images, left 0 images.
2024-08-08 13:34:43,107 - lmdeploy - INFO - prompt='<|system|>\n你是由上海人工智能实验室联合商汤科技开发的书生多模态大模型,英文名叫InternVL, 是一个有用无害的人工智能助手。<|end|>\n<|user|>\n<img><IMAGE_TOKEN></img>\ndescribe this image<|end|>\n<|assistant|>\n', gen_config=EngineGenerationConfig(n=1, max_new_tokens=512, top_p=0.8, top_k=40, temperature=0.8, repetition_penalty=1.0, ignore_eos=False, random_seed=15592860339931865043, stop_words=[32007, 32000], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[1, 32006, 30919, 30392, 31272, 30429, 30581, 30313, 31041, 31676, 30815, 31195, 236, 173, 143, 232, 177, 167, 31986, 30733, 31427, 233, 180, 167, 31030, 31615, 31026, 30910, 30210, 31900, 30486, 30923, 31382, 31613, 30257, 31382, 30883, 30214, 31144, 30333, 30548, 232, 146, 174, 17579, 29963, 29931, 29892, 29871, 30392, 30287, 30502, 30417, 30406, 31352, 232, 177, 182, 30210, 30313, 31041, 31676, 30815, 31931, 30880, 30267, 32007, 32010, 32011, 0, 0, 0, ... (long run of 0s elided: placeholder slots for the image embedding) ..., 32012, 13, 2783, 29581, 445, 1967, 32007, 32001], adapter_name=None.
2024-08-08 13:34:43,107 - lmdeploy - INFO - session_id=0, history_tokens=0, input_tokens=1869, max_new_tokens=512, seq_start=True, seq_end=True, step=0, prep=True
>>> print(response)
Response(text='', generate_token_len=512, input_token_len=1869, session_id=0, finish_reason='length', token_ids=[1, 1, 1, ... (long run elided: every generated token id is 1) ...], logprobs=None)

qism · Aug 09 '24 03:08

2024-08-08 13:34:43,107 - lmdeploy - INFO - prompt='<|system|>\n你是由上海人工智能实验室联合商汤科技开

Judging from this line, the prompt looks fine. The more likely cause is the extracted image features; please check them.

pipe = pipeline(model, chat_template_config=chat_template_config, log_level='INFO')
image = load_image('test.jpg')
response = pipe(('describe this image', image))

You can check whether the features extracted through the code path above (https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/serve/vl_async_engine.py#L66)

match the features extracted the following way:

pipe = pipeline(model, chat_template_config=chat_template_config, log_level='INFO')
image = load_image('test.jpg')
pipe.vl_encoder.forward([image])

Also, does text-only inference work correctly? (A sketch of the feature check follows below.)
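
A minimal sketch of the NaN check implied above, assuming the same pipeline setup as in the reproduction snippet (test.jpg is the reporter's local image):

import torch
from lmdeploy import pipeline, ChatTemplateConfig
from lmdeploy.vl import load_image

pipe = pipeline('InternVL2-4B',
                chat_template_config=ChatTemplateConfig('internvl-phi3'),
                log_level='INFO')
image = load_image('test.jpg')

# Run the image encoder directly, as in the second snippet above;
# forward() returns a list with one feature tensor per image.
feats = pipe.vl_encoder.forward([image])[0]

# NaN/Inf in the vision features would explain an empty text response.
print('any NaN:', torch.isnan(feats).any().item())
print('any Inf:', torch.isinf(feats).any().item())
print('dtype / shape:', feats.dtype, feats.shape)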

irexyc · Aug 09 '24 07:08

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] · Aug 23 '24 02:08

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.

github-actions[bot] · Aug 28 '24 02:08

Hi, I also ran into the case where response.text is empty. May I ask whether you solved it?

Fly2flies · Sep 11 '24 09:09

2024-08-08 13:34:43,107 - lmdeploy - INFO - prompt='<|system|>\n你是由上海人工智能实验室联合商汤科技开

Judging from this line, the prompt looks fine. The more likely cause is the extracted image features; please check them.

pipe = pipeline(model, chat_template_config=chat_template_config, log_level='INFO')
image = load_image('test.jpg')
response = pipe(('describe this image', image))

You can check whether the features extracted through the code path above (https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/serve/vl_async_engine.py#L66)

match the features extracted the following way:

pipe = pipeline(model, chat_template_config=chat_template_config, log_level='INFO')
image = load_image('test.jpg')
pipe.vl_encoder.forward([image])

Also, does text-only inference work correctly?

nest_asyncio.apply()
model_path = path
system_prompt = '我是书生·万象,英文名是InternVL,是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型。'
chat_template_config = ChatTemplateConfig('internvl-internlm2')
chat_template_config.meta_instruction = system_prompt
gen_config = GenerationConfig(max_new_tokens=512)

model = pipeline(model_path, chat_template_config=chat_template_config, backend_config=TurbomindEngineConfig(tp=torch.cuda.device_count(), session_len=8192, cache_max_entry_count=0.8), log_level='INFO')

Text-only inference works fine:

response = model('How are you?')
print(response.text)

2024-09-11 17:37:07,279 - lmdeploy - INFO - prompt='<|im_start|>system\n我是书生·万象,英文名是InternVL,是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型。<|im_end|>\n<|im_start|>user\nHow are you?<|im_end|>\n<|im_start|>assistant\n', gen_config=EngineGenerationConfig(n=1, max_new_tokens=512, top_p=0.8, top_k=40, temperature=0.8, repetition_penalty=1.0, ignore_eos=False, random_seed=13294313270250957050, stop_words=[92542, 92540], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[1, 92543, 9081, 364, 68734, 60628, 60384, 60721, 60775, 60978, 60353, 79448, 60357, 1214, 1070, 30924, 60353, 69643, 68589, 76659, 71581, 60359, 77859, 60543, 75438, 68558, 68542, 69504, 68640, 71434, 60838, 60921, 60368, 68790, 70218, 60355, 92542, 364, 92543, 1008, 364, 4500, 657, 629, 345, 92542, 364, 92543, 525, 11353, 364], adapter_name=None.
2024-09-11 17:37:07,280 - lmdeploy - INFO - session_id=0, history_tokens=0, input_tokens=51, max_new_tokens=512, seq_start=True, seq_end=True, step=0, prep=True
2024-09-11 17:37:07,281 - lmdeploy - INFO - Register stream callback for 0
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 0 received.
[TM][INFO] ------------------------- step = 50 -------------------------
[TM][INFO] [Forward] [0, 1), dc=0, pf=1, sum_q=51, sum_k=51, max_q=51, max_k=51
[TM][INFO] ------------------------- step = 60 -------------------------
2024-09-11 17:37:07,804 - lmdeploy - INFO - UN-register stream callback for 0
As an AI language model, I don't have emotions, but I'm functioning properly and ready to assist you. How can I help you today?

But with an image there is no output, and checking the image features alone shows they are NaN:

image = load_image('test_image.jpg')
print(type(image))
print(image)

model.vl_encoder.forward([image])

<class 'PIL.JpegImagePlugin.JpegImageFile'>
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1600x900 at 0x7F13BA4193A0>
2024-09-11 17:39:30,582 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 0.764s
[tensor([[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]], dtype=torch.float16)]
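
One way to narrow down where the NaN first appears, sketched here with plain torch forward hooks. The attribute path to the encoder's underlying torch module (model.vl_encoder.model below) is a guess and may differ across lmdeploy versions:

import torch

def add_nan_hooks(module):
    # Register a forward hook on every submodule and print the name of
    # any submodule whose output tensor contains NaN.
    handles = []
    for name, sub in module.named_modules():
        def hook(mod, inp, out, name=name):
            if isinstance(out, torch.Tensor) and torch.isnan(out).any():
                print('NaN in output of', name)
        handles.append(sub.register_forward_hook(hook))
    return handles

handles = add_nan_hooks(model.vl_encoder.model)  # hypothetical attribute path
model.vl_encoder.forward([image])
for h in handles:
    h.remove()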

Fly2flies · Sep 11 '24 09:09

@Fly2flies

If the features returned by model.vl_encoder.forward([image]) are NaN, the feature-extraction step itself is broken.

This usually happens when tp is not 1. You are probably using multiple GPUs? (tp=torch.cuda.device_count())

To rule out an environment issue, could you verify this inside openmmlab/lmdeploy:latest-cu11 or openmmlab/lmdeploy:latest-cu12?
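
A hedged isolation step along the lines of this hint: rebuild the pipeline with tp=1 and re-check the features. model_path is a placeholder for the local checkpoint, as in Fly2flies' snippet:

import torch
from lmdeploy import pipeline, ChatTemplateConfig, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline(model_path,
                chat_template_config=ChatTemplateConfig('internvl-internlm2'),
                backend_config=TurbomindEngineConfig(tp=1, session_len=8192,
                                                     cache_max_entry_count=0.8),
                log_level='INFO')
image = load_image('test_image.jpg')
feats = pipe.vl_encoder.forward([image])[0]
print('any NaN with tp=1:', torch.isnan(feats).any().item())

If the features are clean with tp=1 but NaN with tp=torch.cuda.device_count(), that points at the tensor-parallel path rather than the model weights.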

irexyc · Sep 12 '24 09:09

Yes, I used multiple GPUs:

model = pipeline(model_path, chat_template_config=chat_template_config, backend_config=TurbomindEngineConfig(tp=torch.cuda.device_count(), session_len=8192, cache_max_entry_count=0.8), log_level='INFO')

How can this problem be solved?

Fly2flies · Sep 12 '24 14:09