[Bug] batch_infer hangs, suspected deadlock
Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
When running batch_infer with the pipeline, it gets stuck partway through and makes no progress for an entire night; the only option was to kill the process.
Convert to turbomind format: 97%|█████████▋| 58/60 [04:25<00:07, 3.61s/it]
Convert to turbomind format: 98%|█████████▊| 59/60 [05:09<00:15, 15.64s/it]
Convert to turbomind format: 100%|██████████| 60/60 [05:12<00:00, 12.03s/it]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
0%| | 0/194850 [00:00<?, ?it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Convert to turbomind format: 98%|█████████▊| 59/60 [02:23<00:02, 2.94s/it]
Convert to turbomind format: 100%|██████████| 60/60 [02:24<00:00, 2.43s/it]
0%| | 0/44847 [00:00<?, ?it/s]
0%| | 31/44847 [00:40<16:11:04, 1.30s/it]
0%| | 63/44847 [01:21<16:01:41, 1.29s/it]
0%| | 95/44847 [01:57<15:11:50, 1.22s/it]
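For reference, the repeated huggingface/tokenizers warning above indicates the process was forked after tokenizer parallelism had been initialized. A hedged workaround sketch that simply follows the advice printed in the log (it silences the warning and avoids the tokenizer-side fork issue; it is not confirmed to be related to the hang):

```python
# Workaround sketch for the tokenizers fork warning above (not verified to fix the hang).
import os

# Must be set before any tokenizer is used, i.e. before the pipeline is created.
os.environ['TOKENIZERS_PARALLELISM'] = 'false'

from lmdeploy import pipeline  # create the pipeline only after the variable is set
```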
Reproduction
As described above; no standalone script was provided. A hypothetical sketch of the presumed call is given below.
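A hypothetical minimal sketch of what the failing call presumably looks like, assuming an InternVL-style checkpoint served with the TurboMind backend on 2 GPUs (the model path, prompt, and image list are placeholders; the actual script was not shared):

```python
from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig
from lmdeploy.vl import load_image

# Placeholder model path; the logs show an InternVL-based multimodal model.
pipe = pipeline(
    'OpenGVLab/InternVL-Chat-V1-5',
    backend_config=TurbomindEngineConfig(tp=2),  # 2 x A100 as listed in Environment
)

# Placeholder image list; the real run iterates over tens of thousands of items.
image_files = ['img_0001.jpg', 'img_0002.jpg']
prompts = [
    ('Describe this image, focusing on the brand and item number of the product.',
     load_image(f))
    for f in image_files
]

# Batch inference; the hang reportedly occurs partway through this call.
responses = pipe(prompts, gen_config=GenerationConfig(max_new_tokens=512))
for r in responses:
    print(r.text)
```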
Environment
lmdeploy==0.5.3, 2 x A100
Error traceback
No response
The information is not quite sufficient. It would be best to provide debug logs.
To enable them, first export TM_DEBUG_LEVEL=DEBUG,
then set the log level to DEBUG when creating the pipeline (log_level='DEBUG') or the server (--log-level DEBUG), as in the sketch below.
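A minimal sketch of the two steps just described (the model path is a placeholder):

```python
# Enable TurboMind (C++) debug logging; set this before the engine is created.
import os
os.environ['TM_DEBUG_LEVEL'] = 'DEBUG'

from lmdeploy import pipeline
pipe = pipeline('path/to/your/model', log_level='DEBUG')  # Python-side DEBUG logs

# Equivalent for the API server:
#   lmdeploy serve api_server path/to/your/model --log-level DEBUG
```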
Here is the log output at the point where it hangs:
2024-08-09 16:01:36,049 - lmdeploy - INFO - session_id=5, history_tokens=0, input_tokens=2633, max_new_tokens=512, seq_start=True, seq_end=True, step=0, prep=True
2024-08-09 16:01:36,049 - lmdeploy - INFO - Register stream callback for 5
2024-08-09 16:01:37,299 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 1.251s
2024-08-09 16:01:37,300 - lmdeploy - INFO - ImageEncoder done 1 images, left 0 images.
2024-08-09 16:01:37,300 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-08-09 16:01:37,300 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
2024-08-09 16:01:37,300 - lmdeploy - [37mINFO[0m - prompt='<|im_start|>system\n我是书生·万象,英文名是InternVL,是由上海人工智能实验室及多家合作单位联合开发的多模态大语言模型。<|im_end|><|im_start|>user\n<IMAGE_TOKEN>\n请描述一下这张图片,请特别关注图片中商品的品牌和货号信息,其他信息可以忽略<|im_end|><|im_start|>assistant\n', gen_config=EngineGenerationConfig(n=1, max_new_tokens=512, top_p=0.8, top_k=40, temperature=0.8, repetition_penalty=1.0, ignore_eos=False, random_seed=9665014037230033591, stop_words=None, bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[6, 1328, 144, 59568, 8536, 59920, 59661, 60389, 59941, 60207, 101, 59568, 14877, 59867, 59626, 3187, 1318, 53581, 101, 59568, 13326, 4510, 13992, 15290, 59803, 20300, 3563, 3618, 5777, 4419, 20965, 60131, 60106, 59647, 6961, 11443, 102, 7, 6, 2942, 144, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 70, 144, 59568, 60146, 12029, 3932, 27530, 10698, 97, 59568, 60146, 3512, 4305, 10698, 59642, 7873, 50630, 59652, 60443, 60100, 2530, 97, 59568, 2711, 2530, 1229, 20550, 7, 6, 14135, 144], adapter_name=None.
2024-08-09 16:01:37,301 - lmdeploy - INFO - session_id=6, history_tokens=0, input_tokens=2633, max_new_tokens=512, seq_start=True, seq_end=True, step=0, prep=True
2024-08-09 16:01:37,301 - lmdeploy - INFO - Register stream callback for 6
With DEBUG logging enabled as suggested, this is the tail of the output leading up to the hang:
turbomind::LlamaAttentionWeight<__nv_bfloat16>] [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: input_query [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layer_id [TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start [TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: dc_batch_size [TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start [TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: pf_batch_size [TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start [TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_q_len [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_k_len [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: cu_q_len [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: cu_k_len [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_cu_q_len [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: h_cu_k_len [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: finished [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = bool] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: rope_theta [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = float] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: block_ptrs [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = void*] start [TM][DEBUG] getPtr with type x, but data type is: u8 [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: cu_block_counts [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: input_query [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: hidden_features [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start [TM][DEBUG] void turbomind::UnifiedAttentionLayer<T>::allocateBuffer(size_t, size_t, size_t, const WeightType*) [with T = __nv_bfloat16; size_t = long unsigned int; turbomind::UnifiedAttentionLayer<T>::WeightType = turbomind::LlamaAttentionWeight<__nv_bfloat16>] [TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = __nv_bfloat16; size_t = long unsigned int] [TM][DEBUG] Reuse original buffer 0xc558be200 with size 24321024 and do nothing for reMalloc. 
[TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = __nv_bfloat16; size_t = long unsigned int] [TM][DEBUG] Reuse original buffer 0xc57a94e00 with size 18916352 and do nothing for reMalloc. [TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = __nv_bfloat16; size_t = long unsigned int] [TM][DEBUG] Reuse original buffer 0xc58c9f200 with size 5523456 and do nothing for reMalloc. [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: lora_mask [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start [TM][DEBUG] getPtr with type i4, but data type is: x [TM][DEBUG] void turbomind::cublasMMWrapper::Gemm(cublasOperation_t, cublasOperation_t, int, int, int, const void*, int, const void*, int, void*, int, float, float) [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/utils/cublasMMWrapper.cc:326 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaLinear.h:103 [TM][DEBUG] void turbomind::cublasMMWrapper::Gemm(cublasOperation_t, cublasOperation_t, int, int, int, const void*, int, const void*, int, void*, int, float, float) [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/utils/cublasMMWrapper.cc:326 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaLinear.h:103 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/utils/cublasMMWrapper.cc:326 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaLinear.h:103 [TM][DEBUG] void turbomind::ftNcclAllReduceSum(const T*, T*, int, turbomind::NcclParam, cudaStream_t) [with T = __nv_bfloat16; cudaStream_t = CUstream_st*] start [TM][DEBUG] ncclDataType_t turbomind::getNcclDataType() [with T = __nv_bfloat16] start [TM][DEBUG] void turbomind::cublasMMWrapper::Gemm(cublasOperation_t, cublasOperation_t, int, int, int, const void*, int, const void*, int, void*, int, float, float) [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/utils/cublasMMWrapper.cc:326 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaLinear.h:103 [TM][DEBUG] void turbomind::ftNcclAllReduceSum(const T*, T*, int, turbomind::NcclParam, cudaStream_t) [with T = __nv_bfloat16; cudaStream_t = CUstream_st*] start [TM][DEBUG] ncclDataType_t turbomind::getNcclDataType() [with T = __nv_bfloat16] start [TM][DEBUG] void turbomind::ftNcclAllReduceSum(const T*, T*, int, turbomind::NcclParam, cudaStream_t) [with T = __nv_bfloat16; cudaStream_t = CUstream_st*] stop [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/unified_attention_layer.cc:319 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/unified_attention_layer.cc:325 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/unified_decoder.cc:184 [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: ffn_input [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layer_id [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: ffn_output [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: lora_mask [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: ffn_input [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layer_id [TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start [TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start [TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) 
[with T = __nv_bfloat16; size_t = long unsigned int] [TM][DEBUG] Reuse original buffer 0x341a63b400 with size 54046720 and do nothing for reMalloc. [TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = __nv_bfloat16; size_t = long unsigned int] [TM][DEBUG] Reuse original buffer 0x341d9c6400 with size 54046720 and do nothing for reMalloc. [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: ffn_input [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start [TM][DEBUG] turbomind::Tensor& turbomind::TensorMap::at(const string&) for key ffn_output [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: ffn_output [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: lora_mask [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start [TM][DEBUG] getPtr with type i4, but data type is: x [TM][DEBUG] void turbomind::cublasMMWrapper::Gemm(cublasOperation_t, cublasOperation_t, int, int, int, const void*, int, const void*, int, void*, int, float, float) [TM][DEBUG] void turbomind::ftNcclAllReduceSum(const T*, T*, int, turbomind::NcclParam, cudaStream_t) [with T = __nv_bfloat16; cudaStream_t = CUstream_st*] stop [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/unified_attention_layer.cc:319 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/unified_attention_layer.cc:325 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/unified_decoder.cc:184 [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: ffn_input [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layer_id [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: ffn_output [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: lora_mask [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: ffn_input [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: layer_id [TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start [TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start [TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = __nv_bfloat16; size_t = long unsigned int] [TM][DEBUG] Reuse original buffer 0xc5a63b400 with size 54046720 and do nothing for reMalloc. [TM][DEBUG] void* turbomind::IAllocator::reMalloc(T*, size_t, bool, bool) [with T = __nv_bfloat16; size_t = long unsigned int] [TM][DEBUG] Reuse original buffer 0xc5d9c6400 with size 54046720 and do nothing for reMalloc. 
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: ffn_input [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start [TM][DEBUG] turbomind::Tensor& turbomind::TensorMap::at(const string&) for key ffn_output [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: ffn_output [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = __nv_bfloat16] start [TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: lora_mask [TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start [TM][DEBUG] getPtr with type i4, but data type is: x [TM][DEBUG] void turbomind::cublasMMWrapper::Gemm(cublasOperation_t, cublasOperation_t, int, int, int, const void*, int, const void*, int, void*, int, float, float) [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/utils/cublasMMWrapper.cc:326 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaLinear.h:103 [TM][DEBUG] void turbomind::cublasMMWrapper::Gemm(cublasOperation_t, cublasOperation_t, int, int, int, const void*, int, const void*, int, void*, int, float, float) [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/utils/cublasMMWrapper.cc:326 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaLinear.h:103 [TM][DEBUG] void turbomind::invokeGenericActivation(T*, const BT*, const T*, const BT*, const int*, const T*, int, int, int, const float*, const float*, const int*, int, cudaStream_t) [with Activation = turbomind::SiluActivation; T = __nv_bfloat16; BT = __nv_bfloat16; cudaStream_t = CUstream_st*] [TM][DEBUG] invokeGenericActivation 2639 10240 0 [TM][DEBUG] 52780 512 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/kernels/activation_kernels.cu:281 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/kernels/activation_kernels.cu:295 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaFfnLayer.cc:70 [TM][DEBUG] void turbomind::cublasMMWrapper::Gemm(cublasOperation_t, cublasOperation_t, int, int, int, const void*, int, const void*, int, void*, int, float, float) [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/utils/cublasMMWrapper.cc:326 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaLinear.h:103 [TM][DEBUG] void turbomind::cublasMMWrapper::Gemm(cublasOperation_t, cublasOperation_t, int, int, int, const void*, int, const void*, int, void*, int, float, float) [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/utils/cublasMMWrapper.cc:326 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaLinear.h:103 [TM][DEBUG] void turbomind::ftNcclAllReduceSum(const T*, T*, int, turbomind::NcclParam, cudaStream_t) [with T = __nv_bfloat16; cudaStream_t = CUstream_st*] start [TM][DEBUG] ncclDataType_t turbomind::getNcclDataType() [with T = __nv_bfloat16] start [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/utils/cublasMMWrapper.cc:326 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaLinear.h:103 [TM][DEBUG] void turbomind::invokeGenericActivation(T*, const BT*, const T*, const BT*, const int*, const T*, int, int, int, const float*, const float*, const int*, int, cudaStream_t) [with Activation = turbomind::SiluActivation; T = __nv_bfloat16; BT = __nv_bfloat16; cudaStream_t = CUstream_st*] [TM][DEBUG] invokeGenericActivation 2639 10240 0 [TM][DEBUG] 52780 512 [TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/kernels/activation_kernels.cu:281
[TM][DEBUG] invokeGenericActivation 2639 10240 0
[TM][DEBUG] 52780 512
[TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/kernels/activation_kernels.cu:281
Is there nothing after this point?
Right, it just stays stuck there.
@lzhangzz any ideas?
Same problem here.
Could you share the reproduction script, data, and model?
Same problem here.
@lzhangzz any ideas?
Has this been resolved?