lmdeploy [Bug] lmdeploy + InternVL2-40B-AWQ hangs under a certain number of asynchronous requests

Checklist

[X] 1. I have searched related issues but cannot get the expected help.
[X] 2. The bug has not been fixed in the latest version.
[X] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

I used lmdeploy + InternVL2-40B-AWQ to run inference on a large number of videos, following https://github.com/OpenGVLab/InternVL/issues/549. After several hours, lmdeploy hangs with GPU utilization at 0. The process is not terminated, and I cannot view the thread stack information with pystack.
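
When an external tool such as pystack cannot attach, one option is to have the process dump its own Python stacks. The following is a minimal sketch using the standard-library `faulthandler` module. The function name `enable_hang_diagnostics` and the 600-second interval are illustrative assumptions, not part of lmdeploy. It only shows Python frames, so a hang inside the native TurboMind engine would appear as the Python call that entered it.

```python
import faulthandler
import signal
import sys


def enable_hang_diagnostics(dump_every_s=600, log_file=None):
    """Install Python stack dumps that still work when worker threads are stuck.

    `dump_every_s` and this helper's name are hypothetical choices for this sketch.
    """
    log_file = log_file or sys.stderr
    # Dump the tracebacks of all threads on fatal signals (SIGSEGV, SIGABRT, ...).
    faulthandler.enable(file=log_file, all_threads=True)
    # Dump all thread tracebacks periodically from faulthandler's watchdog thread,
    # which does not need the interpreter to be making progress.
    faulthandler.dump_traceback_later(dump_every_s, repeat=True, file=log_file)
    # On Unix, also dump on demand with `kill -USR1 <pid>`.
    if hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)
```

Calling `enable_hang_diagnostics()` at the top of `main()` and then running `kill -USR1 <pid>` once GPU utilization drops to 0 should print every thread's Python stack to stderr, which would at least show which Python call each of the worker threads is waiting in.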

Related issue: https://github.com/InternLM/lmdeploy/issues/2231

@irexyc, could you take a look?

log:

2024-09-27 07:36:38,423 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 0.127s
2024-09-27 07:36:38,424 - lmdeploy - INFO - ImageEncoder process 1 images, left 1 images.
[TM][INFO] ------------------------- step = 2490 -------------------------
[TM][INFO] ------------------------- step = 2500 -------------------------
[TM][INFO] ------------------------- step = 2510 -------------------------
[TM][INFO] ------------------------- step = 2520 -------------------------
[TM][INFO] ------------------------- step = 2530 -------------------------
[TM][INFO] ------------------------- step = 2540 -------------------------
[TM][INFO] ------------------------- step = 2550 -------------------------
[TM][INFO] ------------------------- step = 2560 -------------------------
[TM][INFO] ------------------------- step = 2570 -------------------------
2024-09-27 07:36:39,222 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 0.798s
2024-09-27 07:36:39,222 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
[TM][INFO] ------------------------- step = 2580 -------------------------
[TM][INFO] ------------------------- step = 2590 -------------------------
[TM][INFO] ------------------------- step = 2600 -------------------------
[TM][INFO] ------------------------- step = 2610 -------------------------
[TM][INFO] ------------------------- step = 2620 -------------------------
[TM][INFO] ------------------------- step = 2630 -------------------------
[TM][INFO] ------------------------- step = 2640 -------------------------
[TM][INFO] ------------------------- step = 2650 -------------------------
2024-09-27 07:36:40,011 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 0.788s
2024-09-27 07:36:40,011 - lmdeploy - INFO - ImageEncoder done 8 images, left 0 images.
2024-09-27 07:36:40,011 - lmdeploy - INFO - ImageEncoder received 8 images, left 8 images.
2024-09-27 07:36:40,011 - lmdeploy - INFO - ImageEncoder process 1 images, left 7 images.
2024-09-27 07:36:40,013 - lmdeploy - [37mINFO[0m - prompt="<|im_start|>system\n你是由上海人工智能实验室联合商汤科技开发的书生多模态大模型，英文名叫InternVL, 是一个有用无害的人工智能助手。<|im_end|><|im_start|>user\nFrame1: <img><IMAGE_TOKEN></img>\nFrame2: <img><IMAGE_TOKEN></img>\nFrame3: <img><IMAGE_TOKEN></img>\nFrame4: <img><IMAGE_TOKEN></img>\nFrame5: <img><IMAGE_TOKEN></img>\nFrame6: <img><IMAGE_TOKEN></img>\nFrame7: <img><IMAGE_TOKEN></img>\nFrame8: <img><IMAGE_TOKEN></img>\nDescribe this video in detail. Don't repeat.<|im_end|><|im_start|>assistant\n", gen_config=GenerationConfig(n=1, max_new_tokens=512, do_sample=False, top_p=1.0, top_k=1, min_p=0.0, temperature=1.0, repetition_penalty=1.0, ignore_eos=False, random_seed=10394281906232759988, stop_words=None, bad_words=None, stop_token_ids=[6, 7], bad_token_ids=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None, response_format=None, logits_processors=None), prompt_token_id=[6, 1328, 144, 51943, 13326, 4510, 13992, 15290, 5777, 59977, 61094, 3540, 4419, 29361, 59661, 59691, 60131, 60106, 59647, 11443, 101, 59568, 14877, 30347, 3187, 1318, 53581, 97, 141, 4748, 32574, 59828, 60323, 53892, 4740, 44307, 102, 7, 6, 2942, 144, 38482, 78, 1759, 59568, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 70, 144, 38482, 79, 1759, 59568, 68, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 70, 144, 38482, 80, 1759, 59568, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 70, 144, 38482, 81, 1759, 59568, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 70, 144, 38482, 82, 1759, 59568, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 70, 144, 38482, 83, 1759, 59568, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 70, 144, 
38482, 84, 1759, 59568, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 70, 144, 38482, 85, 1759, 59568, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 70, 144, 3782, 15097, 719, 2744, 594, 4122, 98, 141, 9550, 59610, 59570, 12256, 98, 7, 6, 14135, 144], adapter_name=None.
2024-09-27 07:36:40,013 - lmdeploy - INFO - session_id=936, history_tokens=0, input_tokens=2162, max_new_tokens=512, seq_start=True, seq_end=True, step=0, prep=True
2024-09-27 07:36:40,013 - lmdeploy - INFO - Register stream callback for 936
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 936 received.

  0%|          | 934/248667 [1:49:01<481:58:11,  7.00s/it]

Reproduction

A minimal reproducible demo:


import logging
import gc
import os
from contextlib import contextmanager
from concurrent.futures import ThreadPoolExecutor, as_completed, TimeoutError

import pandas as pd
import torch
from PIL import Image
from tqdm import tqdm

import numpy as np
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig, VisionConfig
from decord import VideoReader
from lmdeploy.vl.constants import IMAGE_TOKEN
from lmdeploy.vl.utils import encode_image_base64


logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)


def get_index(bound, fps, max_frame, first_idx=0, num_segments=32):
    if bound:
        start, end = bound[0], bound[1]
    else:
        start, end = -100000, 100000
    start_idx = max(first_idx, round(start * fps))
    end_idx = min(round(end * fps), max_frame)
    seg_size = float(end_idx - start_idx) / num_segments
    frame_indices = np.array(
        [
            int(start_idx + (seg_size / 2) + np.round(seg_size * idx))
            for idx in range(num_segments)
        ]
    )

    return frame_indices


@contextmanager
def video_reader(*args, **kwargs):
    """A context manager to solve the memory leak of decord.
    """
    vr = VideoReader(*args, **kwargs)
    try:
        yield vr
    finally:
        del vr
        gc.collect()


def load_video(video_path, bound=None, num_segments=32):
    # vr = VideoReader(video_path, ctx=cpu(0), num_threads=1)
    with video_reader(video_path) as vr:
        max_frame = len(vr) - 1
        fps = float(vr.get_avg_fps())
        pixel_values_list, num_patches_list = [], []
        frame_indices = get_index(bound, fps, max_frame, first_idx=0, num_segments=num_segments)
        imgs = []
        for frame_index in frame_indices:
            img = Image.fromarray(vr[frame_index].asnumpy()).convert('RGB')
            imgs.append(img)
        
        return imgs


def query_single_video(pipe, video_path, prompt, gen_config, num_sampled_frames=8):
    try:
        print(video_path)
        video_frames = load_video(video_path, num_segments=num_sampled_frames)
        
        question = ""
        for i in range(len(video_frames)):
            question = question + f"Frame{i+1}: {IMAGE_TOKEN}\n"
        
        question += prompt
        
        content = [{"type": "text", "text": question}]
        for frame in video_frames:
            content.append(
                {
                    "type": "image_url",
                    "image_url": {"max_dynamic_patch": 1, "url": f"data:image/jpeg;base64,{encode_image_base64(frame)}"}
                }
            )
        message = [dict(role='user', content=content)]
        output = pipe(message, gen_config=gen_config)

        return video_path, output.text
    except Exception as e:
        logger.warning(f"Failed to recaption video: {video_path}. Error is: {e}.")


def query_videos(
    pipe,
    video_path_list,
    prompt,
    gen_config,
    saved_path,
    max_workers=1,
    video_folder="",
    video_path_column="video_path",
    caption_column="caption",
    num_sampled_frames=8,
    saved_freq=1
):
    result_list = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(
                query_single_video,
                pipe,
                video_path,
                prompt,
                gen_config,
                num_sampled_frames=num_sampled_frames
            )
            for video_path in video_path_list
        ]

        for f in tqdm(as_completed(futures), total=len(video_path_list)):
            try:
                result = f.result(timeout=180)
            except TimeoutError:
                logger.warning(f"query_single_video timeout.")
                result = None
            except Exception as e:
                logger.warning(f"query_single_video error is {e}.")
                result = None
            if result is None:
                continue
            video_path = os.path.relpath(result[0], video_folder) if video_folder != "" else result[0]
            result_list.append({video_path_column: video_path, caption_column: result[1]})

            if len(result_list) >= saved_freq:
                result_df = pd.DataFrame(result_list)
                if os.path.exists(saved_path):
                    saved_df = pd.read_json(saved_path, orient="records", lines=True)
                    result_df = pd.concat([saved_df, result_df], ignore_index=True)
                result_df.to_json(saved_path, orient="records", lines=True, force_ascii=False)

                logger.info(f"Save result to {saved_path}.")
                result_list.clear()


def main():
    video_metadata_path = "video_metadata_path.jsonl"
    video_path_column = "video_path"
    video_folder = "video_folder"
    saved_path = "saved_path.jsonl"

    saved_freq = 64

    model_path = "OpenGVLab/InternVL2-40B-AWQ"
    max_workers = 64
    input_prompt = "Describe this video in detail. Don\'t repeat."

    video_metadata_df = pd.read_json(video_metadata_path, lines=True)
    video_path_list = video_metadata_df["video_path"].tolist()
    video_path_list = [os.path.basename(video_path) for video_path in video_path_list]

    if os.path.exists(saved_path):
        saved_metadata_df = pd.read_json(saved_path, lines=True)
        saved_video_path_list = saved_metadata_df[video_path_column].tolist()
        video_path_list = list(set(video_path_list).difference(set(saved_video_path_list)))
        logger.info(
            f"Resume from {saved_path}: {len(saved_video_path_list)} processed and {len(video_path_list)} to be processed."
        )

    video_path_list = [os.path.join(video_folder, video_path) for video_path in video_path_list]

    # Initialize the lmdeploy inference pipeline.
    CUDA_VISIBLE_DEVICES = os.getenv("CUDA_VISIBLE_DEVICES", None)
    tensor_parallel_size = torch.cuda.device_count() if CUDA_VISIBLE_DEVICES is None else len(CUDA_VISIBLE_DEVICES.split(","))
    logger.info(f"Automatically set tensor_parallel_size={tensor_parallel_size} based on the available devices.")
    vision_config = VisionConfig(thread_safe=True)
    pipe = pipeline(
        model_path,
        backend_config=TurbomindEngineConfig(model_format='awq', session_len=8192, tp=tensor_parallel_size),
        vision_config=vision_config,
        log_level='INFO'
    )
    gen_config = GenerationConfig(top_k=1)

    logger.info("Start query videos...")
    query_videos(
        pipe,
        video_path_list,
        input_prompt,
        gen_config,
        saved_path,
        max_workers=max_workers,
        video_folder=video_folder,
        video_path_column=video_path_column,
        caption_column="caption",
        saved_freq=saved_freq
    )
            

if __name__ == "__main__":
    main()
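
Note that in the script above, `f.result(timeout=180)` is called on futures yielded by `as_completed`, which are already finished, so that timeout cannot fire for a request that hangs. A minimal sketch of stall detection with `concurrent.futures.wait` is shown below. The helper name `iter_results_with_stall_check`, the stub tasks, and the timeout value are assumptions for illustration only, and the sketch detects a stall rather than fixing the underlying hang.

```python
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def iter_results_with_stall_check(futures, stall_timeout_s):
    """Yield finished futures; raise TimeoutError if none finishes in time."""
    pending = set(futures)
    while pending:
        # Block until at least one future finishes or the stall timeout elapses.
        done, pending = wait(pending, timeout=stall_timeout_s, return_when=FIRST_COMPLETED)
        if not done:
            raise TimeoutError(
                f"no request finished within {stall_timeout_s}s; {len(pending)} still pending"
            )
        yield from done


def demo():
    """Stand-in for the real pipeline: one quick task and one task that blocks."""
    release = threading.Event()
    results, stalled = [], False
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(lambda: "ok"), executor.submit(release.wait)]
        try:
            for f in iter_results_with_stall_check(futures, stall_timeout_s=0.2):
                results.append(f.result())
        except TimeoutError:
            stalled = True
        finally:
            # Unblock the stub so the executor can shut down cleanly.
            release.set()
    return results, stalled
```

In the real script a thread stuck inside the engine cannot be cancelled, so after a detected stall the practical options are probably to log it, dump the thread stacks, and restart the process. The script already resumes from `saved_path`, so a restart would only redo the unsaved requests.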

Environment


sys.platform: linux
Python: 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: NVIDIA A800-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.3.1+cu118
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.7
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.18.1+cu118
LMDeploy: 0.6.0+
transformers: 4.44.2
gradio: Not Found
fastapi: 0.115.0
pydantic: 2.9.2
triton: 2.3.1
NVIDIA Topology:
	GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	mlx5_0	mlx5_1	mlx5_2	mlx5_3	CPU Affinity	NUMA Affinity
GPU0	 X 	NV8	NV8	NV8	NV8	NV8	NV8	NV8	PHB	PHB	PHB	PHB	0-103		N/A
GPU1	NV8	 X 	NV8	NV8	NV8	NV8	NV8	NV8	PHB	PHB	PHB	PHB	0-103		N/A
GPU2	NV8	NV8	 X 	NV8	NV8	NV8	NV8	NV8	PHB	PHB	PHB	PHB	0-103		N/A
GPU3	NV8	NV8	NV8	 X 	NV8	NV8	NV8	NV8	PHB	PHB	PHB	PHB	0-103		N/A
GPU4	NV8	NV8	NV8	NV8	 X 	NV8	NV8	NV8	PHB	PHB	PHB	PHB	0-103		N/A
GPU5	NV8	NV8	NV8	NV8	NV8	 X 	NV8	NV8	PHB	PHB	PHB	PHB	0-103		N/A
GPU6	NV8	NV8	NV8	NV8	NV8	NV8	 X 	NV8	PHB	PHB	PHB	PHB	0-103		N/A
GPU7	NV8	NV8	NV8	NV8	NV8	NV8	NV8	 X 	PHB	PHB	PHB	PHB	0-103		N/A
mlx5_0	PHB	PHB	PHB	PHB	PHB	PHB	PHB	PHB	 X 	PHB	PHB	PHB		
mlx5_1	PHB	PHB	PHB	PHB	PHB	PHB	PHB	PHB	PHB	 X 	PHB	PHB		
mlx5_2	PHB	PHB	PHB	PHB	PHB	PHB	PHB	PHB	PHB	PHB	 X 	PHB		
mlx5_3	PHB	PHB	PHB	PHB	PHB	PHB	PHB	PHB	PHB	PHB	PHB	 X 		

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

Error traceback

No response

Sep 27 '24 09:09 hkunzhe

The cause cannot be determined from the current log. Can you try the latest version, 0.6.1? It has improved exception handling and may help with this problem.

Sep 29 '24 03:09 irexyc

@irexyc Hi, the latest version 0.6.1 is not able to catch the exception either. In my experience, this bug is very easy to trigger, and it often causes the utilization of one GPU to drop to zero.

Sep 29 '24 12:09 hkunzhe

If the problem is easy to reproduce, can you share the entire log from the start of the program to the time the problem occurs? If you reduce cache_max_entry_count, does the problem still exist? I'm not sure whether it is caused by insufficient CUDA memory.
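
For reference, a minimal sketch of how cache_max_entry_count could be lowered in the reproduction script. `cache_max_entry_count` is a field of `TurbomindEngineConfig` that controls the share of GPU memory used for the k/v cache. The value 0.4 and the helper names `engine_kwargs` and `build_pipeline` are assumptions chosen for illustration, not a recommended setting.

```python
def engine_kwargs(tp, cache_max_entry_count=0.4):
    """Engine settings from the reproduction script plus a smaller k/v cache.

    0.4 is an arbitrary example value for this sketch.
    """
    return dict(
        model_format="awq",
        session_len=8192,
        tp=tp,
        cache_max_entry_count=cache_max_entry_count,
    )


def build_pipeline(model_path, tp, cache_max_entry_count=0.4):
    # Imported lazily so the helper above can be used without lmdeploy installed.
    from lmdeploy import TurbomindEngineConfig, VisionConfig, pipeline

    return pipeline(
        model_path,
        backend_config=TurbomindEngineConfig(**engine_kwargs(tp, cache_max_entry_count)),
        vision_config=VisionConfig(thread_safe=True),
        log_level="INFO",
    )
```

If the hang no longer occurs with the smaller cache, that would point toward memory pressure. If it still occurs, memory is less likely to be the cause.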

Sep 29 '24 17:09 irexyc

@irexyc I am using a pod with 8 * A100/A800 80G GPUs, and apart from the aforementioned program, no other programs are running.

Should I provide an INFO-level or a DEBUG-level log?

Sep 30 '24 02:09 hkunzhe

Is this issue resolved? Is there any workaround?

Dec 01 '24 08:12 shubham303

Is this issue resolved? Is there any workaround?

I switched to vllm.

Dec 03 '24 10:12 hkunzhe