How can I run batch inference to improve GPU utilization?

Open justStarG opened this issue 1 year ago • 1 comments

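When I run inference with llava 72B at batch_size=1, GPU utilization is low. How can I use a larger batch to improve utilization? I am not using vllm; I am running inference directly with swift, following https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/llava%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md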

May 20 '24 14:05 justStarG

The native pt (PyTorch) backend does not currently support batch inference. Why not consider vllm?

May 21 '24 14:05 tastelikefeet

How do I run batch inference with vllm?

Jun 27 '24 05:06 1028686314

https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/vLLM%E6%8E%A8%E7%90%86%E5%8A%A0%E9%80%9F%E6%96%87%E6%A1%A3.md

Jul 08 '24 12:07 Jintao-Huang
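
(Editorial sketch, not posted by any participant in this thread.) For readers who want a rough picture of what batched inference looks like in code, the following is a minimal sketch under stated assumptions. `chunked`, `batch_infer`, and `make_vllm_generate_fn` are hypothetical helper names, not part of swift or vLLM. The vLLM part uses the offline `LLM.generate` API with text-only prompts; multimodal models such as llava need image inputs in a format that depends on the vLLM version, so treat the linked document as the reference for that. vLLM also schedules batches internally, so the explicit chunking here mainly bounds how many requests are submitted at once.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def batch_infer(
    prompts: Sequence[str],
    generate_fn: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Run `generate_fn` one batch at a time and return outputs in input order."""
    outputs: List[str] = []
    for batch in chunked(prompts, batch_size):
        results = generate_fn(batch)
        if len(results) != len(batch):
            raise RuntimeError("generate_fn must return one output per prompt")
        outputs.extend(results)
    return outputs


def make_vllm_generate_fn(model_id: str, max_tokens: int = 128):
    """Hypothetical adapter: wrap a vLLM engine as a `generate_fn` (text-only)."""
    # Imported lazily so the helpers above work without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_id)
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)

    def generate(batch: List[str]) -> List[str]:
        # llm.generate accepts a list of prompts and returns one result per prompt.
        return [result.outputs[0].text for result in llm.generate(batch, params)]

    return generate
```

A call would then look like `batch_infer(prompts, make_vllm_generate_fn(model_id), batch_size=16)`, where `model_id` is whichever model you have available locally. The batch size is an assumption to tune against available GPU memory.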

The speedup doesn't seem very noticeable to me.

Sep 18 '24 03:09 FoolishMao