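[Bug] When serving with TurboMind, too many concurrent requests cause an OOM and the service crashes outright. Could it instead check, before running out of memory, whether a request can be handled, and if not, process requests one at a time rather than killing the service? Thank you very much.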

Gierry opened this issue 8 months ago

Checklist

[ ] 1. I have searched related issues but cannot get the expected help.
[ ] 2. The bug has not been fixed in the latest version.
[ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

Failed to find a feasible kernel in the cache, will dispatch by heuristic.
[TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/core/allocator.cc:49

Reproduction

Failed to find a feasible kernel in the cache, will dispatch by heuristic.
[TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/core/allocator.cc:49

Environment

LMDeploy 0.8.0

Error traceback

May 14 '25 15:05 Gierry
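The behavior requested in the title (check whether a request fits before allocating, and otherwise hold it until earlier requests finish) is a form of memory-aware admission control. The sketch below illustrates the idea under loud assumptions. `MemoryAdmissionController` and `estimate_kv_cache_bytes` are hypothetical names, not LMDeploy or TurboMind APIs. The cost model covers only the KV cache (one K and one V vector per layer, per KV head, per token). The report does not say which allocation failed at `allocator.cc:49`, so a real fix would have to account for whatever buffer that was.

```python
import threading
from collections import deque
from typing import Optional


def estimate_kv_cache_bytes(num_tokens: int, num_layers: int, num_kv_heads: int,
                            head_dim: int, dtype_bytes: int = 2) -> int:
    """Rough KV-cache footprint (assumed cost model): K and V per layer, KV head, token."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * num_tokens


class MemoryAdmissionController:
    """Hypothetical admission gate, not part of LMDeploy.

    A request is admitted only if its estimated cost fits in the remaining
    budget; otherwise it waits in FIFO order until earlier requests release
    memory. A request that can never fit is rejected instead of crashing.
    """

    def __init__(self, budget_bytes: int):
        self.budget = budget_bytes
        self.in_use = 0
        self._cond = threading.Condition()
        self._queue = deque()  # tickets of waiting requests, oldest first

    def acquire(self, cost: int, timeout: Optional[float] = None) -> bool:
        if cost > self.budget:
            raise ValueError(f"request needs {cost} bytes, budget is {self.budget}")
        ticket = object()
        with self._cond:
            self._queue.append(ticket)
            try:
                # Admit only the oldest waiter, and only once its cost fits.
                admitted = self._cond.wait_for(
                    lambda: self._queue[0] is ticket
                    and self.in_use + cost <= self.budget,
                    timeout,
                )
                if admitted:
                    self.in_use += cost
                return admitted
            finally:
                # Leave the queue whether admitted or timed out; wake the next waiter.
                self._queue.remove(ticket)
                self._cond.notify_all()

    def release(self, cost: int) -> None:
        with self._cond:
            self.in_use -= cost
            self._cond.notify_all()


if __name__ == "__main__":
    # Assumed model shape: 32 layers, 8 KV heads, head_dim 128, fp16 (2 bytes).
    cost = estimate_kv_cache_bytes(4096, num_layers=32, num_kv_heads=8, head_dim=128)
    gate = MemoryAdmissionController(budget_bytes=2 * cost)
    print(gate.acquire(cost), gate.acquire(cost))  # True True: both fit
    print(gate.acquire(cost, timeout=0))  # False: a third would exceed the budget
    gate.release(cost)
    print(gate.acquire(cost, timeout=0))  # True: admitted after a release
```

With the assumed shape, each token costs 2 × 32 × 8 × 128 × 2 = 131072 bytes (128 KiB), so a 4096-token request needs 512 MiB. Strict FIFO ordering keeps large requests from starving but means one large request at the head of the queue blocks smaller ones behind it, which matches the "one at a time" fallback asked for here. How the budget itself would be derived (for example, free GPU memory minus engine workspace) is left open.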