lmdeploy
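[Bug] When serving with Turbomind, too many concurrent requests cause an OOM and the service then kills itself. Please check before the OOM whether a request can be handled and, if it cannot, process requests one at a time instead of letting the service kill itself. Thank you very much.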
Checklist
- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
Failed to find a feasible kernel in the cache, will dispatch by heuristic.
[TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/core/allocator.cc:49
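Until the server can queue requests itself, the "process them one by one" behaviour requested in the title can be approximated on the client side. The following is only a minimal sketch of that idea using an `asyncio.Semaphore`; it is not lmdeploy code. `send_request` is a hypothetical stand-in for the real HTTP call to the service, and `MAX_IN_FLIGHT` is an assumed limit that would have to be tuned to the GPU memory actually available.

```python
import asyncio

# Assumed limit, not an lmdeploy setting: tune it to what the GPU can hold.
MAX_IN_FLIGHT = 4


async def send_request(prompt: str) -> str:
    """Hypothetical placeholder for one request to the inference service."""
    await asyncio.sleep(0.01)  # simulates network and generation latency
    return f"response to {prompt}"


async def run_limited(prompts, limit=MAX_IN_FLIGHT):
    """Send all prompts, but never more than `limit` at the same time."""
    gate = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0  # highest concurrency observed, kept only for verification

    async def guarded(prompt):
        nonlocal in_flight, peak
        async with gate:  # waits here while `limit` requests are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await send_request(prompt)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(p) for p in prompts))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_limited([f"q{i}" for i in range(16)]))
    print(len(results), "responses, peak concurrency", peak)
```

Excess requests wait at the semaphore instead of reaching the server all at once, so the burst is spread out over time. This only limits one client; it does not protect the service if several clients send bursts simultaneously.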
Reproduction
Failed to find a feasible kernel in the cache, will dispatch by heuristic.
[TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/core/allocator.cc:49
Environment
0.8.0
Error traceback