I ran into the same problem with lmdeploy v0.5.0 running InternVL-v15-chat. My suspicion: the asynchronous ImageEncoder forward running in a worker thread and the LLM forward on the main thread contend on CUDA kernel launches, causing the launch to hang.
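To make the suspected pattern concrete, here is a minimal sketch (not lmdeploy's actual code): a vision-encoder forward issued from a worker thread while the LLM forward runs on the main thread, both enqueuing kernels on the default CUDA stream.

```
# Minimal sketch of the suspected pattern, NOT lmdeploy's actual code:
# the image encoder forward runs in a worker thread while the LLM forward
# runs on the main thread; both enqueue kernels on the default CUDA stream,
# so a blocking launch on one side can stall the other.
import threading

import torch

encoder = torch.nn.Linear(1024, 1024).cuda()  # stand-in for the ViT encoder
llm = torch.nn.Linear(1024, 1024).cuda()      # stand-in for the LLM

def vision_worker(images, out):
    with torch.inference_mode():
        out.append(encoder(images))  # kernels launched from this thread

images = torch.randn(4, 1024, device='cuda')
out = []
t = threading.Thread(target=vision_worker, args=(images, out))
t.start()
with torch.inference_mode():
    logits = llm(torch.randn(4, 1024, device='cuda'))  # main-thread kernels
t.join()
torch.cuda.synchronize()
```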
I tried v0.5.3 with TP=2 inference on InternVL-v15-chat: the features returned here are NaN, the final output logits are all 0, and the context produces no output. With TP=1 there is no such problem and the output is correct.
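A quick way to confirm the symptom, assuming the encoder returns a list of torch tensors (the function name here is illustrative):

```
# Illustrative NaN/zero check for the returned vision features; `features`
# is assumed to be a list of torch tensors as returned by the encoder.
import torch

def inspect_features(features):
    for i, f in enumerate(features):
        print(f"feature[{i}] shape={tuple(f.shape)} "
              f"has_nan={torch.isnan(f).any().item()} "
              f"all_zero={bool((f == 0).all().item())}")
```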
> @coolhok
>
> Using the pipeline interface, after the pipeline is created (with tp > 1), does directly calling the following statements cause any problem?
>
> ```
> from lmdeploy.vl import load_image
> im = load_image('image path')
> pipe.vl_encoder.forward([im])
> ```

# code

```
from lmdeploy...
```
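The reply's snippet above is truncated; a minimal repro sketch under the same assumptions (the model path and tp value are placeholders, not taken from the original):

```
# Hypothetical repro sketch; model path and tp value are placeholders.
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5',
                backend_config=TurbomindEngineConfig(tp=2))
im = load_image('image path')
features = pipe.vl_encoder.forward([im])  # reported to hang / return NaN with tp > 1
```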
@irexyc Using lmdeploy v0.5.3 with TP=1, running the generated data can also end in an HTTP 499 hang, with the error stack pointing at `_forward_loop` at the same time. Should more threads be added for the save operations?

```
[2024-08-10 19:25:23] 2024-08-10 19:25:23,109...
```
> There are also plans to open-source EAGLE support in the future. Can you reveal the schedule, or share the development branch so we can work on it together? Thanks!!
I also encountered the same problem when prefill n_token > 2048. After making the following modification to src/turbomind/utils/allocator.h, it now works normally:
> > I also encountered the same problem when prefill n_token > 2048. After making the following modifications, I can now work normally
> >
> > src/turbomind/utils/allocator.h
> >
> > ...
> mla

I think MLA attention does not support the tree mask, so this PR does not work with DeepSeek.
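For context, a tree mask in speculative decoding lets each draft token attend only to its ancestors in the draft tree rather than a plain causal prefix; a minimal sketch of constructing one (illustrative, not sglang's implementation):

```
# Illustrative tree-attention mask for speculative decoding, not sglang's code.
# parent[i] is the index of draft token i's parent in the tree, -1 for the root.
import torch

def build_tree_mask(parent):
    n = len(parent)
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        j = i
        while j != -1:          # walk up to the root
            mask[i, j] = True   # token i may attend to ancestor j (and itself)
            j = parent[j]
    return mask

# Root token 0 with two branches: 0 -> 1 -> 2 and 0 -> 3.
print(build_tree_mask([-1, 0, 1, 0]).int())
```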
> I find this PR cannot run llama 8b with the triton backend, the error is:
>
> ```
> File "/data/peng/sglang/python/sglang/srt/speculative/lookahead_utils.py", line 160, in verify
>     batch.seq_lens_sum = batch.seq_lens.sum().item()
> RuntimeError:...
> ```
> We used sglang in production and did not meet these problems. A few tips for increasing the stability:
>
> 1. Try to disable custom all reduce by `--disable-custom-all-reduce`.
> ...
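For reference, a hypothetical server launch showing where that flag goes (the model path and tp value are placeholders, not from the original comment):

```
# Hypothetical launch; model path and tp value are placeholders.
python -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --tp 2 \
    --disable-custom-all-reduce
```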