justdoit

13 comments by justdoit

I ran into the same problem, using lmdeploy v0.5.0 to run InternVL-v15-chat. My suspicion: the asynchronous ImageEncoder forward running in a thread, together with the LLM forward, deadlocks on the CUDA kernel launch.
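For what it's worth, the pattern I suspect reduces to two Python threads launching kernels on the same GPU. A minimal, hypothetical sketch (the `Linear` modules are stand-ins for lmdeploy's real ImageEncoder and LLM forwards, not its actual code):

```python
import threading
import torch

device = torch.device("cuda")
encoder = torch.nn.Linear(1024, 1024).to(device)  # stand-in for the ViT image encoder
llm = torch.nn.Linear(1024, 1024).to(device)      # stand-in for the LLM forward

def encode_loop():
    # Background ImageEncoder forward, mimicking the async vision thread.
    with torch.inference_mode():
        for _ in range(1000):
            encoder(torch.randn(64, 1024, device=device))

t = threading.Thread(target=encode_loop, daemon=True)
t.start()

# LLM forward on the main thread while the encoder thread is running.
with torch.inference_mode():
    for _ in range(1000):
        llm(torch.randn(64, 1024, device=device))

t.join()
torch.cuda.synchronize()  # if the launch queue is wedged, this is where it hangs
```

If the hang really is a launch interaction between the two threads, giving each thread its own `torch.cuda.Stream` would be one thing to try.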

I tried TP=2 inference with v0.5.3 on InternVL-v15-chat: the features returned here are NaN, the final output logits are all 0, and the context produces no output. TP=1 does not have this problem, and its output is correct. ![image](https://github.com/user-attachments/assets/89f63ab1-5be6-4baf-a4a5-e5a37562faa0)

> @coolhok
>
> Using the pipeline interface, after the pipeline has been created (with tp > 1), is there any problem when you call the following statements directly?
>
> ```
> from lmdeploy.vl import load_image
> im = load_image('image path')
> pipe.vl_encoder.forward([im])
> ```

# code

```
from lmdeploy...
```
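Filled out into a runnable form, the suggested check might look like this (a sketch; the HF model id and the tp value are my assumptions, while `load_image` and `pipe.vl_encoder.forward` are the calls quoted above):

```python
import torch
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

# Build the VL pipeline with tp=2, encode one image directly, and
# look for NaNs in the returned vision features.
pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5',          # assumed model id
                backend_config=TurbomindEngineConfig(tp=2))
im = load_image('image path')                            # placeholder path
feats = pipe.vl_encoder.forward([im])                    # list of feature tensors
print([torch.isnan(f).any().item() for f in feats])      # True => NaN features
```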

@irexyc Using lmdeploy v0.5.3 with TP=1, running the generated data can also end in an HTTP 499 freeze, and the error stack points to `_forward_loop` at the same time. Should save operations be offloaded to additional threads?

```
[2024-08-10 19:25:23] 2024-08-10 19:25:23,109...
```
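What I have in mind with "additional threads for save operations" is roughly the following (a hypothetical sketch; `save_record` and the call site are placeholders, not lmdeploy internals):

```python
from concurrent.futures import ThreadPoolExecutor

# Small worker pool so blocking I/O never stalls the forward loop.
_save_pool = ThreadPoolExecutor(max_workers=4)

def save_record(record):
    # Placeholder persistence; the real destination would differ.
    with open('records.log', 'a') as f:
        f.write(str(record) + '\n')

def on_step_finished(record):
    # Called from inside the forward loop; submit() returns immediately,
    # so the loop keeps generating while the worker does the write.
    _save_pool.submit(save_record, record)
```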

> EAGLE also has plans to support open source in the future.

Can you reveal the schedule? Or could you share the development branch so we can work on it together? Thanks!!

I also encountered the same problem when prefill n_token > 2048. After making the following modifications to src/turbomind/utils/allocator.h, it now works normally: ![image](https://github.com/InternLM/lmdeploy/assets/24875266/0cdfd9b9-7224-4d3a-a742-754f37e4a3fb)

> > I also encountered the same problem when prefill n_token > 2048. After making the following modifications to src/turbomind/utils/allocator.h, it now works normally:
> >
> > ![image](https://github.com/InternLM/lmdeploy/assets/24875266/0cdfd9b9-7224-4d3a-a742-754f37e4a3fb)
>
>...

> mla

I think MLA attention does not support tree masks, so this PR does not work with DeepSeek.
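To make the constraint concrete, here is a small illustrative sketch of what a tree mask encodes: each draft token may attend only to itself and its ancestors in the draft tree, which a purely causal (lower-triangular) attention kernel cannot express. The parent array is a made-up example:

```python
import torch

parents = [-1, 0, 0, 1, 2]  # token i's parent in the draft tree (-1 = root)
n = len(parents)
mask = torch.zeros(n, n, dtype=torch.bool)
for i in range(n):
    j = i
    while j != -1:          # walk from token i up to the root
        mask[i, j] = True
        j = parents[j]
print(mask.int())
# Rows 3 and 4 attend to different branches (0->1->3 vs 0->2->4),
# so no single causal mask over a flat sequence can represent this.
```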

> I find this PR cannot run llama 8b with the triton backend; the error is:
>
> ```
> File "/data/peng/sglang/python/sglang/srt/speculative/lookahead_utils.py", line 160, in verify
>     batch.seq_lens_sum = batch.seq_lens.sum().item()
> RuntimeError:...
> ```

> We used sglang in production and did not encounter these problems. A few tips for increasing stability:
>
> 1. Try to disable custom all reduce by `--disable-custom-all-reduce`....
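For reference, the first tip applied to a typical launch might look like the line below (the model path and tp value are placeholders; `--disable-custom-all-reduce` is the flag quoted above):

```
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct \
    --tp 2 --disable-custom-all-reduce
```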