I ran into the same problem with lmdeploy v0.5.0 running InternVL-v15-chat. My suspicion: the asynchronous ImageEncoder forward running in a worker thread and the LLM forward on the main thread contend on CUDA kernel launches, causing the launch to hang.
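To make the suspected pattern concrete, here is a minimal sketch (not lmdeploy's actual code): a vision-encoder forward issued from a worker thread while the LLM forward runs on the main thread, both enqueuing kernels on the default CUDA stream.

```
# Minimal sketch of the suspected pattern, NOT lmdeploy's actual code:
# the image encoder forward runs in a worker thread while the LLM forward
# runs on the main thread; both enqueue kernels on the default CUDA stream,
# so a blocking launch on one side can stall the other.
import threading

import torch

encoder = torch.nn.Linear(1024, 1024).cuda()  # stand-in for the ViT encoder
llm = torch.nn.Linear(1024, 1024).cuda()      # stand-in for the LLM

def vision_worker(images, out):
    with torch.inference_mode():
        out.append(encoder(images))  # kernels launched from this thread

images = torch.randn(4, 1024, device='cuda')
out = []
t = threading.Thread(target=vision_worker, args=(images, out))
t.start()
with torch.inference_mode():
    logits = llm(torch.randn(4, 1024, device='cuda'))  # main-thread kernels
t.join()
torch.cuda.synchronize()
```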
I tried v0.5.3 with TP=2 inference on InternVL-v15-chat: the features returned here are NaN, the final output logits are all 0, and the context produces no output. With TP=1 there is no such problem and the output is correct.
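A quick way to confirm the symptom, assuming the encoder returns a list of torch tensors (the function name here is illustrative):

```
# Illustrative NaN/zero check for the returned vision features; `features`
# is assumed to be a list of torch tensors as returned by the encoder.
import torch

def inspect_features(features):
    for i, f in enumerate(features):
        print(f"feature[{i}] shape={tuple(f.shape)} "
              f"has_nan={torch.isnan(f).any().item()} "
              f"all_zero={bool((f == 0).all().item())}")
```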
> @coolhok
>
> Using the pipeline interface, after the pipeline is created (with tp > 1), does directly calling the following statements cause any problem?
>
> ```
> from lmdeploy.vl import load_image
> im = load_image('image path')
> pipe.vl_encoder.forward([im])
> ```

# code

```
from lmdeploy...
```
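The reply's snippet above is truncated; a minimal repro sketch under the same assumptions (the model path and tp value are placeholders, not taken from the original):

```
# Hypothetical repro sketch; model path and tp value are placeholders.
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5',
                backend_config=TurbomindEngineConfig(tp=2))
im = load_image('image path')
features = pipe.vl_encoder.forward([im])  # reported to hang / return NaN with tp > 1
```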
@irexyc Using lmdeploy v0.5.3 with TP=1, running the generated data can also end in an HTTP 499 hang, with the error stack pointing at `_forward_loop` at the same time. Should more threads be added for the save operations?

```
[2024-08-10 19:25:23] 2024-08-10 19:25:23,109...
```
> There are also plans to open-source EAGLE support in the future. Can you reveal the schedule, or share the development branch so we can work on it together? Thanks!!
I also encountered the same problem when prefill n_token > 2048. After making the following modification to src/turbomind/utils/allocator.h, it now works normally:
> > I also encountered the same problem when prefill n_token > 2048. After making the following modifications, I can now work normally
> >
> > src/turbomind/utils/allocator.h
> >
> > ...
> mla

I think MLA attention does not support the tree mask, so this PR does not work with DeepSeek.
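For context, a tree mask in speculative decoding lets each draft token attend only to its ancestors in the draft tree rather than a plain causal prefix; a minimal sketch of constructing one (illustrative, not sglang's implementation):

```
# Illustrative tree-attention mask for speculative decoding, not sglang's code.
# parent[i] is the index of draft token i's parent in the tree, -1 for the root.
import torch

def build_tree_mask(parent):
    n = len(parent)
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        j = i
        while j != -1:          # walk up to the root
            mask[i, j] = True   # token i may attend to ancestor j (and itself)
            j = parent[j]
    return mask

# Root token 0 with two branches: 0 -> 1 -> 2 and 0 -> 3.
print(build_tree_mask([-1, 0, 1, 0]).int())
```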
> I find this PR cannot run llama 8b with the triton backend, the error is:
>
> ```
> File "/data/peng/sglang/python/sglang/srt/speculative/lookahead_utils.py", line 160, in verify
>     batch.seq_lens_sum = batch.seq_lens.sum().item()
> RuntimeError:...
> ```
> We used sglang in production and did not meet these problems. A few tips for increasing the stability:
>
> 1. Try to disable custom all reduce by `--disable-custom-all-reduce`.
> ...
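For reference, a hypothetical server launch showing where that flag goes (the model path and tp value are placeholders, not from the original comment):

```
# Hypothetical launch; model path and tp value are placeholders.
python -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --tp 2 \
    --disable-custom-all-reduce
```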