yaofengchen
> > We just got single-card inference working on the 310P; we will look into the multi-card issue.
>
> Is there a working image for this? I keep getting errors with the officially provided `crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:latest`:
>
> ```
> 2025-04-10 11:40:39,181 - lmdeploy - INFO - async_engine.py:259 - input backend=pytorch, backend_config=PytorchEngineConfig(dtype='auto', tp=1, dp=1, dp_rank=0, session_len=None, max_batch_size=256, cache_max_entry_count=0.8,...
> ```
Graph mode for the 310P has now been merged into the main branch, and multi-card is supported.
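As a hedged sketch of what multi-card serving on the main branch could look like (the model path is a placeholder; the flags follow lmdeploy's documented PyTorch-engine CLI, but check your installed version):

```shell
# Serve a model across two Ascend cards with the PyTorch engine.
# --device ascend selects the dlinfer Ascend backend;
# --tp 2 enables tensor parallelism over two cards.
lmdeploy serve api_server /path/to/model \
    --backend pytorch \
    --device ascend \
    --tp 2
```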
We have prepared a plan for the 300I Duo; could you help us test it? @zer0py2c @wangyuanxiong-hub @qiling1345
> We have prepared a plan for the 300I Duo; could you help us test it? @zer0py2c @wangyuanxiong-hub @qiling1345

You can try running it on the 300I Duo with https://github.com/yao-fengchen/dlinfer/tree/fix_attn and https://github.com/DeepLink-org/lmdeploy/tree/fix_attn...
I fixed this problem in the latest commit on https://github.com/yao-fengchen/dlinfer/tree/fix_attn. However, there is a limitation on the 310P device: it does not support GQA. So, on the 310P device, only MHA models,...
If there are no issues with PR https://github.com/DeepLink-org/dlinfer/pull/80 when testing the MHA models on the 310P device, we will merge this feature into https://github.com/DeepLink-org/dlinfer.
In theory, MHA models support tensor parallelism.
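Whether a model is MHA or GQA can be read from its config: GQA models have fewer key/value heads than query heads, and tensor parallelism additionally requires the head count to divide evenly across cards. A minimal sketch (field names follow the common Hugging Face config convention; all values below are hypothetical examples, not measurements on the 310P):

```python
def uses_gqa(num_attention_heads: int, num_key_value_heads: int) -> bool:
    """GQA models have fewer key/value heads than query heads;
    equal counts mean standard multi-head attention (MHA)."""
    return num_key_value_heads < num_attention_heads

def tp_compatible(num_attention_heads: int, tp: int) -> bool:
    """Tensor parallelism splits attention heads across cards, so the
    head count must be divisible by the TP degree."""
    return num_attention_heads % tp == 0

# Hypothetical configs for illustration:
print(uses_gqa(32, 32))      # MHA-style config -> False
print(uses_gqa(32, 8))       # GQA-style config -> True
print(tp_compatible(32, 2))  # 32 heads over 2 cards -> True
```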
We support glm4v-9b as of lmdeploy v0.6.3 and dlinfer v0.1.2 (https://github.com/DeepLink-org/dlinfer); please try again.
There are no available devices mounted in your container; you can verify this with the `mx-smi` command inside the container. This is the command we use to create a container for...
> > There are no available devices mounted in your container; you can verify this with the `mx-smi` command inside the container. This is the command we use to create...