yezekun
### Motivation

On an Ascend 310P, I set up the following environment: torch=2.3.1+cpu, torch-npu=2.3.1.post4.

The LMDeploy and dlinfer code come from [DeepLink-org:support_310P](https://github.com/DeepLink-org/lmdeploy/tree/support_310P) and [yao-fengchen:support_310P](https://github.com/yao-fengchen/dlinfer/tree/support_310P), respectively.

When running a multi-card inference task on the Ascend 310P with tp=2:

```python
from lmdeploy import pipeline
from lmdeploy import PytorchEngineConfig, GenerationConfig

if __name__ == "__main__":
    pipe = pipeline("/mnt/data/llm/Qwen1.5-7B-Chat/",
                    backend_config=PytorchEngineConfig(
                        tp=2,
                        device_type="ascend",
                        dtype='float16',
                        eager_mode=True,
                        cache_max_entry_count=0.5))...