[NPU] ERNIE 4.5 support

Open starmountain1997 opened this issue 5 months ago • 6 comments

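Environment setup

Base environment setup

Starting the image

Installing from the Docker image is recommended, though you can also install on bare metal.

First, pull the image that matches your system architecture: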

docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-npu-base-x86_64-gcc84 # x86_64 architecture

docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-npu-base-aarch64-gcc84 # ARM (aarch64) architecture
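
The architecture choice above can also be made programmatically. The sketch below maps the value reported by Python's platform.machine() to one of the two image references listed above. The helper name image_for and the alias table are illustrative assumptions, not part of FastDeploy.

```python
import platform

# Registry path and tag pattern copied from the two pull commands above.
REGISTRY = "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu"

# Assumed aliases: operating systems report the same CPU family under different names.
ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "aarch64",
    "arm64": "aarch64",
}


def image_for(machine: str) -> str:
    """Return the image reference matching a platform.machine() value."""
    arch = ARCH_ALIASES.get(machine.lower())
    if arch is None:
        raise ValueError(f"no image listed for architecture {machine!r}")
    return f"{REGISTRY}:cann80RC2-ubuntu20-npu-base-{arch}-gcc84"


if __name__ == "__main__":
    # Print the pull command for the machine this script runs on.
    print("docker pull", image_for(platform.machine()))
```

Run it on the host that will run the container, since the image must match the host CPU.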

Start the container:

docker run -it --name ${NAME} -v /home/guozr:/home/guozr \
    --privileged --shm-size=128G -w=/home/guozr \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/dcmi:/usr/local/dcmi \
    --net host \
    -e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \
    e6acd904bbcf /bin/bash

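Installing a newer CANN

The CANN suite bundled in the image is old, so reinstall CANN Toolkit, CANN Kernels, and NNAL at version >= 8.1.RC1. The three packages must be version-matched; 8.2.RC1 is recommended. Choose the correct CPU architecture, and note that CANN Kernels are hardware-specific, so pick the package for your chip. After downloading, install them in the following order: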

yes | toolkit.run --install
yes | kernels.run --install
yes | nnal.run --install
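
The version constraints above (all three packages matched, and at least 8.1.RC1) can be checked before installing. The sketch below is a hedged illustration: it assumes version strings of the form 8.1.RC1 or 8.1.0, assumes "matched" means identical version strings, and assumes a release candidate sorts before the final release of the same major.minor line. The function names are hypothetical.

```python
import re

MINIMUM = "8.1.RC1"

# Accepts "MAJOR.MINOR.RCn" or "MAJOR.MINOR.PATCH"; other suffixes are rejected.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(?:RC(\d+)|(\d+))$")


def version_key(version: str) -> tuple:
    """Sort key for strings such as '8.1.RC1' or '8.1.0'.

    Assumption: RCn sorts before the final release of the same major.minor.
    """
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, rc, patch = match.groups()
    if rc is not None:
        return (int(major), int(minor), 0, int(rc))
    return (int(major), int(minor), 1, int(patch))


def check_cann(toolkit: str, kernels: str, nnal: str) -> list:
    """Return a list of problems; an empty list means the versions look fine."""
    problems = []
    versions = {"toolkit": toolkit, "kernels": kernels, "nnal": nnal}
    if len(set(versions.values())) != 1:
        problems.append(f"versions do not match: {versions}")
    for name, version in versions.items():
        if version_key(version) < version_key(MINIMUM):
            problems.append(f"{name} {version} is older than {MINIMUM}")
    return problems
```

For example, check_cann("8.2.RC1", "8.2.RC1", "8.2.RC1") returns an empty list, while mixing 8.1.RC1 and 8.2.RC1 reports a mismatch.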

Configuring environment variables

Set the following environment variables before running:

source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/atb/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh --cxx_abi=1

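The default device memory allocation strategy is naive_best_fit. You can optionally set Paddle's allocation strategy to auto_growth, which claims host/device memory only as the data actually needs it, at the cost of possible memory fragmentation.

Currently, for reasons not yet understood, device memory runs out unless the allocation strategy is set to auto_growth, so also set the following environment variable: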

export FLAGS_allocator_strategy=auto_growth
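
When the service is launched from a Python script rather than a shell, the same setting can be applied in-process. This is a minimal sketch, assuming Paddle picks up FLAGS_* environment variables when it is first imported, so the call has to happen before any import of paddle. The helper name ensure_auto_growth is hypothetical.

```python
import os

FLAG = "FLAGS_allocator_strategy"


def ensure_auto_growth(env=None):
    """Force the allocator strategy to auto_growth.

    Returns the previous value (or None if it was unset) so the caller can
    log that an existing setting was overridden.
    """
    env = os.environ if env is None else env
    previous = env.get(FLAG)
    env[FLAG] = "auto_growth"
    return previous


if __name__ == "__main__":
    old = ensure_auto_growth()
    print(f"{FLAG}: {old!r} -> {os.environ[FLAG]!r}")
    # Import paddle only after this point (assumption stated above).
```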

Python environment setup

Installing Paddle

Install with the following commands (newer versions of paddlepaddle conflict with paddleformers, so version 3.1 is recommended here):

# First install the PaddlePaddle CPU package
pip install paddlepaddle==3.1
# Then install the PaddlePaddle NPU plugin package
pip install paddle-custom-npu -i https://www.paddlepaddle.org.cn/packages/stable/npu

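For details, see the Ascend NPU installation guide.

Installing third-party libraries

Before compiling PaddleCustomDevice, install the third-party libraries spdlog and json.

# Install spdlog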
git clone https://github.com/gabime/spdlog.git
cd spdlog
mkdir build && cd build
cmake ..
make -j$(nproc)
make install

# Install json
git clone https://github.com/nlohmann/json.git
cd json
mkdir build && cd build
cmake ..
make -j$(nproc)
make install

Installing PaddleCustomDevice

git clone https://github.com/PaddlePaddle/PaddleCustomDevice.git
cd PaddleCustomDevice/backends/npu
bash tools/compile.sh

After compilation finishes, install with the following command:

pip install build/dist/paddle_custom_npu-*.whl --force-reinstall

Manually installing this PR

git clone https://github.com/llliiilil/PaddleCustomDevicetmp.git miPaddleCustomDevice
cd miPaddleCustomDevice/backends/npu
bash tools/compile.sh
pip install build/dist/paddle_custom_npu-*.whl --force-reinstall


source tools/set_env.sh
cd opp/ascendc_custom_ops/build/
bash build_ops.sh
cd custom_project/build_out/
./custom_opp*.run
export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/aie_ascendc/op_api/lib/:${LD_LIBRARY_PATH}

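If you later see the error "please make sure you registered your op first and try again", go back after the manual installation and reinstall, over the top, the whl built from the mainline PaddlePaddle/PaddleCustomDevice repository.

Installing PaddleNLP

Clone the source: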

git clone https://github.com/PaddlePaddle/PaddleNLP.git

In the csrc/npu directory, install following README.md:

python setup.py build bdist_wheel
pip install dist/paddlenlp_ops*.whl

Building FastDeploy

bash build.sh

At runtime you may see this error:

ModuleNotFoundError: No module named 'distutils.dir_util'

To fix it, change from distutils.dir_util import copy_tree on line 22 of /usr/local/lib/python3.10/dist-packages/paddleformers/utils/pdc_sdk.py to:

from shutil import copytree as copy_tree
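
Note that shutil.copytree is not a drop-in match for distutils' copy_tree: by default it raises FileExistsError when the destination directory already exists, whereas copy_tree copies into an existing directory. The sketch below is one possible, more defensive variant of the patch, assuming Python 3.8 or later (where dirs_exist_ok is available). It is not what paddleformers ships.

```python
import functools
import shutil

try:
    # Use the original helper when distutils is still importable.
    from distutils.dir_util import copy_tree
except ImportError:
    # Fallback: allow copying into an existing destination, as copy_tree does.
    copy_tree = functools.partial(shutil.copytree, dirs_exist_ok=True)
```

The return values still differ (copy_tree returns the list of copied files, copytree returns the destination path), so this only helps callers that ignore the result.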

Before running, add the corresponding FastDeploy directory to PYTHONPATH:

export PYTHONPATH="/work/FastDeploy":${PYTHONPATH}
export LD_LIBRARY_PATH=/usr/local/Ascend/npt/lib:$LD_LIBRARY_PATH

If you hit the error libgomp cannot allocate memory in static TLS block, you can resolve it as follows:

export LD_PRELOAD=$LD_PRELOAD:/usr/local/lib/python3.10/dist-packages/scikit_learn.libs/libgomp-{a string of digits; varies by installation}.so.1.0.0
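
Because the suffix in the libgomp file name differs between installations, it may be easier to look the file up than to type it. The sketch below globs the scikit_learn.libs directory named above and prints a ready-made export line. The helper name and the glob pattern are assumptions for illustration.

```python
import glob
import os


def find_bundled_libgomp(site_packages: str) -> list:
    """Return the libgomp shared objects bundled with scikit-learn, sorted."""
    pattern = os.path.join(site_packages, "scikit_learn.libs", "libgomp-*.so*")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Path taken from the export command above; adjust for your Python version.
    for path in find_bundled_libgomp("/usr/local/lib/python3.10/dist-packages"):
        print(f"export LD_PRELOAD=$LD_PRELOAD:{path}")
```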

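If you run into circular-import problems and are not running multimodal models, you can temporarily uninstall opencv. Also note that numpy 2.0 is currently poorly supported, so as a final step force-install numpy 1.26.4: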

pip uninstall opencv-python
pip install numpy==1.26.4
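
Other packages installed later can pull numpy back up to 2.x, so it may be worth re-checking the installed version at the very end. The sketch below reads the installed version through importlib.metadata without importing numpy itself. The helper names are hypothetical.

```python
from importlib import metadata


def major_of(version: str) -> int:
    """Return the leading integer of a dotted version string."""
    return int(version.split(".", 1)[0])


def needs_numpy_downgrade(package: str = "numpy"):
    """True if the installed major version is 2 or higher, False if it is
    lower, None if the package is not installed at all."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return major_of(installed) >= 2


if __name__ == "__main__":
    print("numpy needs downgrade:", needs_numpy_downgrade())
```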

If you encounter:

  File "/home/guozr/CODE/FastDeploy/fastdeploy/utils.py", line 443, in get_host_ip
    ip = socket.gethostbyname(socket.gethostname())
socket.gaierror: [Errno -2] Name or service not known

first look up the hostname:

hostname

Then add the hostname you found to /etc/hosts:

127.0.0.1   hostname-mbqbc.foreman.pxe localhost
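
To confirm the fix without launching FastDeploy, the failing lookup from the traceback can be reproduced on its own. The sketch below performs the same socket.gethostbyname(socket.gethostname()) call and, on failure, prints a line in the format shown above. The helper names resolve and hosts_line are illustrative.

```python
import socket


def resolve(name=None):
    """Resolve a hostname to an IPv4 address, or return None on failure.

    With no argument this repeats the lookup from the traceback above.
    """
    name = socket.gethostname() if name is None else name
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def hosts_line(hostname: str) -> str:
    """Build an /etc/hosts entry in the same layout as the example above."""
    return f"127.0.0.1   {hostname} localhost"


if __name__ == "__main__":
    if resolve() is None:
        print("add to /etc/hosts:", hosts_line(socket.gethostname()))
    else:
        print("hostname resolves to", resolve())
```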

starmountain1997, Aug 14 '25 07:08

Hello, could I ask for some help? Starting the ernie4.5-21b-a3b model fails with this error:

FastDeploy/fastdeploy/model_executor/ops/npu/sparse_moe.py", line 6, in <module>
    from paddlenlp_ops import sparse_moe
ImportError: cannot import name 'sparse_moe' from 'paddlenlp_ops' (/usr/local/lib/python3.10/dist-packages/paddlenlp_ops/__init__.py)

nd-toolkit/latest/opp/vendors/aie_ascendc/op_api/lib/libcust_opapi.so: undefined symbol: aclnnSub.
dlsym aclnnSubGetWorkspaceSize from libcust_opapi.so failed, error:/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/aie_ascendc/op_api/lib/libcust_opapi.so: undefined symbol: aclnnSubGetWorkspaceSize.
dlsym aclnnSub from libcust_opapi.so failed, error:/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/aie_ascendc/op_api/lib/libcust_opapi.so: undefined symbol: aclnnSub.

Cndbk, Nov 12 '25 09:11
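
One way to narrow down an import error like the one above is to check whether the installed module exposes the name at all, which separates "module not installed" from "installed build lacks the op". The sketch below is a generic helper, not a FastDeploy or PaddleNLP API. It only catches ImportError, so a failure while loading a shared library may surface as a different exception.

```python
import importlib


def has_attr(module_name: str, attr: str):
    """Return True/False for whether the module exposes attr,
    or None if the module itself cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


if __name__ == "__main__":
    # Names taken from the traceback above.
    print("paddlenlp_ops.sparse_moe:", has_attr("paddlenlp_ops", "sparse_moe"))
```

A result of False would suggest that the installed paddlenlp_ops wheel was built from a source tree that does not include the op.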

Did you install everything for PaddleCustomDevice?

starmountain1997, Nov 28 '25 08:11

Yes, I built everything following the steps, but it still reports that sparse_moe is missing. Is there a specific PaddleNLP branch that should be used?

Cndbk, Nov 28 '25 08:11

The adaptation for the NPU630 version works but has not been merged yet. To be followed up.

TBD1, Dec 01 '25 03:12