[NPU] ERNIE 4.5 support
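Environment Setup
Basic Environment Setup
Starting the Docker Image
We recommend installing inside the Docker image, although you can also install directly on bare metal.
First, pull the image that matches your system architecture: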
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-npu-base-x86_64-gcc84 # x86 architecture
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-npu-base-aarch64-gcc84 # ARM architecture
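If you script the setup, the architecture choice above can be made automatically. A minimal sketch, assuming platform.machine() reports x86_64 or aarch64 as it typically does on Linux (amd64 and arm64 are accepted as aliases); image_for_arch is a hypothetical helper, and the registry and tags are copied from the two commands above:

```python
import platform
from typing import Optional

# Registry and tags copied from the docker pull commands above.
REGISTRY = "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu"
TAGS = {
    "x86_64": "cann80RC2-ubuntu20-npu-base-x86_64-gcc84",
    "aarch64": "cann80RC2-ubuntu20-npu-base-aarch64-gcc84",
}
# Hypothetical aliases for machine strings some platforms report instead.
ALIASES = {"amd64": "x86_64", "arm64": "aarch64"}


def image_for_arch(machine: Optional[str] = None) -> str:
    """Return the full image reference for a machine string (defaults to this host)."""
    arch = (machine or platform.machine()).lower()
    arch = ALIASES.get(arch, arch)
    if arch not in TAGS:
        raise ValueError(f"no prebuilt image for architecture: {arch}")
    return f"{REGISTRY}:{TAGS[arch]}"
```

For example, print("docker pull " + image_for_arch()) prints the pull command for the current host.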
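Start the container (replace ${NAME}, the /home/guozr mount and working directory, and the local image ID e6acd904bbcf with your own values):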
docker run -it --name ${NAME} -v /home/guozr:/home/guozr \
--privileged --shm-size=128G -w=/home/guozr \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/dcmi:/usr/local/dcmi \
--net host \
-e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \
e6acd904bbcf /bin/bash
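Once inside the container, you can quickly confirm that the driver mounts took effect and that ASCEND_RT_VISIBLE_DEVICES holds the device list you expect. A minimal sketch; the path list simply mirrors the -v options of the docker run command above, and missing_paths and visible_devices are hypothetical helpers:

```python
import os
from typing import Iterable, List

# Host paths bind-mounted by the docker run command above.
DRIVER_MOUNTS = [
    "/usr/local/Ascend/driver",
    "/usr/local/bin/npu-smi",
    "/usr/local/dcmi",
]


def missing_paths(paths: Iterable[str]) -> List[str]:
    """Return the paths that do not exist inside the container."""
    return [p for p in paths if not os.path.exists(p)]


def visible_devices(value: str) -> List[int]:
    """Parse an ASCEND_RT_VISIBLE_DEVICES value such as "0,1,2,3" into ints."""
    return [int(x) for x in value.split(",") if x.strip()]


if __name__ == "__main__":
    print("missing mounts:", missing_paths(DRIVER_MOUNTS))
    print("visible devices:", visible_devices(os.environ.get("ASCEND_RT_VISIBLE_DEVICES", "")))
```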
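Install a Newer CANN
The CANN suite in the image is outdated, so you need to reinstall the CANN Toolkit, CANN Kernels, and NNAL at version >= 8.1.RC1. The three packages must be matching versions; 8.2.RC1 is recommended. Download the packages for your CPU architecture, and note that the CANN Kernels package is hardware-specific, so pick the one for your NPU model. After downloading, install them in this order:
# toolkit.run, kernels.run and nnal.run stand for the downloaded .run package files
chmod +x toolkit.run kernels.run nnal.run
yes | ./toolkit.run --install
yes | ./kernels.run --install
yes | ./nnal.run --install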
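Because the three packages must match and be at least 8.1.RC1, a small comparison helper can catch a mismatch before you start debugging at runtime. A minimal sketch, assuming version strings of the form X.Y.RCn (as in 8.1.RC1 and 8.2.RC1) and that you supply the installed versions yourself; how you read them from the installation is left open, and parse_rc_version and check_cann_versions are hypothetical helpers:

```python
import re
from typing import Dict, List, Tuple

# Only the X.Y.RCn form used in this guide is recognized.
_RC_PATTERN = re.compile(r"^(\d+)\.(\d+)\.RC(\d+)$")


def parse_rc_version(version: str) -> Tuple[int, int, int]:
    """Parse 'X.Y.RCn' into (X, Y, n); other formats are rejected."""
    match = _RC_PATTERN.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized CANN version: {version!r}")
    major, minor, rc = (int(g) for g in match.groups())
    return major, minor, rc


def check_cann_versions(versions: Dict[str, str], minimum: str = "8.1.RC1") -> List[str]:
    """Return a list of problems; an empty list means the set looks consistent."""
    problems = []
    if len(set(versions.values())) > 1:
        problems.append(f"versions do not match: {versions}")
    floor = parse_rc_version(minimum)
    for name, version in versions.items():
        if parse_rc_version(version) < floor:
            problems.append(f"{name} {version} is older than {minimum}")
    return problems
```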
Configure Environment Variables
Set the following environment variables before running:
source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/atb/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh --cxx_abi=1
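By default, Paddle uses the naive_best_fit strategy for device memory allocation. You can instead set the allocation strategy to auto_growth, which allocates host/device memory on demand as the actual data requires, at the cost of possible memory fragmentation (see the Paddle documentation on FLAGS_allocator_strategy for details).
For reasons not yet understood, device memory currently runs out unless the allocation strategy is set to auto_growth, so please also set the following environment variable: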
export FLAGS_allocator_strategy=auto_growth
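Before launching, you can confirm from Python that the variables above are visible to the process. A minimal sketch; preflight is a hypothetical helper, and checking ASCEND_TOOLKIT_HOME assumes (not confirmed here) that the toolkit's set_env.sh exports it. Paddle is generally understood to read FLAGS_* variables when it is imported, so they should be set before import paddle:

```python
import os
from typing import List, Mapping


def preflight(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems found in an environment mapping."""
    problems = []
    if env.get("FLAGS_allocator_strategy") != "auto_growth":
        problems.append("FLAGS_allocator_strategy is not set to auto_growth")
    # Assumption: the toolkit's set_env.sh exports ASCEND_TOOLKIT_HOME.
    if not env.get("ASCEND_TOOLKIT_HOME"):
        problems.append("ASCEND_TOOLKIT_HOME is empty; was set_env.sh sourced?")
    return problems


if __name__ == "__main__":
    for problem in preflight(os.environ):
        print("WARNING:", problem)
```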
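Python Environment Setup
Install Paddle
Install with the commands below (newer versions of paddlepaddle conflict with paddleformers, so version 3.1 is recommended here):
# First install the PaddlePaddle CPU package
pip install paddlepaddle==3.1
# Then install the PaddlePaddle NPU plugin package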
pip install paddle-custom-npu -i https://www.paddlepaddle.org.cn/packages/stable/npu
For details, see the Ascend NPU installation guide.
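Since newer paddlepaddle releases conflict with paddleformers, it can help to verify which version actually ended up installed. A minimal sketch using the standard importlib.metadata module; installed_version and is_paddle_3_1 are hypothetical helpers, and "3.1.x" is taken to mean any version whose first two components are 3 and 1:

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_paddle_3_1(version: Optional[str]) -> bool:
    """True if the version string is a 3.1.x release (3.10 does not count)."""
    if version is None:
        return False
    return version.split(".")[:2] == ["3", "1"]


if __name__ == "__main__":
    v = installed_version("paddlepaddle")
    print("paddlepaddle:", v, "(ok)" if is_paddle_3_1(v) else "(expected 3.1.x)")
```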
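Install Third-Party Libraries
Before compiling PaddleCustomDevice, install the third-party libraries spdlog and json:
# Install spdlog
git clone https://github.com/gabime/spdlog.git
cd spdlog
mkdir build && cd build
cmake ..
make -j$(nproc)
make install
# Return to the starting directory before cloning the next library
cd ../..
# Install json (nlohmann/json)
git clone https://github.com/nlohmann/json.git
cd json
mkdir build && cd build
cmake ..
make -j$(nproc)
make install
cd ../..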
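To confirm that both libraries were installed where the PaddleCustomDevice build can find them, you can check for their main headers. A minimal sketch, assuming CMake's default install prefix /usr/local on Linux; missing_headers is a hypothetical helper, and the header paths are spdlog's spdlog/spdlog.h and nlohmann/json's nlohmann/json.hpp:

```python
import os
from typing import List

# Header paths relative to the install prefix (CMake defaults to /usr/local on Linux).
EXPECTED_HEADERS = ["include/spdlog/spdlog.h", "include/nlohmann/json.hpp"]


def missing_headers(prefix: str = "/usr/local") -> List[str]:
    """Return expected headers that are not present under the given prefix."""
    return [h for h in EXPECTED_HEADERS if not os.path.isfile(os.path.join(prefix, h))]


if __name__ == "__main__":
    print("missing headers:", missing_headers())
```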
Install PaddleCustomDevice
git clone https://github.com/PaddlePaddle/PaddleCustomDevice.git
cd PaddleCustomDevice/backends/npu
bash tools/compile.sh
After compilation finishes, install the wheel:
pip install build/dist/paddle_custom_npu-*.whl --force-reinstall
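If build/dist holds wheels from several builds, the glob in the command above can match more than one file. A minimal sketch that picks the most recently modified match instead; newest_wheel is a hypothetical helper, and "newest by modification time" is an assumption about which wheel you want:

```python
import glob
import os
from typing import Optional


def newest_wheel(dist_dir: str, pattern: str = "paddle_custom_npu-*.whl") -> Optional[str]:
    """Return the most recently modified wheel matching pattern, or None."""
    wheels = glob.glob(os.path.join(dist_dir, pattern))
    return max(wheels, key=os.path.getmtime) if wheels else None


if __name__ == "__main__":
    wheel = newest_wheel("build/dist")
    if wheel:
        print(f"pip install {wheel} --force-reinstall")
```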
Then manually install this PR:
git clone https://github.com/llliiilil/PaddleCustomDevicetmp.git miPaddleCustomDevice
cd miPaddleCustomDevice/backends/npu
bash tools/compile.sh
pip install build/dist/paddle_custom_npu-*.whl --force-reinstall
source tools/set_env.sh
cd opp/ascendc_custom_ops/build/
bash build_ops.sh
cd custom_project/build_out/
./custom_opp*.run
export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/aie_ascendc/op_api/lib/:${LD_LIBRARY_PATH}
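If you later get the error "please make sure you registered your op first and try again", go back after the manual installation and reinstall (overwriting the current one) the wheel built from the mainline PaddlePaddle/PaddleCustomDevice repository.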
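When LD_LIBRARY_PATH starts out empty, export LD_LIBRARY_PATH=dir:${LD_LIBRARY_PATH} leaves a trailing empty entry, which glibc's loader is commonly reported to treat as the current directory. If you assemble the environment from a Python launcher instead, a helper like the one below avoids empty entries and duplicates. A minimal sketch; prepend_search_path is a hypothetical helper, and the directory is the one exported above:

```python
import os
from typing import Optional

# Directory exported in the step above.
CUSTOM_OP_LIB = "/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/aie_ascendc/op_api/lib/"


def prepend_search_path(directory: str, current: Optional[str]) -> str:
    """Put directory first in a colon-separated path, dropping empties and duplicates."""
    entries = [e for e in (current or "").split(":") if e and e != directory]
    return ":".join([directory] + entries)


if __name__ == "__main__":
    print(prepend_search_path(CUSTOM_OP_LIB, os.environ.get("LD_LIBRARY_PATH")))
```

Note that the loader reads LD_LIBRARY_PATH at process start, so changing it in os.environ only affects processes launched afterwards.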
Install PaddleNLP
Clone from source:
git clone https://github.com/PaddlePaddle/PaddleNLP.git
Then, in the csrc/npu directory, build and install following its README.md:
python setup.py build bdist_wheel
pip install dist/paddlenlp_ops*.whl
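FastDeploy's NPU code imports sparse_moe from paddlenlp_ops (see the traceback in the comments below), so a quick check after installation is whether the module exposes the ops you need. A minimal sketch; missing_ops is a hypothetical helper that treats a failed import as "everything missing":

```python
import importlib
from typing import Iterable, List


def missing_ops(module_name: str, names: Iterable[str]) -> List[str]:
    """Return the names the module does not export (all of them if import fails)."""
    names = list(names)
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return names
    return [n for n in names if not hasattr(module, n)]


if __name__ == "__main__":
    print("missing from paddlenlp_ops:", missing_ops("paddlenlp_ops", ["sparse_moe"]))
```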
Build FastDeploy
bash build.sh
You may get the following error at runtime:
ModuleNotFoundError: No module named 'distutils.dir_util'
To fix it, edit line 22 of /usr/local/lib/python3.10/dist-packages/paddleformers/utils/pdc_sdk.py and replace from distutils.dir_util import copy_tree with:
from shutil import copytree as copy_tree
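shutil.copytree is not an exact replacement: by default it raises FileExistsError when the destination directory already exists, whereas distutils' copy_tree copies into an existing directory. If pdc_sdk.py may copy into an existing directory, a fallback along these lines may be safer. A minimal sketch; it needs Python 3.8+ for dirs_exist_ok, and unlike distutils it returns None rather than the list of copied files:

```python
try:
    # Present on older Pythons (or via setuptools); removed from the stdlib in Python 3.12.
    from distutils.dir_util import copy_tree
except ImportError:
    import shutil

    def copy_tree(src: str, dst: str) -> None:
        """Fallback: merge src into dst like distutils' copy_tree (return value not reproduced)."""
        shutil.copytree(src, dst, dirs_exist_ok=True)
```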
Before running, add your FastDeploy checkout (here /work/FastDeploy) to PYTHONPATH and extend LD_LIBRARY_PATH:
export PYTHONPATH="/work/FastDeploy":${PYTHONPATH}
export LD_LIBRARY_PATH=/usr/local/Ascend/npt/lib:$LD_LIBRARY_PATH
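PYTHONPATH entries come before site-packages on sys.path, so the checkout should take precedence over any pip-installed copy; you can confirm which fastdeploy will actually be imported without importing it. A minimal sketch using importlib.util.find_spec; module_origin is a hypothetical helper:

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Expect a path under your FastDeploy checkout, e.g. /work/FastDeploy/fastdeploy/__init__.py
    print(module_origin("fastdeploy"))
```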
If you hit the error "libgomp cannot allocate memory in static TLS block", preload the libgomp bundled with scikit-learn:
export LD_PRELOAD=$LD_PRELOAD:/usr/local/lib/python3.10/dist-packages/scikit_learn.libs/libgomp-{hash suffix, depends on your installation}.so.1.0.0
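Rather than typing the suffix by hand, you can locate the bundled library with a glob. A minimal sketch; the directory is the Python 3.10 dist-packages path from the example above (adjust it for your interpreter), and find_libgomp is a hypothetical helper:

```python
import glob
import os
from typing import List

# Directory from the LD_PRELOAD example above (Python 3.10 dist-packages).
SKLEARN_LIBS = "/usr/local/lib/python3.10/dist-packages/scikit_learn.libs"


def find_libgomp(libs_dir: str = SKLEARN_LIBS) -> List[str]:
    """Return the bundled libgomp-<suffix>.so.1.0.0 files in libs_dir, sorted."""
    return sorted(glob.glob(os.path.join(libs_dir, "libgomp-*.so.1.0.0")))


if __name__ == "__main__":
    matches = find_libgomp()
    if len(matches) == 1:
        print(f"export LD_PRELOAD=$LD_PRELOAD:{matches[0]}")
    else:
        print("expected exactly one libgomp, found:", matches)
```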
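If you run into circular import problems and are not running multimodal models, you can temporarily uninstall opencv. Also note that numpy 2.0 is not yet well supported, so as the final step, force-install numpy 1.26.4: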
pip uninstall opencv-python
pip install numpy==1.26.4
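Because a later install can silently pull numpy 2.x back in, it can be worth re-checking the version right before launching. A minimal sketch using importlib.metadata; numpy_is_1x is a hypothetical helper that only inspects the major version:

```python
from importlib import metadata


def numpy_is_1x(version: str) -> bool:
    """True for numpy 1.x versions such as '1.26.4'."""
    return version.split(".")[0] == "1"


if __name__ == "__main__":
    try:
        v = metadata.version("numpy")
        print("numpy", v, "ok" if numpy_is_1x(v) else "-> reinstall numpy==1.26.4")
    except metadata.PackageNotFoundError:
        print("numpy is not installed")
```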
If you see:
File "/home/guozr/CODE/FastDeploy/fastdeploy/utils.py", line 443, in get_host_ip
ip = socket.gethostbyname(socket.gethostname())
socket.gaierror: [Errno -2] Name or service not known
First look up the hostname:
hostname
Then add that hostname to /etc/hosts, for example:
127.0.0.1 hostname-mbqbc.foreman.pxe localhost
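The failing call in the traceback is socket.gethostbyname(socket.gethostname()), so you can reproduce the check before starting FastDeploy and print the hosts entry to add. A minimal sketch; resolve_own_hostname and hosts_line are hypothetical helpers that mirror, but do not import, FastDeploy's get_host_ip:

```python
import socket
from typing import Optional


def resolve_own_hostname() -> Optional[str]:
    """Repeat the lookup from the traceback; return the IP, or None if it fails."""
    try:
        return socket.gethostbyname(socket.gethostname())
    except OSError:  # socket.gaierror (seen in the traceback) is a subclass of OSError
        return None


def hosts_line(hostname: str, ip: str = "127.0.0.1") -> str:
    """Build an /etc/hosts entry mapping hostname (plus localhost) to ip."""
    return f"{ip} {hostname} localhost"


if __name__ == "__main__":
    if resolve_own_hostname() is None:
        print("Add this line to /etc/hosts:", hosts_line(socket.gethostname()))
```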
Thanks for your contribution!
Hi, could you help with this? Launching the ernie4.5-21b-a3b model fails with:
FastDeploy/fastdeploy/model_executor/ops/npu/sparse_moe.py", line 6, in <module>
from paddlenlp_ops import sparse_moe
ImportError: cannot import name 'sparse_moe' from 'paddlenlp_ops' (/usr/local/lib/python3.10/dist-packages/paddlenlp_ops/__init__.py)
nd-toolkit/latest/opp/vendors/aie_ascendc/op_api/lib/libcust_opapi.so: undefined symbol: aclnnSub.
dlsym aclnnSubGetWorkspaceSize from libcust_opapi.so failed, error:/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/aie_ascendc/op_api/lib/libcust_opapi.so: undefined symbol: aclnnSubGetWorkspaceSize.
dlsym aclnnSub from libcust_opapi.so failed, error:/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/aie_ascendc/op_api/lib/libcust_opapi.so: undefined symbol: aclnnSub.
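The dlsym messages mean the loader could not find aclnnSub and aclnnSubGetWorkspaceSize in libcust_opapi.so. One way to check which symbols the installed library actually exports is to load it with ctypes. A minimal sketch; missing_symbols is a hypothetical helper, and loading can itself fail (returning None here) if the library's own dependencies are not on LD_LIBRARY_PATH:

```python
import ctypes
from typing import Iterable, List, Optional

# Library path taken from the error messages above.
LIB = "/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/aie_ascendc/op_api/lib/libcust_opapi.so"


def missing_symbols(library_path: str, symbols: Iterable[str]) -> Optional[List[str]]:
    """Return the symbols the library does not export, or None if it cannot be loaded."""
    try:
        lib = ctypes.CDLL(library_path)
    except OSError:
        return None
    # ctypes raises AttributeError for unknown symbols, so hasattr is False for them.
    return [s for s in symbols if not hasattr(lib, s)]


if __name__ == "__main__":
    result = missing_symbols(LIB, ["aclnnSub", "aclnnSubGetWorkspaceSize"])
    print("could not load library" if result is None else f"missing: {result}")
```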
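Have you installed all the PaddleCustomDevice packages?
Yes, I built them following the steps, but it still reports that sparse_moe is missing. Is there a specific PaddleNLP branch I should use?
The NPU630 version adaptation works but has not been merged yet. Follow-up pending.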