Orca PyTorch on YARN: libpython3.7m.so from LD_PRELOAD cannot be preloaded

Open jing-xu opened this issue 5 years ago • 3 comments

When I ran the cifar10 example (python cifar10/cifar10/cifar10.py --cluster_mode yarn) on Almaren-Node-002, I hit this error after stage 8.

Stack trace: ExitCodeException exitCode=134: ERROR: ld.so: object 'python_env/lib/libpython3.7m.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.

The LD_PRELOAD was added here: https://github.com/intel-analytics/analytics-zoo/pull/2828/files
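One way to narrow this down is to check, from inside the failing container's working directory, whether each path-style LD_PRELOAD entry actually resolves to a file. The following is only a minimal sketch: the helper name missing_preloads is hypothetical, and it assumes the usual ld.so behaviour that entries are separated by spaces or colons and that a relative entry containing a slash is resolved against the process's current working directory. Entries without a slash are looked up on the library search path by ld.so, which this sketch does not model, so they are skipped.

```python
import os


def missing_preloads(ld_preload, cwd="."):
    """Return path-style LD_PRELOAD entries that do not resolve to a file.

    Hypothetical helper for illustration; not part of BigDL or Analytics Zoo.
    """
    # ld.so accepts entries separated by spaces or colons.
    entries = [e for e in ld_preload.replace(":", " ").split() if e]
    missing = []
    for entry in entries:
        if "/" not in entry:
            # Bare library names are searched on the library path; skip them.
            continue
        # Relative entries (e.g. python_env/lib/...) are resolved against cwd.
        path = entry if os.path.isabs(entry) else os.path.join(cwd, entry)
        if not os.path.isfile(path):
            missing.append(entry)
    return missing


if __name__ == "__main__":
    # Report unresolved entries for the current process environment.
    print(missing_preloads(os.environ.get("LD_PRELOAD", ""), os.getcwd()))
```

If python_env/lib/libpython3.7m.so is reported as missing, the archive may not have been unpacked under the name python_env in that container, or the library may sit at a different path inside the environment.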

jing-xu, Oct 29 '20 10:10

The Almaren cluster runs Ubuntu 14.04, which is not supported.

qiuxin2012, Nov 02 '20 02:11

I hit the same issue when setting cluster_mode="yarn-cluster". My system is CentOS 7.9, and the code works with cluster_mode="yarn-client". Do you have any idea how to solve this?

bestfleer, Jul 13 '22 08:07

I just tried this on our CDH cluster with BigDL 2.0 on CentOS 7.6, and cluster_mode="yarn-cluster" works fine. You can follow this guide to set up your environment: https://bigdl.readthedocs.io/en/latest/doc/Orca/QuickStart/orca-pytorch-quickstart.html#step-0-prepare-environment. You could also try upgrading jep from 3.9.0 to 3.9.1.
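To see whether an environment still has the older jep, a quick check along these lines could be run with the same Python environment that is shipped to YARN. This is a rough sketch, not an official BigDL tool: the function names are hypothetical, the 3.9.1 threshold is simply the version mentioned above, and the parser assumes plain numeric versions such as 3.9.0.

```python
def parse_version(text):
    """Turn a plain numeric version such as '3.9.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split(".")[:3])


def jep_needs_upgrade(installed, suggested="3.9.1"):
    """Return True if the installed version is older than the suggested one."""
    return parse_version(installed) < parse_version(suggested)


def installed_jep_version():
    """Return the installed jep version string, or None if it is not found."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Python 3.7 has no importlib.metadata; fall back to setuptools.
        try:
            import pkg_resources
            return pkg_resources.get_distribution("jep").version
        except Exception:
            return None
    try:
        return version("jep")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_jep_version()
    if found is None:
        print("jep is not installed in this environment")
    else:
        print("jep", found, "upgrade suggested:", jep_needs_upgrade(found))
```

Whether the upgrade resolves the LD_PRELOAD error is not confirmed in this thread; the check only tells you which jep version the environment currently has.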

qiuxin2012, Sep 06 '22 07:09