Orca PyTorch on YARN: libpython3.7m.so from LD_PRELOAD cannot be preloaded
When I ran cifar10 with python cifar10/cifar10/cifar10.py --cluster_mode yarn on Almaren-Node-002, I got this error after stage 8.
Stack trace: ExitCodeException exitCode=134: ERROR: ld.so: object 'python_env/lib/libpython3.7m.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
The LD_PRELOAD was added here: https://github.com/intel-analytics/analytics-zoo/pull/2828/files
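The path in the error is relative (python_env/lib/libpython3.7m.so), so the loader looks for it under the working directory of the process that is starting. A minimal sketch for checking that on a node, assuming it is run from the same working directory as the failing process; the helper name missing_preloads is made up for illustration and is not part of analytics-zoo:

```python
import os


def missing_preloads(ld_preload, cwd):
    """Return LD_PRELOAD entries that look like paths but do not exist.

    Hypothetical helper for debugging; not an analytics-zoo API.
    """
    missing = []
    # LD_PRELOAD entries may be separated by spaces or colons.
    for entry in ld_preload.replace(":", " ").split():
        # Entries without a slash are looked up on the library search
        # path, which this sketch does not try to emulate, so skip them.
        if "/" not in entry:
            continue
        # Relative paths are resolved against the working directory.
        path = entry if os.path.isabs(entry) else os.path.join(cwd, entry)
        if not os.path.isfile(path):
            missing.append(entry)
    return missing


if __name__ == "__main__":
    print(missing_preloads(os.environ.get("LD_PRELOAD", ""), os.getcwd()))
```

An empty list means every path-like entry resolves to a file. A non-empty list names the entries the loader would fail to open, which matches the "cannot open shared object file" message above.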
The Almaren cluster runs Ubuntu 14.04, which is not supported.
I hit the same issue when setting cluster_mode="yarn-cluster". My system is CentOS 7.9, and the same code works with cluster_mode="yarn-client". Do you have any idea how to solve this?
I just tried this on our CDH cluster with BigDL 2.0 on CentOS 7.6, and cluster_mode="yarn-cluster" works fine. You can follow this guide to set up your environment: https://bigdl.readthedocs.io/en/latest/doc/Orca/QuickStart/orca-pytorch-quickstart.html#step-0-prepare-environment. You could also try upgrading jep from 3.9.0 to 3.9.1.
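To see whether the jep suggestion applies, here is a rough sketch that compares the installed version against 3.9.1. The helper names are made up for illustration, the comparison assumes plain numeric versions such as 3.9.0, and importlib.metadata is only available on Python 3.8+, so on Python 3.7 the lookup simply returns None:

```python
def version_tuple(version_string):
    """Turn '3.9.0' into (3, 9, 0); assumes purely numeric parts."""
    return tuple(int(part) for part in version_string.split(".")[:3])


def needs_upgrade(installed, minimum="3.9.1"):
    """True if the installed version is older than the minimum."""
    return version_tuple(installed) < version_tuple(minimum)


def installed_version(package):
    """Return the installed version string, or None if it cannot be found."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Python 3.7 and older: importlib.metadata is not in the stdlib.
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("jep")
    if found is None:
        print("jep version could not be determined")
    else:
        print(found, "needs upgrade:", needs_upgrade(found))
```

Run it inside the same conda or virtual environment that is packed and shipped to the cluster, since that is the environment the executors use.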