_load_library fails on RHEL distributions due to `platlib` being different from `purelib`
I am having an issue with tensorflow-io 0.24.0 when installing it in Docker.
My setup is as follows: to test and build my project, I am using tox, which also takes care of installing the project dependencies from pip (including tensorflow-io) in a dedicated virtual environment.
When I run tox on my Mac, tensorflow-io is installed correctly in the virtual environment and all tests pass.
As part of our build system, however, I need to run the tests and the build in a dedicated Docker image. In this scenario, after tox initializes the virtual environment and installs the dependencies, the tests fail when importing tensorflow-io with the following error:
.../.tox/py38-unit-test-build/lib/python3.8/site-packages/tensorflow_io/python/ops/libtensorflow_io.so: cannot open shared object file: No such file or directory'
I did some investigation, and the underlying issue seems to be that, while the tensorflow-io Python files are installed under .../.tox/py38-unit-test-build/lib/python3.8/site-packages/tensorflow_io, the shared object is installed under lib64 instead (i.e. it can be found at .../.tox/py38-unit-test-build/lib64/python3.8/site-packages/tensorflow_io/python/ops/libtensorflow_io.so). In the tox environment on my Mac, by contrast, the shared object is at the expected lib location.
Any clues?
Some more details:
- This happens in a container based on an Oracle Linux image, which leads to the separate lib/lib64 namespaces (I am not sure whether it is possible to control if shared objects end up in lib64 or lib, but I believe it is OK for them to be in lib64, since many other packages do the same without problems).
- Below is the full error stack. It is clear that tensorflow-io fails because it looks for the shared object only under `lib`. I believe that, in order to work properly with RH-based distributions, it should also look for the shared objects under `lib64` (using `TFIO_DATAPATH` is obviously not an option).
.tox/py38-unit-test-build/lib/python3.8/site-packages/tensorflow_io/python/ops/parquet_dataset_ops.py:30: in __init__
    components, shapes, dtypes = core_ops.io_parquet_readable_info(
.tox/py38-unit-test-build/lib/python3.8/site-packages/tensorflow_io/python/ops/__init__.py:88: in __getattr__
    return getattr(self._load(), attrb)
.tox/py38-unit-test-build/lib/python3.8/site-packages/tensorflow_io/python/ops/__init__.py:84: in _load
    self._mod = _load_library(self._library)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
filename = 'libtensorflow_io.so', lib = 'op'

    def _load_library(filename, lib="op"):
        """_load_library"""
        f = inspect.getfile(sys._getframe(1))  # pylint: disable=protected-access
        # Construct filename
        f = os.path.join(os.path.dirname(f), filename)
        filenames = [f]
        # Add datapath to load if en var is set, used for running tests where shared
        # libraries are built in a different path
        datapath = os.environ.get("TFIO_DATAPATH")
        if datapath is not None:
            # Build filename from:
            # `datapath` + `tensorflow_io` + `package_name` + `relpath_to_library`
            rootpath = os.path.dirname(sys.modules["tensorflow_io"].__file__)
            filename = sys.modules[__name__].__file__
            f = os.path.join(
                datapath,
                "tensorflow_io",
                os.path.relpath(os.path.dirname(filename), rootpath),
                os.path.relpath(f, os.path.dirname(filename)),
            )
            filenames.append(f)
        # Function to load the library, return True if file system library is loaded
        if lib == "op":
            load_fn = tf.load_op_library
        elif lib == "dependency":
            load_fn = lambda f: ctypes.CDLL(f, mode=ctypes.RTLD_GLOBAL)
        elif lib == "fs":
            load_fn = lambda f: tf.experimental.register_filesystem_plugin(f) is None
        else:
            load_fn = lambda f: tf.compat.v1.load_file_system_library(f) is None
        # Try to load all paths for file, fail if none succeed
        errs = []
        for f in filenames:
            try:
                l = load_fn(f)
                if l is not None:
                    return l
            except (tf.errors.NotFoundError, OSError) as e:
                errs.append(str(e))
>       raise NotImplementedError(
            "unable to open file: "
            + f"{filename}, from paths: {filenames}\ncaused by: {errs}"
        )
E       NotImplementedError: unable to open file: libtensorflow_io.so, from paths: ['.../.tox/py38-unit-test-build/lib/python3.8/site-packages/tensorflow_io/python/ops/libtensorflow_io.so']
E       caused by: ['.../.tox/py38-unit-test-build/lib/python3.8/site-packages/tensorflow_io/python/ops/libtensorflow_io.so: cannot open shared object file: No such file or directory']
Anyone? As I see it, there are only 2 possible explanations:
- tensorflow-io's approach for loading the shared objects is not working for lib/lib64 layouts and should be fixed;
- It's me who is doing something wrong and I should install tensorflow-io differently.
I would appreciate some feedback to know how I should deal with this. Thanks!
I have revisited this issue now and I have isolated its root cause.
The problem is unrelated to Docker: on RHEL-based distributions, platlib is different from purelib; see for instance https://stackoverflow.com/a/27882460/7414397 and https://github.com/pypa/virtualenv/issues/1751.
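A quick way to check whether a given interpreter has this split is to compare the two install locations reported by the standard-library `sysconfig` module. This is only a minimal diagnostic sketch; the helper name `describe_site_layout` is made up for illustration and is not part of tensorflow-io.

```python
import sysconfig


def describe_site_layout():
    """Return (purelib, platlib, differ) for the running interpreter."""
    # purelib: where pure-Python packages are installed.
    purelib = sysconfig.get_path("purelib")
    # platlib: where packages with compiled (platform-specific) files go.
    platlib = sysconfig.get_path("platlib")
    return purelib, platlib, purelib != platlib


if __name__ == "__main__":
    purelib, platlib, differ = describe_site_layout()
    print("purelib:", purelib)
    print("platlib:", platlib)
    print("split layout:", differ)
```

Running this inside the affected virtual environment should show whether the two paths differ (for example a `lib` path versus a `lib64` path).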
The issue in _load_library is that the logic for identifying the path to the shared library does not take that possibility into account, and instead it implicitly assumes that platlib and purelib are the same.
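One possible direction, sketched here under assumptions rather than as the actual fix, would be to build the list of candidate paths from both site-packages roots instead of only from the directory of the calling module. The helper `candidate_library_paths` and its parameters are hypothetical; only `sysconfig.get_path` and the `os.path` functions are real standard-library calls.

```python
import os
import sysconfig


def candidate_library_paths(module_file, filename, purelib=None, platlib=None):
    """Hypothetical helper: list the paths where `filename` might be installed.

    `module_file` is the Python file that wants to load the shared library.
    `purelib` and `platlib` default to the running interpreter's values.
    """
    purelib = os.path.abspath(purelib or sysconfig.get_path("purelib"))
    platlib = os.path.abspath(platlib or sysconfig.get_path("platlib"))
    module_dir = os.path.dirname(os.path.abspath(module_file))

    # First candidate: next to the calling module (the current assumption).
    candidates = [os.path.join(module_dir, filename)]

    # If the module lives under one site-packages root, also try the same
    # relative location under the other root.
    for src, dst in ((purelib, platlib), (platlib, purelib)):
        if src == dst:
            continue
        try:
            rel = os.path.relpath(module_dir, src)
        except ValueError:  # e.g. paths on different drives on Windows
            continue
        if rel == os.pardir or rel.startswith(os.pardir + os.sep):
            continue  # module_dir is not located under src
        mirrored = os.path.normpath(os.path.join(dst, rel, filename))
        if mirrored not in candidates:
            candidates.append(mirrored)
    return candidates
```

With a layout like the one in this report, the Python module under `lib/.../site-packages` would yield a second candidate under `lib64/.../site-packages`, which could then be appended to `filenames` before the load loop. When purelib and platlib are identical, the result is the single path used today.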
This is definitely a bug that needs to be fixed in tensorflow-io, otherwise it won't work on RHEL distributions. Can someone assign this please? @yongtang