[Issue]: msgpack symbols in migraphx conflict with the ones in rocroller

Open harakas opened this issue 5 months ago • 2 comments

Problem Description

I've run into a conflict between the msgpackc-cxx symbols in rocroller (from TheRock build) and the ones in libmigraphx.so. Since both are shared libraries that export these symbols publicly, one or the other will break. In my case rocroller was loaded first, and I see the following error when trying to load a compiled binary mxr file (with migraphx::load(path, options)):

MIGraphX Error: /workspace/AMDMIGraphX/src/value.cpp:371: at: Not an object for field: version
exception: Failed to call function

From LD_DEBUG=bindings:

      3813:	binding file /opt/rocm/lib/../lib/migraphx/lib/libmigraphx.so.2013000 [0] to /opt/rocm/lib/librocroller.so.1 [0]: normal symbol `_ZTIN7msgpack2v113size_overflowE'
      3813:	binding file /opt/rocm/lib/../lib/migraphx/lib/libmigraphx.so.2013000 [0] to /opt/rocm/lib/librocroller.so.1 [0]: normal symbol `_ZN7msgpack2v117bin_size_overflowD0Ev'
      3813:	binding file /opt/rocm/lib/../lib/migraphx/lib/libmigraphx.so.2013000 [0] to /opt/rocm/lib/librocroller.so.1 [0]: normal symbol `_ZN7msgpack2v111parse_errorD0Ev'
      3813:	binding file /opt/rocm/lib/../lib/migraphx/lib/libmigraphx.so.2013000 [0] to /opt/rocm/lib/librocroller.so.1 [0]: normal symbol `_ZTSN7msgpack2v117str_size_overflowE'
      3813:	binding file /opt/rocm/lib/../lib/migraphx/lib/libmigraphx.so.2013000 [0] to /opt/rocm/lib/librocroller.so.1 [0]: normal symbol `_ZN7msgpack2v110type_errorD0Ev'

A quick hacky fix for me was:

diff --git a/src/msgpack.cpp b/src/msgpack.cpp
index 2ee49071a..b497714c1 100644
--- a/src/msgpack.cpp
+++ b/src/msgpack.cpp
@@ -23,7 +23,10 @@
  */
 #include <migraphx/msgpack.hpp>
 #include <migraphx/serialize.hpp>
+
+#pragma GCC visibility push(hidden)
 #include <msgpack.hpp>
+#pragma GCC visibility pop
 
 namespace migraphx {
 inline namespace MIGRAPHX_INLINE_NS {
@@ -52,6 +55,7 @@ static void msgpack_chunk_for_each(Iterator start, Iterator last, F f)
 } // namespace MIGRAPHX_INLINE_NS
 } // namespace migraphx
 
+#pragma GCC visibility push(hidden)
 namespace msgpack {
 MSGPACK_API_VERSION_NAMESPACE(MSGPACK_DEFAULT_API_NS)
 {
@@ -206,6 +210,7 @@ MSGPACK_API_VERSION_NAMESPACE(MSGPACK_DEFAULT_API_NS)
     } // namespace adaptor
 } // MSGPACK_API_VERSION_NAMESPACE(MSGPACK_DEFAULT_API_NS)
 } // namespace msgpack
+#pragma GCC visibility pop
 
 namespace migraphx {
 inline namespace MIGRAPHX_INLINE_NS {

Things started to work after this patch was applied.
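For readers unfamiliar with the pragma, here is a minimal sketch of the same idea outside MIGraphX. The `thirdparty` and `mylib` namespaces and their functions are hypothetical stand-ins, not msgpack or MIGraphX code. Declarations between `push(hidden)` and `pop` get hidden visibility, so a shared library built from this file would not export them, and the dynamic linker could not rebind them to another library's copy.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a header-only third-party library.
// Everything declared between push(hidden) and pop has hidden visibility.
#pragma GCC visibility push(hidden)
namespace thirdparty {
struct parse_error : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

inline int parse_digit(char c)
{
    if(c < '0' || c > '9')
        throw parse_error("not a digit");
    return c - '0';
}
} // namespace thirdparty
#pragma GCC visibility pop

namespace mylib {
// Declared after the pop, so it keeps the default (exported) visibility.
// Returns the value of the first character as a digit, or -1 on failure.
int decode_digit(const std::string& text)
{
    if(text.empty())
        return -1;
    try
    {
        int digit = thirdparty::parse_digit(text.front());
        assert(digit >= 0 && digit <= 9); // internal sanity check
        return digit;
    }
    catch(const thirdparty::parse_error&)
    {
        // The hidden exception type is thrown and caught inside one library.
        return -1;
    }
}
} // namespace mylib
```

One caveat of this approach: an exception type with hidden visibility generally cannot be caught by type from a different shared library, so it should not be allowed to escape the library's public API.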

Aside from msgpack, I also see other symbol conflicts: libsqlite3 is pulled in from several places (/opt/rocm/lib/rocm_sysdeps/lib/librocm_sysdeps_sqlite3.so and /usr/lib/x86_64-linux-gnu/libsqlite3.so.0 for me).

Related to the recent LLVM symbol conflict fix in migraphx_gpu. @pfultz2

Operating System

Ubuntu Any

CPU

Ryzen

GPU

Other

Other

Any

ROCm Version

ROCm 6.0.0

Steps to Reproduce

Run your migraphx app with:

LD_PRELOAD=/opt/rocm/lib/librocroller.so.1 migraphx_app

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

harakas, Oct 08 '25 17:10

What is rocroller?

Is TheRock building msgpack and distributing shared libraries of msgpack with rocm so other components that need msgpack will use that version? If so, migraphx should just use the shared version instead of linking in a different static library version.

A larger context would be useful to help understand what's going on and how to solve the issue.

pfultz2, Oct 08 '25 18:10

I don't know what rocroller is or does exactly. In TheRock it seems to be built for libBLASLt.

At this moment migraphx is not integrated into TheRock so I'm building it on top of it. I'm not aware of all the components in ROCm to synchronise them properly -- there are too many to keep track of and understand intimately to figure out what might conflict -- a human can't do it.

In an ideal world we could synchronise everything and use single shared libraries (and single shared headers -- note that msgpackc-cxx is header-only). In practice everyone builds and bundles their own static libraries or headers for stability, links them publicly into their shared libraries, and these kinds of things happen.

Another problem is that you won't get clear errors from these kinds of conflicts, just random runtime errors that are a pain to track down.
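When tracking such a conflict down, it can help to ask the dynamic loader at runtime which loaded object actually supplies a given address. The sketch below uses `dladdr` from `<dlfcn.h>`, a glibc extension. The helper names `providing_object` and `typeinfo_provider` are made up for illustration, and on glibc older than 2.34 the program may need to link with `-ldl`.

```cpp
#include <dlfcn.h> // dladdr, Dl_info (g++ defines _GNU_SOURCE by default)
#include <stdexcept>
#include <string>
#include <typeinfo>

// Returns the path of the loaded object (executable or shared library)
// that contains addr, or an empty string if dladdr cannot resolve it.
inline std::string providing_object(const void* addr)
{
    Dl_info info{};
    if(dladdr(addr, &info) == 0 || info.dli_fname == nullptr)
        return {};
    return info.dli_fname;
}

// Hypothetical convenience wrapper: which object supplied the type_info
// that this translation unit was bound to for T?
template <class T>
std::string typeinfo_provider()
{
    return providing_object(&typeid(T));
}
```

Called from code inside the affected library, something like `typeinfo_provider<msgpack::type_error>()` would be expected to name the other library when its type_info symbol has been rebound, which corresponds to what the `LD_DEBUG=bindings` output reports.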

So this is the reality. This happens. If you export generic symbols that are outside your own API but linked into your shared library, they will eventually get overridden by something else that also uses a version of that API, and unexpected things will start to happen. The only solution is to hide them.
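A common way to hide everything that is not part of the API is to compile the library with GCC's `-fvisibility=hidden` (often together with `-fvisibility-inlines-hidden`) and mark only the public entry points as exported. The sketch below assumes that setup. `MYLIB_EXPORT`, `mylib::model_name`, and `mylib::detail::strip_prefix` are hypothetical names, not anything from MIGraphX.

```cpp
#include <string>

// Hypothetical export macro; real projects usually define one per library.
#if defined(__GNUC__)
#define MYLIB_EXPORT __attribute__((visibility("default")))
#else
#define MYLIB_EXPORT
#endif

namespace mylib {
namespace detail {
// Not annotated: hidden when the library is built with -fvisibility=hidden,
// so other shared objects can neither see nor override it.
inline std::string strip_prefix(const std::string& s, const std::string& prefix)
{
    if(s.compare(0, prefix.size(), prefix) == 0)
        return s.substr(prefix.size());
    return s;
}
} // namespace detail

// Public API: explicitly exported regardless of the default visibility.
MYLIB_EXPORT std::string model_name(const std::string& path)
{
    return detail::strip_prefix(path, "models/");
}
} // namespace mylib
```

With this layout the set of exported symbols is opt-in, so bundled third-party code stays private unless someone deliberately exports it.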

harakas, Oct 08 '25 18:10