Can't instrument libfabric on Crusher
$ omnitrace -v 3 -r 64 -i 1024 --min-address-range-loop 64 -o $(basename /opt/cray/libfabric/1.15.0.0/lib64/libfabric.so.1) -- /opt/cray/libfabric/1.15.0.0/lib64/libfabric.so.1
[omnitrace][exe]
[omnitrace][exe] command :: '/opt/cray/libfabric/1.15.0.0/lib64/libfabric.so.1.17.0'...
[omnitrace][exe]
[omnitrace][exe] Option '--min-address-range-loop' specified but '--min-instructions-loop <N>' was not specified. Setting minimum instructions for loops to 0...
[omnitrace][exe] Option '--min-instructions' specified but '--min-instructions-loop <N>' was not specified. Setting minimum instructions for loops to 1024...
[omnitrace][exe] Resolved 'libomnitrace-rt.so' to '/autofs/nccs-svm1_home1/nicurtis/sw/omnitrace-devel/lib/libomnitrace-rt.so.11.0.1'...
[omnitrace][exe] DYNINST_API_RT: /autofs/nccs-svm1_home1/nicurtis/sw/omnitrace-devel/lib/libomnitrace-rt.so.11.0.1
[omnitrace][exe] [dyninst-option]> TypeChecking = on
[omnitrace][exe] [dyninst-option]> SaveFPR = on
[omnitrace][exe] [dyninst-option]> DelayedParsing = on
[omnitrace][exe] [dyninst-option]> DebugParsing = off
[omnitrace][exe] [dyninst-option]> InstrStackFrames = off
[omnitrace][exe] [dyninst-option]> TrampRecursive = off
[omnitrace][exe] [dyninst-option]> MergeTramp = on
[omnitrace][exe] [dyninst-option]> BaseTrampDeletion = off
[omnitrace][exe] instrumentation target: /opt/cray/libfabric/1.15.0.0/lib64/libfabric.so.1.17.0
[omnitrace][exe] Opening '/opt/cray/libfabric/1.15.0.0/lib64/libfabric.so.1.17.0' for binary rewrite... Done
[omnitrace][exe] Getting the address space image, modules, and procedures...
[omnitrace][exe] Module size before loading instrumentation library: 125
### MODULES ###
| ../../../libgcc/libgcc2.c | ../sysdeps/x86_64/crti.S | libfabric.so.1.17.0 | prov/cxi/src/cxip_atomic.c |
| prov/cxi/src/cxip_av.c | prov/cxi/src/cxip_avset.c | prov/cxi/src/cxip_cntr.c | prov/cxi/src/cxip_coll.c |
| prov/cxi/src/cxip_cq.c | prov/cxi/src/cxip_ctrl.c | prov/cxi/src/cxip_curl.c | prov/cxi/src/cxip_dom.c |
| prov/cxi/src/cxip_ep.c | prov/cxi/src/cxip_eq.c | prov/cxi/src/cxip_fabric.c | prov/cxi/src/cxip_faults.c |
| prov/cxi/src/cxip_if.c | prov/cxi/src/cxip_info.c | prov/cxi/src/cxip_iomm.c | prov/cxi/src/cxip_mr.c |
| prov/cxi/src/cxip_msg.c | prov/cxi/src/cxip_ptelist_buf.c | prov/cxi/src/cxip_rdzv_pte.c | prov/cxi/src/cxip_repsum.c |
| prov/cxi/src/cxip_req_buf.c | prov/cxi/src/cxip_rma.c | prov/cxi/src/cxip_rxc.c | prov/cxi/src/cxip_telemetry.c |
| prov/cxi/src/cxip_txc.c | prov/cxi/src/cxip_zbcoll.c | prov/hook/ho...debug/src/hook_debug.c | prov/hook/perf/src/hook_perf.c |
| prov/hook/src/hook.c | prov/hook/src/hook_av.c | prov/hook/src/hook_cm.c | prov/hook/src/hook_cntr.c |
| prov/hook/src/hook_cq.c | prov/hook/src/hook_domain.c | prov/hook/src/hook_ep.c | prov/hook/src/hook_eq.c |
| prov/hook/src/hook_wait.c | prov/rxd/src/rxd_atomic.c | prov/rxd/src/rxd_av.c | prov/rxd/src/rxd_cntr.c |
| prov/rxd/src/rxd_cq.c | prov/rxd/src/rxd_domain.c | prov/rxd/src/rxd_ep.c | prov/rxd/src/rxd_fabric.c |
| prov/rxd/src/rxd_init.c | prov/rxd/src/rxd_msg.c | prov/rxd/src/rxd_rma.c | prov/rxd/src/rxd_tagged.c |
| prov/rxm/src/rxm_atomic.c | prov/rxm/src/rxm_av.c | prov/rxm/src/rxm_conn.c | prov/rxm/src/rxm_cq.c |
| prov/rxm/src/rxm_domain.c | prov/rxm/src/rxm_ep.c | prov/rxm/src/rxm_fabric.c | prov/rxm/src/rxm_init.c |
| prov/rxm/src/rxm_rma.c | prov/tcp/src/tcpx_attr.c | prov/tcp/src/tcpx_conn_mgr.c | prov/tcp/src/tcpx_cq.c |
| prov/tcp/src/tcpx_domain.c | prov/tcp/src/tcpx_ep.c | prov/tcp/src/tcpx_eq.c | prov/tcp/src/tcpx_fabric.c |
| prov/tcp/src/tcpx_init.c | prov/tcp/src/tcpx_msg.c | prov/tcp/src/tcpx_progress.c | prov/tcp/src/tcpx_rma.c |
| prov/tcp/src/tcpx_shared_ctx.c | prov/udp/src/udpx_cq.c | prov/udp/src/udpx_domain.c | prov/udp/src/udpx_ep.c |
| prov/udp/src/udpx_fabric.c | prov/udp/src/udpx_init.c | prov/util/src/cuda_mem_monitor.c | prov/util/src/rocr_mem_monitor.c |
| prov/util/src/util_atomic.c | prov/util/src/util_attr.c | prov/util/src/util_av.c | prov/util/src/util_buf.c |
| prov/util/src/util_cntr.c | prov/util/src/util_coll.c | prov/util/src/util_cq.c | prov/util/src/util_domain.c |
| prov/util/src/util_ep.c | prov/util/src/util_eq.c | prov/util/src/util_fabric.c | prov/util/src/util_main.c |
| prov/util/src/util_mem_hooks.c | prov/util/src/util_mem_monitor.c | prov/util/src/util_mr_cache.c | prov/util/src/util_mr_map.c |
| prov/util/src/util_ns.c | prov/util/src/util_pep.c | prov/util/src/util_poll.c | prov/util/src/util_shm.c |
| prov/util/src/util_wait.c | prov/util/src/ze_mem_monitor.c | src/abi_1_0.c | src/common.c |
| src/enosys.c | src/fabric.c | src/fasthash.c | src/fi_tostr.c |
| src/hmem.c | src/hmem_cuda.c | src/hmem_cuda_gdrcopy.c | src/hmem_rocr.c |
| src/hmem_ze.c | src/indexer.c | src/iov.c | src/linux/rdpmc.c |
| src/log.c | src/mem.c | src/perf.c | src/rbtree.c |
| src/shared/ofi_str.c | src/tree.c | src/unix/osd.c | src/var.c |
|
[omnitrace][exe] Outputting 'omnitrace-libfabric.so.1-output/available-instr.json'... Done
[omnitrace][exe] Outputting 'omnitrace-libfabric.so.1-output/available-instr.txt'... Done
[omnitrace][exe] Outputting 'omnitrace-libfabric.so.1-output/overlapping-instr.json'... Done
[omnitrace][exe] Outputting 'omnitrace-libfabric.so.1-output/overlapping-instr.txt'... Done
[omnitrace][exe] function: '_init' ... found
[omnitrace][exe] function: '_fini' ... found
[omnitrace][exe] function: 'main' ... not found
[omnitrace][exe] function: 'omnitrace_user_start_trace' ... not found
[omnitrace][exe] function: 'omnitrace_user_stop_trace' ... not found
[omnitrace][exe] function: 'MPI_Init' ... not found
[omnitrace][exe] function: 'MPI_Init_thread' ... not found
[omnitrace][exe] function: 'MPI_Finalize' ... not found
[omnitrace][exe] function: 'MPI_Comm_rank' ... not found
[omnitrace][exe] function: 'MPI_Comm_size' ... not found
[omnitrace][exe] Resolved 'libomnitrace-dl.so' to '/autofs/nccs-svm1_home1/nicurtis/sw/omnitrace-devel/lib/libomnitrace-dl.so.1.2.0'...
[omnitrace][exe] loading library: '/autofs/nccs-svm1_home1/nicurtis/sw/omnitrace-devel/lib/libomnitrace-dl.so.1.2.0'...
[omnitrace][exe] loadLibrary(/autofs/nccs-svm1_home1/nicurtis/sw/omnitrace-devel/lib/libomnitrace-dl.so.1.2.0) result = success
[omnitrace][exe] Finding instrumentation functions...
[omnitrace][exe] function: 'omnitrace_init' ... found
[omnitrace][exe] function: 'omnitrace_finalize' ... found
[omnitrace][exe] function: 'omnitrace_set_env' ... found
[omnitrace][exe] function: 'omnitrace_set_mpi' ... found
[omnitrace][exe] function: 'omnitrace_push_trace' ... found
[omnitrace][exe] function: 'omnitrace_pop_trace' ... found
[omnitrace][exe] function: 'omnitrace_register_source' ... found
[omnitrace][exe] function: 'omnitrace_register_coverage' ... found
[omnitrace][exe] function: '_main' ... not found
[omnitrace][exe] using '_init' and '_fini' in lieu of 'main'...
[omnitrace][exe] Finding init entry... [omnitrace][exe] Done
[omnitrace][exe] Finding fini exit... [omnitrace][exe] Done
[omnitrace][exe] Beginning insertion set...
[omnitrace][exe] Getting call expressions... [omnitrace][exe] Done
[omnitrace][exe] Getting call snippets... [omnitrace][exe] Done
[omnitrace][exe] Resolved 'libomnitrace-dl.so' to '/autofs/nccs-svm1_home1/nicurtis/sw/omnitrace-devel/lib/libomnitrace-dl.so.1.2.0'...
[omnitrace][exe] Adding main entry snippets...
[omnitrace][exe] Adding main exit snippets...
[omnitrace][exe] Beginning instrumentation loop...
[omnitrace][exe]
[omnitrace][exe] [function][Instrumenting] no-constraint :: 'cxip_amo_common'...
[omnitrace][exe] [function][Instrumenting] no-constraint :: 'cxip_amo_emit_idc'...
[omnitrace][exe] [function][Instrumenting] no-constraint :: 'fi_cxi_ini'...
[omnitrace][exe] [function][Instrumenting] no-constraint :: 'cxip_rma_common'...
[omnitrace][exe] [function][Instrumenting] no-constraint :: 'rxm_handle_comp'...
[omnitrace][exe] 2 instrumented funcs in prov/cxi/src/cxip_atomic.c
[omnitrace][exe] 1 instrumented funcs in prov/cxi/src/cxip_info.c
[omnitrace][exe] 1 instrumented funcs in prov/cxi/src/cxip_rma.c
[omnitrace][exe] 1 instrumented funcs in prov/rxm/src/rxm_cq.c
[omnitrace][exe]
[omnitrace][exe] Outputting 'omnitrace-libfabric.so.1-output/available-instr.json'... Done
[omnitrace][exe] Outputting 'omnitrace-libfabric.so.1-output/available-instr.txt'... Done
[omnitrace][exe] Outputting 'omnitrace-libfabric.so.1-output/instrumented-instr.json'... Done
[omnitrace][exe] Outputting 'omnitrace-libfabric.so.1-output/instrumented-instr.txt'... Done
[omnitrace][exe] Outputting 'omnitrace-libfabric.so.1-output/excluded-instr.json'... Done
[omnitrace][exe] Outputting 'omnitrace-libfabric.so.1-output/excluded-instr.txt'... Done
[omnitrace][exe] Outputting 'omnitrace-libfabric.so.1-output/overlapping-instr.json'... Done
[omnitrace][exe] Outputting 'omnitrace-libfabric.so.1-output/overlapping-instr.txt'... Done
[omnitrace][exe]
[omnitrace][exe] The instrumented executable image is stored in '/autofs/nccs-svm1_home1/nicurtis/allreduce_issue-master/libfabric.so.1'
[omnitrace][exe] End of omnitrace
[omnitrace][exe] Exit code: 0
(gdb) s
[omnitrace][omnitrace_init_tooling] Instrumentation mode: Trace
______ .___ ___. .__ __. __ .___________..______ ___ ______ _______
/ __ \ | \/ | | \ | | | | | || _ \ / \ / || ____|
| | | | | \ / | | \| | | | `---| |----`| |_) | / ^ \ | ,----'| |__
| | | | | |\/| | | . ` | | | | | | / / /_\ \ | | | __|
| `--' | | | | | | |\ | | | | | | |\ \----./ _____ \ | `----.| |____
\______/ |__| |__| |__| \__| |__| |__| | _| `._____/__/ \__\ \______||_______|
[omnitrace] /proc/sys/kernel/perf_event_paranoid has a value of 2. Disabling PAPI (requires a value <= 1)...
[omnitrace] In order to enable PAPI support, run 'echo N | sudo tee /proc/sys/kernel/perf_event_paranoid' where N is < 2
[New Thread 0x7fffb808c700 (LWP 106872)]
[782.641] perfetto.cc:55903 Configured tracing session 1, #sources:1, duration:0 ms, #buffers:1, total buffer size:1024000 KB, total sessions:1, uid:0 session name: ""
[New Thread 0x7fff617fd700 (LWP 106876)]
0x00007fffe8a42179 in _dl_catch_exception () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fffe8a42179 in _dl_catch_exception () from /lib64/libc.so.6
#1 0x00007fffe8a4221f in _dl_catch_error () from /lib64/libc.so.6
#2 0x00007fffe7240ba5 in _dlerror_run () from /opt/rocm-5.1.0/lib/../../../lib64/libdl.so.2
#3 0x00007fffe72405bf in dlsym () from /opt/rocm-5.1.0/lib/../../../lib64/libdl.so.2
#4 0x00007fffd479a3d8 in dlsym_wrapper () from /ccs/home/nicurtis/sw/omnitrace-devel/lib/omnitrace/libgotcha.so.2
#5 0x00007fffdf28722d in cuda_hmem_init () at src/common.c:106
#6 0x00007fffdf285d9f in cuda_copy_to_dev (device=140737488316752, dst=0x1, src=0x7fffe8a42179 <_dl_catch_exception+171>, size=5601056) at src/hmem_cuda.c:143
#7 0x00007fffdf27e501 in fi_dupinfo_ (info=0x7fffffff6c80) at src/fabric.c:1154
#8 0x00007fffdf27eac7 in fi_open_ (version=<optimized out>, name=<optimized out>, attr=<optimized out>, attr_len=<optimized out>, flags=140737149617536, fid=0x1000b, context=0x7fffffff6df0) at src/fabric.c:1296
#9 0x00007fffeb4d9ce0 in open_fabric () from /opt/cray/pe/lib64/libmpi_cray.so.12
#10 0x00007fffeb4db0d0 in MPIDI_OFI_mpi_init_hook () from /opt/cray/pe/lib64/libmpi_cray.so.12
#11 0x00007fffeb33667f in MPID_Init () from /opt/cray/pe/lib64/libmpi_cray.so.12
#12 0x00007fffe9a408a5 in MPIR_Init_thread () from /opt/cray/pe/lib64/libmpi_cray.so.12
#13 0x00007fffe9a40674 in PMPI_Init () from /opt/cray/pe/lib64/libmpi_cray.so.12
#14 0x0000000000301d37 in ?? ()
#15 0x00000001ffff0200 in ?? ()
#16 0x00000001ebff80b5 in ?? ()
#17 0x000000000020e38e in ?? ()
#18 0x00007fffffff7418 in ?? ()
#19 0x000000000020e38e in ?? ()
#20 0x0000000000000001 in ?? ()
#21 0x00007fffffff72a8 in ?? ()
#22 0x0000000000000001 in ?? ()
#23 0x00007fffffff7418 in ?? ()
#24 0x00007fffe89e8331 in _getopt_internal () from /lib64/libc.so.6
#25 0x00007fffffff72a8 in ?? ()
#26 0x0000000000000001 in ?? ()
#27 0x00007fffffff7418 in ?? ()
#28 0x000000000020e38e in ?? ()
#29 0x0000000000302493 in ?? ()
#30 0xffff720100000025 in ?? ()
#31 0x0000000000000064 in ?? ()
#32 0x00007fffffff7290 in ?? ()
#33 0x0000000000000000 in ?? ()
It looks like there's some weird interplay between gotcha and MPI here.
A thought here. Right now (AFAIK), omnitrace requires PrgEnv-gnu to compile on Crusher. However, loading this module changes the MPI library:
$ module load PrgEnv-gnu
Lmod is automatically replacing "cce/14.0.0" with "gcc/11.2.0".
Lmod is automatically replacing "PrgEnv-cray/8.3.3" with "PrgEnv-gnu/8.3.3".
Due to MODULEPATH changes, the following have been reloaded:
1) cmake/3.22.2 2) cray-mpich/8.1.16
The following have been reloaded with a version change:
1) hwloc/2.7.0 => hwloc/2.5.0
(confirmed with a module show). This probably means that OMNITRACE_ENABLE_MPI=on isn't safe on Crusher?
Could you set OMNITRACE_DEBUG=ON in the environment (not as a config-file variable) and see whether you get a message like [omnitrace]... MPI_Init_thread(...) before the segfault? I want to see if this line is triggered:
https://github.com/AMDResearch/omnitrace/blob/6bc86d11118530b5fe6f72d5c040e52068f848c1/source/lib/omnitrace/library/components/mpi_gotcha.cpp#L210
That should at least tell me whether omnitrace is wrapping MPI_Init_thread before libfabric loads, or whether it is being called from within libfabric.
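For clarity, since the environment-vs-config-file distinction matters for this check, a trivial sketch of what I mean (the instrumented binary would be launched the same way, e.g. under gdb or srun):

```shell
# OMNITRACE_DEBUG has to come from the environment for this check;
# setting it in an omnitrace config file is not what I'm asking for.
export OMNITRACE_DEBUG=ON

# The instrumented binary inherits it; confirm with a child process:
sh -c 'echo "child sees OMNITRACE_DEBUG=$OMNITRACE_DEBUG"'
# prints: child sees OMNITRACE_DEBUG=ON
```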
Come to think of it, this is very likely happening because building with MPI support creates a circular dependency... i.e., libomnitrace.so is linked against the MPI libraries, which are linked against libfabric.so... so when libomnitrace-dl.so gets loaded by the executable, it dlopens libomnitrace.so, which then loads the MPI libraries, which load libfabric.so, which loads libomnitrace-dl.so, ... etc.
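If that's what's happening, the loader cycle should be visible by running the instrumented executable under glibc's LD_DEBUG (the same mechanism that produced the traces later in this thread). A minimal sketch, using /bin/true only as a runnable stand-in for the real binary:

```shell
# LD_DEBUG=libs makes ld.so print every library search and init call
# to stderr; a cycle would show libomnitrace-dl.so being searched a
# second time while its own load is still in progress.
# /bin/true stands in for the actual instrumented executable here.
LD_DEBUG=libs /bin/true 2>&1 | grep -E 'find library|calling init' | head -n 5
```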
Have you tried using an install built with partial MPI support?
I haven't tried this recently, but I'm fairly sure I tried w/ partial MPI support. But I did notice another one of these types of load conflicts tonight:
I am trying to instrument a handful of specific functions in ROCR to see where a perf issue is coming from.
My process was essentially to:
- Build ROCr from source (to make all hidden symbols visible)
- Instrument:
omnitrace -o libhsa-runtime64.so.1 -v 3 --function-restrict hsa_signal_wait_scacquire core::Signal::Convert InterruptSignal::WaitAcquire InterruptSignal::WaitRelaxed timer::fast_clock::now HSA::hsa_system_get_info atomic::Load hsa_signal_value_t hsaKmtWaitOnEvent -- ~/rocmlocal/lib/libhsa-runtime64.so
So far, so good.
Next, I tried to set my LD_LIBRARY_PATH to pick up the new HSA runtime, and I get:
Error Calling hsa_iterate_agents: HSA_STATUS_ERROR_NOT_INITIALIZED: An API other than hsa_init has been invoked while the reference count of the HSA runtime is zero.
[omnitrace][0][0] lmp called abort()...
[omnitrace][0][0][omnitrace_finalize] finalizing...
We had run into this issue before w/ Cray MPI. Essentially what's happening is that libfabric at runtime attempts to load libcuda / libhsa-runtime, etc. to figure out which backend it's targeting. Our previous solution was to insert a (e.g.,)
-Wl,-rpath,$(realpath .)
That is,
/opt/rocm-5.1.0/llvm/bin/clang++ -L"/opt/rocm-5.1.0/hip/lib" -lgcc_s -lgcc -lpthread -lm -lrt -Wl,--enable-new-dtags -Wl,-rpath=/opt/rocm-5.1.0/hip/lib:/opt/rocm-5.1.0/lib -lamdhip64 -fdenormal-fp-math=ieee -fcuda-flush-denormals-to-zero -munsafe-fp-atomics -I/opt/cray/pe/mpich/8.1.16/ofi/crayclang/10.0/include -L/opt/cray/pe/mpich/8.1.16/ofi/crayclang/10.0/lib -lmpi -L/opt/cray/pe/mpich/8.1.16/gtl/lib -lmpi_gtl_hsa --rocm-path=/opt/rocm-5.1.0 -L/ccs/home/nicurtis/lammps_benchmarking -lhsa-runtime64 -O3 -DNDEBUG -DKOKKOS_DEPENDENCE -fno-gpu-rdc CMakeFiles/lmp.dir/ccs/home/nicurtis/lammps_benchmarking/lammps/src/main.cpp.o -o "lmp" -Wl,-rpath,/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-14.0.0/hwloc-2.7.0-4kr6e4ucr6ehu3afcidsv77wflyzu7e7/lib\: liblammps.a lib/kokkos/containers/src/libkokkoscontainers.a lib/kokkos/core/src/libkokkoscore.a /sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-14.0.0/hwloc-2.7.0-4kr6e4ucr6ehu3afcidsv77wflyzu7e7/lib/libhwloc.so /usr/lib64/libdl.so -lm -ldl -L/opt/rocm-5.1.0/llvm/bin/../lib/clang/14.0.0/lib/linux -lclang_rt.builtins-x86_64
->
/opt/rocm-5.1.0/llvm/bin/clang++ -L"/opt/rocm-5.1.0/hip/lib" -lgcc_s -lgcc -lpthread -lm -lrt -Wl,--enable-new-dtags -Wl,-rpath=/ccs/home/nicurtis/lammps_benchmarking:/opt/rocm-5.1.0/hip/lib:/opt/rocm-5.1.0/lib -lamdhip64 -fdenormal-fp-math=ieee -fcuda-flush-denormals-to-zero -munsafe-fp-atomics -I/opt/cray/pe/mpich/8.1.16/ofi/crayclang/10.0/include -L/opt/cray/pe/mpich/8.1.16/ofi/crayclang/10.0/lib -lmpi -L/opt/cray/pe/mpich/8.1.16/gtl/lib -lmpi_gtl_hsa --rocm-path=/opt/rocm-5.1.0 -L/ccs/home/nicurtis/lammps_benchmarking -lhsa-runtime64 -O3 -DNDEBUG -DKOKKOS_DEPENDENCE -fno-gpu-rdc CMakeFiles/lmp.dir/ccs/home/nicurtis/lammps_benchmarking/lammps/src/main.cpp.o -o "lmp" -Wl,-rpath,/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-14.0.0/hwloc-2.7.0-4kr6e4ucr6ehu3afcidsv77wflyzu7e7/lib\: liblammps.a lib/kokkos/containers/src/libkokkoscontainers.a lib/kokkos/core/src/libkokkoscore.a /sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-14.0.0/hwloc-2.7.0-4kr6e4ucr6ehu3afcidsv77wflyzu7e7/lib/libhwloc.so /usr/lib64/libdl.so -lm -ldl -L/opt/rocm-5.1.0/llvm/bin/../lib/clang/14.0.0/lib/linux -lclang_rt.builtins-x86_64
So that the current directory would be on the runpath and take precedence over the /opt/rocm-5.1.0 paths that are automatically inserted by hipcc. This is known to let a from-source build of HSA coexist with hipcc's runpaths.
However, when trying to use an hsa-runtime instrumented w/ omnitrace, I see at runtime:
# load of HSA by LD for lmp
46150: find library=libhsa-runtime64.so.1 [0]; searching
46150: search path=/ccs/home/nicurtis/lammps_benchmarking (RUNPATH from file ./lmp)
46150: **trying file=/ccs/home/nicurtis/lammps_benchmarking/libhsa-runtime64.so.1**
...
46150: transferring control: ./lmp
46150:
[omnitrace][omnitrace_init_tooling] Instrumentation mode: Trace
...
**# libfabric loads beginning**
46150: find library=libcudart.so [0]; searching
46150: search path=/ccs/home/nicurtis/sw/omnitrace-devel/lib/omnitrace (RPATH from file /ccs/home/nicurtis/sw/omnitrace-devel/lib/libomnitrace.so)
46150: trying file=/ccs/home/nicurtis/sw/omnitrace-devel/lib/omnitrace/libcudart.so
46150: search path=/ccs/home/nicurtis/sw/omnitrace-devel/lib (LD_LIBRARY_PATH)
46150: trying file=/ccs/home/nicurtis/sw/omnitrace-devel/lib/libcudart.so
46150: search path= (RPATH from file /ccs/home/nicurtis/sw/omnitrace-devel/lib/omnitrace/libgotcha.so.2)
46150: search path=/usr/lib64 (system search path)
46150: trying file=/usr/lib64/libcudart.so
46150:
67499: find library=libhsa-runtime64.so [0]; searching
67499: search path=/ccs/home/nicurtis/sw/omnitrace-devel/lib/omnitrace (RPATH from file /ccs/home/nicurtis/sw/omnitrace-devel/lib/libomnitrace.so)
67499: trying file=/ccs/home/nicurtis/sw/omnitrace-devel/lib/omnitrace/libhsa-runtime64.so
67499: search path=/ccs/home/nicurtis/sw/omnitrace-devel/lib (LD_LIBRARY_PATH)
67499: trying file=/ccs/home/nicurtis/sw/omnitrace-devel/lib/libhsa-runtime64.so
67499: search path= (RPATH from file /ccs/home/nicurtis/sw/omnitrace-devel/lib/omnitrace/libgotcha.so.2)
67499: search path=/usr/lib64 (system search path)
67499: trying file=/usr/lib64/libhsa-runtime64.so
67499: search path=/autofs/nccs-svm1_home1/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/tls/x86_64/x86_64:/autofs/nccs-svm1_home1/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/tls/x86_64:/autofs/nccs-svm1_home1/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/tls/x86_64:/autofs/nccs-svm1_home1/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/tls:/autofs/nccs-svm1_home1/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/x86_64/x86_64:/autofs/nccs-svm1_home1/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/x86_64:/autofs/nccs-svm1_home1/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/x86_64:/autofs/nccs-svm1_home1/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib:/ccs/home/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/tls/x86_64/x86_64:/ccs/home/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/tls/x86_64:/ccs/home/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/tls/x86_64:/ccs/home/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/tls:/ccs/home/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/x86_64/x86_64:/ccs/home/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/x86_64:/ccs/home/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/x86_64:/ccs/home/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib (RPATH from file /ccs/home/nicurtis/sw/omnitrace-devel/lib/omnitrace/libgotcha.so.2)
67499: trying file=/autofs/nccs-svm1_home1/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/tls/x86_64/x86_64/libhsa-runtime64.so
67499: trying file=/autofs/nccs-svm1_home1/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/tls/x86_64/libhsa-runtime64.so
67499: trying file=/autofs/nccs-svm1_home1/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/tls/x86_64/libhsa-runtime64.so
67499: trying file=/autofs/nccs-svm1_home1/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/tls/libhsa-runtime64.so
67499: trying file=/autofs/nccs-svm1_home1/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/x86_64/x86_64/libhsa-runtime64.so
67499: trying file=/autofs/nccs-svm1_home1/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/x86_64/libhsa-runtime64.so
67499: trying file=/autofs/nccs-svm1_home1/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/x86_64/libhsa-runtime64.so
67499: trying file=/autofs/nccs-svm1_home1/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/libhsa-runtime64.so
67499: trying file=/ccs/home/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/tls/x86_64/x86_64/libhsa-runtime64.so
67499: trying file=/ccs/home/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/tls/x86_64/libhsa-runtime64.so
67499: trying file=/ccs/home/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/tls/x86_64/libhsa-runtime64.so
67499: trying file=/ccs/home/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/tls/libhsa-runtime64.so
67499: trying file=/ccs/home/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/x86_64/x86_64/libhsa-runtime64.so
67499: trying file=/ccs/home/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/x86_64/libhsa-runtime64.so
67499: trying file=/ccs/home/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/x86_64/libhsa-runtime64.so
67499: trying file=/ccs/home/nicurtis/omnitrace/build-omnitrace/external/papi/install/lib/libhsa-runtime64.so
67499: search path=/ccs/home/nicurtis/sw/omnitrace-devel/lib (LD_LIBRARY_PATH)
67499: trying file=/ccs/home/nicurtis/sw/omnitrace-devel/lib/libhsa-runtime64.so
67499: search path=/ccs/home/nicurtis/sw/omnitrace-devel/lib/omnitrace (RPATH from file /ccs/home/nicurtis/sw/omnitrace-devel/lib/libomnitrace.so)
67499: trying file=/ccs/home/nicurtis/sw/omnitrace-devel/lib/omnitrace/libhsa-runtime64.so
67499: search path=/opt/rocm-5.1.0/lib (RUNPATH from file ./lmp)
67499: trying file=/opt/rocm-5.1.0/lib/libhsa-runtime64.so
67499:
67499:
67499: calling init: /opt/rocm-5.1.0/lib/libhsa-runtime64.so
67499:
Error Calling hsa_iterate_agents: HSA_STATUS_ERROR_NOT_INITIALIZED: An API other than hsa_init has been invoked while the reference count of the HSA runtime is zero.
[omnitrace][0][0] lmp called abort()...
So I have definitely been able to instrument the HIP/ROCm libraries; I did it when I was approximating the code coverage of the ROCm tests, but I did have to build omnitrace without any HIP support.
Note: you can normally instrument several ROCm libraries; it happens all the time when you do runtime instrumentation. You could also just try that, after the appropriate modifications to the exe rpath.
Also, you can configure with CMAKE_INSTALL_RPATH_USE_LINK_PATH=OFF and none of the omnitrace libs will add their link directories to the rpath.
~Ok, the HSA thing is definitely not related to omnitrace's rpaths...~
Edit: spoke too soon, I had to remove some more stuff from LD_LIBRARY_PATH. However, after doing so I see:
$ grep CMAKE_INSTALL_RPATH_USE_LINK_PATH CMakeCache.txt
CMAKE_INSTALL_RPATH_USE_LINK_PATH:BOOL=OFF
//Last value of CMAKE_INSTALL_RPATH_USE_LINK_PATH
OMNITRACE_WATCH_VALUE_CMAKE_INSTALL_RPATH_USE_LINK_PATH:INTERNAL=OFF
Yet, on install:
-- Installing: /ccs/home/nicurtis/sw/omnitrace-devel/lib/libomnitrace.so.1.3.1
-- Set runtime path of "/ccs/home/nicurtis/sw/omnitrace-devel/lib/libomnitrace.so.1.3.1" to "$ORIGIN:$ORIGIN/omnitrace:/opt/rocm-5.1.0/lib:/opt/rocm-5.1.0/rocprofiler/lib"
-- Installing: /ccs/home/nicurtis/sw/omnitrace-devel/lib/libomnitrace-dl.so.1.3.1
-- Set runtime path of "/ccs/home/nicurtis/sw/omnitrace-devel/lib/libomnitrace-dl.so.1.3.1" to "$ORIGIN"
It could really be coming from any of these: https://github.com/AMDResearch/omnitrace/blob/main/cmake/Packages.cmake#L172
I ended up commenting out all of the "set(CMAKE_INSTALL_RPATH" commands, which appears to have done what I want:
$ readelf -d /ccs/home/nicurtis/sw/omnitrace-devel/lib/libomnitrace.so
Dynamic section at offset 0x2035d38 contains 45 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libgotcha.so.2]
0x0000000000000001 (NEEDED) Shared library: [librt.so.1]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libunwind.so.8]
0x0000000000000001 (NEEDED) Shared library: [libdl.so.2]
0x0000000000000001 (NEEDED) Shared library: [libamdhip64.so.5]
0x0000000000000001 (NEEDED) Shared library: [libroctracer64.so.1]
0x0000000000000001 (NEEDED) Shared library: [libdrm.so.2]
0x0000000000000001 (NEEDED) Shared library: [libdrm_amdgpu.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x0000000000000001 (NEEDED) Shared library: [libnuma.so.1]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [librocprofiler64.so.1]
0x0000000000000001 (NEEDED) Shared library: [libhsa-runtime64.so.1]
0x0000000000000001 (NEEDED) Shared library: [librocm_smi64.so.5]
0x0000000000000001 (NEEDED) Shared library: [libxpmem.so.0]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6]
0x0000000000000001 (NEEDED) Shared library: [ld-linux-x86-64.so.2]
0x000000000000000e (SONAME) Library soname: [libomnitrace.so.1.3]
0x000000000000000f (RPATH) Library rpath: [$ORIGIN:$ORIGIN/omnitrace:]
I think that CMAKE_INSTALL_RPATH_USE_LINK_PATH isn't playing harmoniously with those set(CMAKE_INSTALL_RPATH ...) calls.
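My reading of why turning the variable OFF wasn't enough — an illustrative CMake fragment (not omnitrace's actual Packages.cmake):

```cmake
# Illustrative only. CMAKE_INSTALL_RPATH_USE_LINK_PATH controls only
# whether the linker search directories get APPENDED at install time;
# any explicit set(CMAKE_INSTALL_RPATH ...) entries are installed
# regardless, so OFF does not remove paths added this way:
set(CMAKE_INSTALL_RPATH "\$ORIGIN;\$ORIGIN/omnitrace;/opt/rocm-5.1.0/lib")
set(CMAKE_INSTALL_RPATH_USE_LINK_PATH OFF)  # rocm path above still installed
```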
I believe this was fixed. Could you verify @arghdos?
I think I never tried again, as my debugging went elsewhere, but I'm fairly confident the rpath changes would have resolved it. The issue was that libfabric was dlopen'ing HSA, which led down a weird rabbit hole as I was also trying to instrument HSA.