
undefined symbol: __kmpc_global_thread_num

Open cding-nv opened this issue 5 years ago • 7 comments

I installed MKL in tools:

$ ./extras/install_mkl.sh

configured "src":

$ ./configure --mkl-root=/opt/intel/mkl ......

and then:

$ cd egs/aishell/s10
$ . cmd.sh
$ . path.sh
$ ./local/run_chain.sh --stage 21

I always get:

python3: symbol lookup error: /opt/intel/mkl/lib/intel64/libmkl_intel_thread.so: undefined symbol: __kmpc_global_thread_num

Which step did I miss?
OS: Ubuntu 16.04.6

Thanks

cding-nv avatar Nov 23 '20 12:11 cding-nv

That's an (Intel) OpenMP symbol. Another confusing part is that it's a python3 error -- so I assume it's either a conflicting installation of pythons/venvs or some Python package -- I don't think Kaldi has any binary extension that could cause this error.

Can you make sure you are not mixing up pythons, and after that run bash -x ./local/run_chain.sh --stage 21? Thanks, y.
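A quick way to rule out mixed pythons is to compare what the shell resolves with what the interpreter itself reports. This is a generic sanity-check sketch, not Kaldi-specific; the fallback messages are only there so the commands never abort:

```shell
# Which python3 does the shell resolve? (generic sanity check)
pybin=$(command -v python3 || echo "python3 not on PATH")
echo "PATH resolves python3 to: $pybin"

# Which binary does the interpreter itself report? A mismatch with the
# line above (modulo symlinks) usually means a venv/PATH mixup.
pyexe=$(python3 -c 'import sys; print(sys.executable)' 2>/dev/null \
        || echo "python3 not runnable")
echo "sys.executable: $pyexe"
```

If the two paths disagree and are not symlinks to each other, a virtualenv or conda environment is likely shadowing the system interpreter.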


jtrmal avatar Nov 23 '20 14:11 jtrmal

@cding-nv?

kkm000 avatar Nov 25 '20 03:11 kkm000

Thanks. Sometimes I didn't get this error, and then I saved the Docker image. I haven't found the pattern yet.

cding-nv avatar Dec 02 '20 03:12 cding-nv

@cding-nv, I see something not quite right here. When we compile Kaldi, we use so-called sequential threading, which means no threading at all (threading is hardly efficient at our matrix sizes). The library linked to our binaries is libmkl_sequential.so. I suspect that one of the Python libraries, likely numpy, is using MKL as its BLAS library (the system can be configured for that), and uses libmkl_intel_thread.so as its threading layer by default.

It is very likely that the third-party library is linked to MKL (or even just generic CBLAS/LAPACKE) using the mechanism called the "single dynamic library," as described here: https://software.intel.com/content/www/us/en/develop/articles/a-new-linking-model-single-dynamic-library-mkl_rt-since-intel-mkl-103.html.
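One way to see whether a binary or Python extension pulls in MKL through the single dynamic library is to look for libmkl_rt in its ldd output. A minimal sketch -- the ldd output line below is simulated for illustration, and the numpy module path in the comment is an assumption about a typical install:

```shell
# On a real system you might run something like:
#   ldd "$(python3 -c 'import numpy.core._multiarray_umath as m; print(m.__file__)')"
# Here we grep a simulated ldd output line instead:
ldd_line='libmkl_rt.so => /opt/intel/mkl/lib/intel64/libmkl_rt.so (0x00007f00)'
if printf '%s\n' "$ldd_line" | grep -q 'libmkl_rt'; then
  echo "uses the MKL single dynamic library (mkl_rt)"
fi
```

If libmkl_rt shows up, the library picks its threading layer at runtime, which is exactly where MKL_THREADING_LAYER comes into play.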

Could you please try the following fix? (I am assuming you did not add any other switches to ./configure --mkl-root=/opt/intel/mkl.) What if we link Kaldi the same way?

  1. In src/kaldi.mk, locate a line that contains -l:libmkl_intel_lp64.so -l:libmkl_core.so -l:libmkl_sequential.so, or maybe -lmkl_intel_lp64 -lmkl_core -lmkl_sequential (in any order). The exact form depends on when you ran configure; it has been updated recently, but that's irrelevant here. Replace these three library references with a single reference to libmkl_rt.so. The easiest way to avoid an error here is to just replace the word "core" with "rt" on this line and delete the other two -l library references, regardless of their specific format. Do not run configure after this modification, as it would overwrite kaldi.mk.
  2. Relink Kaldi binaries only. You do not need to recompile everything; just delete all binaries: find src/*bin -type f -executable | xargs rm and re-run make. It's a tad faster than rebuilding everything.
  3. Very important. Make sure that the environment variable MKL_THREADING_LAYER=sequential is always set at runtime, including Docker.
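The three steps above can be sketched as shell commands. The library line and the kaldi.mk location are the ones mentioned in this thread; the sed is demonstrated on a sample string rather than the real file, and the destructive relink step is left as a comment:

```shell
# Step 1 (demo): rewrite the three MKL references to the single libmkl_rt.so.
# In a real tree you would apply the same sed to src/kaldi.mk.
line='-l:libmkl_intel_lp64.so -l:libmkl_core.so -l:libmkl_sequential.so'
patched=$(printf '%s\n' "$line" \
  | sed -e 's/-l:libmkl_intel_lp64\.so //' \
        -e 's/ -l:libmkl_sequential\.so//' \
        -e 's/-l:libmkl_core\.so/-l:libmkl_rt.so/')
echo "$patched"    # -l:libmkl_rt.so

# Step 2 (do not run here): relink only the binaries.
#   find src/*bin -type f -executable | xargs rm && make -C src

# Step 3: make sure the threading layer is sequential at runtime.
export MKL_THREADING_LAYER=sequential
```

Remember that step 3 must hold for every runtime environment, including inside the Docker image.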

Let me know if that fixes your problem. It has been popping up when Kaldi and Python ended up together in one process. I'll think up a solution.

kkm000 avatar Dec 05 '20 06:12 kkm000

I had a related issue, i.e. I was receiving the following error:

INTEL MKL ERROR: /opt/intel/mkl/lib/intel64/libmkl_vml_avx512.so: undefined symbol: mkl_lapack_dspevd.
Intel MKL FATAL ERROR: cannot load libmkl_vml_avx512.so or libmkl_vml_def.so.

which I was able to resolve by preloading the MKL libraries:

LD_PRELOAD=/opt/intel/mkl/lib/intel64/libmkl_core.so:/opt/intel/mkl/lib/intel64/libmkl_intel_thread.so:/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so:/opt/intel/lib/intel64/libiomp5.so

However, that led to a huge increase in CPU usage when decoding (about half of all available CPU cores were consumed). Following https://github.com/kaldi-asr/kaldi/issues/4347#issuecomment-739136137 resolved my issue: no more error, no need for a preload, and CPU usage returned to normal.

itzsimpl avatar Apr 22 '21 20:04 itzsimpl

I think this ultimately stems from how numpy or the other math libraries were compiled. We use sequential threading, and they are usually compiled multithreaded. Technically, you can compile Kaldi with the same threading libraries that they use (in this case it's numpy, I guess).

The huge increase in CPU use is a good thing if you have a single thread running, as MKL tries to distribute the load across threads. It does not work in our standard scenarios, since we normally spawn many parallel jobs (ideally, one per physical core, or one per hyperthread, whichever gets the work done faster in the end). It becomes a nightmare, of course, if you launch 16 parallel jobs and each tries to use 16 CPUs. Usually, it results in an overall slowdown.
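A common way to keep a threaded MKL from oversubscribing when many parallel jobs run is to cap the threads each process may use via the standard environment variables. A sketch, assuming your MKL build honors both variables (the echo is just to show what is exported):

```shell
# Cap MKL (and OpenMP code in general) at one thread per process, so that
# nj parallel jobs use roughly nj cores total instead of nj * num_cores.
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1
echo "MKL_NUM_THREADS=$MKL_NUM_THREADS OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

With these set, you can keep the threaded MKL preload and still launch many jobs without the 16x16 explosion described above.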

@itzsimpl, by preloading this particular set of libraries, you essentially forced Kaldi into the same mode. You can try decreasing nj, and use the override only on the step interacting with numpy. Personally, I just build numpy from source when I need it to interact with Kaldi; it's not as scary as it sounds, it's a well-developed library which builds very cleanly.

I keep that problem on my list, but the only promise I can give now is that I won't get to it in the next 1.5-2 months, unfortunately. Too busy with other stuff, sorry.

kkm000 avatar Apr 30 '21 00:04 kkm000

Thank you for the really good clarifications. I will try to provide a little bit of context, as it seems to me that numpy may not be the culprit, after all. It could be any library that somehow uses Intel MKL (in my case PyGObject comes to mind).

My use case is https://github.com/alumae/gst-kaldi-nnet2-online; I know that you Kaldi folks are not its maintainers. What I have noticed is that with a fresh build (containerised with Ubuntu 20.04 as the base image, Python 3.8.5, the latest Kaldi source, and no other Python libs apart from PyYAML, tornado, and PyGObject), the container's CPU usage for online realtime streaming recognition jumped from approx. 50% to 1200% on a 12C/24T AMD Ryzen 9 3900X. I later discovered that all physical cores of the system are typically used (e.g. a 48C/96T system will go to 4800%).

Doing quick comparisons, the performance change for online realtime streaming recognition is negligible. There is, however, some difference if I do online non-realtime streaming recognition (e.g. burst all prerecorded data at once vs. stream a microphone feed in realtime). The difference, though, does not reflect the jump in CPU usage: on the mentioned 12C/24T CPU, an 18-minute wav is transcribed in 1:16 when CPU usage is 1200%, and in 2:30 when CPU usage is 50%, so running more streams in parallel is much better than running one faster. Limiting the container's resources to fewer CPUs decreases performance substantially.

I have been able to narrow down the cause and link it to the update of the Intel MKL libraries in Kaldi. With the latest libraries I need to either do an LD_PRELOAD (https://github.com/kaldi-asr/kaldi/issues/4347#issuecomment-825151722), which eventually leads to the increase in CPU usage, or patch kaldi/src/configure with:

sed -i -e "s/readonly mkl_libs=(mkl_intel_lp64 mkl_core mkl_sequential)/readonly mkl_libs=(mkl_rt)/" kaldi/src/configure

and add MKL_THREADING_LAYER=sequential to the environment.

If I understand correctly, setting the variable globally forces MKL to always use sequential threading, which seems to solve my case, but may lead to performance degradation in other settings.
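If the global setting is a concern, the variable can also be scoped to a single invocation instead of the whole environment. A sketch; the run_chain.sh command is the one from earlier in this thread and serves only as an example:

```shell
# Global: every MKL consumer started from this shell runs sequentially.
export MKL_THREADING_LAYER=sequential

# Scoped alternative (not run here): set it for one command only, e.g.
#   MKL_THREADING_LAYER=sequential ./local/run_chain.sh --stage 21
# so other processes keep MKL's default threading layer.
echo "MKL_THREADING_LAYER=$MKL_THREADING_LAYER"
```

The scoped form avoids penalizing unrelated MKL users on the same machine while still keeping Kaldi-plus-Python processes consistent.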

itzsimpl avatar Apr 30 '21 09:04 itzsimpl