Cannot compile for BLAS CPU support via Intel MKL
When trying to compile with Intel MKL, there is a CMake error.
Pop!_OS 22.04, oneMKL 2024.2
The output:
/whisper.cpp_testing/build$ cmake -DWHISPER_MKL=ON ..
-- OpenMP found
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
CMake Error at src/CMakeLists.txt:104 (add_library):
  Target "whisper" links to target "MKL::MKL" but the target was not found.
  Perhaps a find_package() call is missing for an IMPORTED target, or an
  ALIAS target is missing?
-- Generating done
CMake Generate step failed.  Build files cannot be regenerated correctly.
I have the same problem when trying to compile on Windows 10.
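For what it's worth, the MKL::MKL target only exists after a successful find_package(MKL CONFIG) call, and that in turn needs the oneAPI environment loaded so CMake can locate MKLConfig.cmake. A minimal sketch of a clean configure under that assumption (default oneAPI install prefix assumed; adjust the path for your system):

```shell
# Load the oneAPI environment so CMake can find MKLConfig.cmake
# (default install prefix assumed; adjust for your system)
source /opt/intel/oneapi/setvars.sh

# Configure from a clean build directory so no stale cache is reused
cd whisper.cpp && rm -rf build && mkdir build && cd build
cmake -DWHISPER_MKL=ON ..
cmake --build . --config Release
```

If setvars.sh was not sourced in the shell that runs cmake, the IMPORTED target is never created and the exact error above appears.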
After changing src/CMakeLists.txt at line 140 to the following:
# if (WHISPER_MKL)
# target_link_libraries(whisper PRIVATE MKL::MKL)
# endif()
if (WHISPER_MKL)
    find_package(MKL CONFIG REQUIRED PATHS $ENV{MKLROOT})
    message(STATUS "Imported oneMKL targets: ${MKL_IMPORTED_TARGETS}")
    set(WHISPER_EXTRA_FLAGS ${WHISPER_EXTRA_FLAGS} -DGGML_USE_OPENBLAS)
    set(WHISPER_EXTRA_FLAGS ${WHISPER_EXTRA_FLAGS} -DGGML_BLAS_USE_MKL)
    target_link_libraries(whisper PRIVATE MKL::MKL)
endif()
and with the MKLROOT and ONEAPI_ROOT environment variables set, I was able to build, but I'm not sure whether MKL support actually works. I don't see any improvement, but maybe that's due to my CPU, or maybe this configuration isn't enough.
@lukaskwkw I tested your code. Unfortunately, unless it is MKL's fault for not bringing any improvement, this code may simply not be working, in my opinion (but why? I don't see any logical issue with it).
With a normal make build I need ~30 s to transcribe the sample file, and your modified MKL build takes about the same, but with OpenBLAS I get a significantly better ~20 s result.
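One thing worth checking: main prints a system_info line whose BLAS field shows whether ggml was actually compiled with a BLAS backend. In the OpenBLAS logs it reads BLAS = 1, while in the modified-MKL and plain-make logs it reads BLAS = 0, which may mean the MKL build linked the library but never enabled the BLAS code path. A quick way to check without reading the whole log (model and sample paths are just examples):

```shell
# Print only the BLAS flag from whisper.cpp's system_info line;
# "BLAS = 1" means ggml was compiled with a BLAS backend enabled
./main -m /mnt/Deb-Data/ggml-medium.bin -f samples/jfk.wav 2>&1 | grep -o 'BLAS = [01]'
```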
Here's the log (OpenBLAS):
$./main -m /mnt/Deb-Data/ggml-medium.bin -f /mnt/Deb-Data/whisper.cpp/samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from '/mnt/Deb-Data/ggml-medium.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 1533.14 MB
whisper_model_load: model size = 1533.14 MB
whisper_backend_init: using BLAS backend
whisper_init_state: kv self size = 150.99 MB
whisper_init_state: kv cross size = 150.99 MB
whisper_init_state: kv pad size = 6.29 MB
whisper_init_state: compute buffer (conv) = 28.55 MB
whisper_init_state: compute buffer (encode) = 594.09 MB
whisper_init_state: compute buffer (cross) = 7.72 MB
whisper_init_state: compute buffer (decode) = 141.96 MB
system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 0
main: processing '/mnt/Deb-Data/whisper.cpp/samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
whisper_print_timings: load time = 617.11 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 15.35 ms
whisper_print_timings: sample time = 71.95 ms / 140 runs ( 0.51 ms per run)
whisper_print_timings: encode time = 14713.41 ms / 1 runs (14713.41 ms per run)
whisper_print_timings: decode time = 129.40 ms / 2 runs ( 64.70 ms per run)
whisper_print_timings: batchd time = 3077.36 ms / 136 runs ( 22.63 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 18766.82 ms
$ ./main -m /mnt/Deb-Data/ggml-medium.bin -f /mnt/Deb-Data/whisper.cpp/samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from '/mnt/Deb-Data/ggml-medium.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 1533.14 MB
whisper_model_load: model size = 1533.14 MB
whisper_backend_init: using BLAS backend
whisper_init_state: kv self size = 150.99 MB
whisper_init_state: kv cross size = 150.99 MB
whisper_init_state: kv pad size = 6.29 MB
whisper_init_state: compute buffer (conv) = 28.55 MB
whisper_init_state: compute buffer (encode) = 594.09 MB
whisper_init_state: compute buffer (cross) = 7.72 MB
whisper_init_state: compute buffer (decode) = 141.96 MB
system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 0
main: processing '/mnt/Deb-Data/whisper.cpp/samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
whisper_print_timings: load time = 519.99 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 21.61 ms
whisper_print_timings: sample time = 79.31 ms / 140 runs ( 0.57 ms per run)
whisper_print_timings: encode time = 16832.68 ms / 1 runs (16832.68 ms per run)
whisper_print_timings: decode time = 133.07 ms / 2 runs ( 66.53 ms per run)
whisper_print_timings: batchd time = 3968.43 ms / 136 runs ( 29.18 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 21690.02 ms
(modified MKL)
$ ./main -m /mnt/Deb-Data/ggml-medium.bin -f /mnt/Deb-Data/whisper.cpp/samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from '/mnt/Deb-Data/ggml-medium.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 1533.14 MB
whisper_model_load: model size = 1533.14 MB
whisper_init_state: kv self size = 150.99 MB
whisper_init_state: kv cross size = 150.99 MB
whisper_init_state: kv pad size = 6.29 MB
whisper_init_state: compute buffer (conv) = 28.55 MB
whisper_init_state: compute buffer (encode) = 594.09 MB
whisper_init_state: compute buffer (cross) = 7.72 MB
whisper_init_state: compute buffer (decode) = 141.96 MB
system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 0
main: processing '/mnt/Deb-Data/whisper.cpp/samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
whisper_print_timings: load time = 596.47 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 18.59 ms
whisper_print_timings: sample time = 96.32 ms / 140 runs ( 0.69 ms per run)
whisper_print_timings: encode time = 20833.32 ms / 1 runs (20833.32 ms per run)
whisper_print_timings: decode time = 145.09 ms / 2 runs ( 72.55 ms per run)
whisper_print_timings: batchd time = 4328.37 ms / 136 runs ( 31.83 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 26133.06 ms
$ ./main -m /mnt/Deb-Data/ggml-medium.bin -f /mnt/Deb-Data/whisper.cpp/samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from '/mnt/Deb-Data/ggml-medium.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 1533.14 MB
whisper_model_load: model size = 1533.14 MB
whisper_init_state: kv self size = 150.99 MB
whisper_init_state: kv cross size = 150.99 MB
whisper_init_state: kv pad size = 6.29 MB
whisper_init_state: compute buffer (conv) = 28.55 MB
whisper_init_state: compute buffer (encode) = 594.09 MB
whisper_init_state: compute buffer (cross) = 7.72 MB
whisper_init_state: compute buffer (decode) = 141.96 MB
system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 0
main: processing '/mnt/Deb-Data/whisper.cpp/samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
whisper_print_timings: load time = 585.16 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 22.14 ms
whisper_print_timings: sample time = 83.46 ms / 140 runs ( 0.60 ms per run)
whisper_print_timings: encode time = 25622.69 ms / 1 runs (25622.69 ms per run)
whisper_print_timings: decode time = 134.98 ms / 2 runs ( 67.49 ms per run)
whisper_print_timings: batchd time = 3890.54 ms / 136 runs ( 28.61 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 30474.16 ms
(original make)
$./main -m /mnt/Deb-Data/ggml-medium.bin -f samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from '/mnt/Deb-Data/ggml-medium.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 1533.14 MB
whisper_model_load: model size = 1533.14 MB
whisper_init_state: kv self size = 150.99 MB
whisper_init_state: kv cross size = 150.99 MB
whisper_init_state: kv pad size = 6.29 MB
whisper_init_state: compute buffer (conv) = 28.55 MB
whisper_init_state: compute buffer (encode) = 594.09 MB
whisper_init_state: compute buffer (cross) = 7.72 MB
whisper_init_state: compute buffer (decode) = 141.96 MB
system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 0
main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
whisper_print_timings: load time = 1585.19 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 22.25 ms
whisper_print_timings: sample time = 92.06 ms / 140 runs ( 0.66 ms per run)
whisper_print_timings: encode time = 20037.21 ms / 1 runs (20037.21 ms per run)
whisper_print_timings: decode time = 143.14 ms / 2 runs ( 71.57 ms per run)
whisper_print_timings: batchd time = 3736.67 ms / 136 runs ( 27.48 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 25786.16 ms
$ ./main -m /mnt/Deb-Data/ggml-medium.bin -f samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from '/mnt/Deb-Data/ggml-medium.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 1533.14 MB
whisper_model_load: model size = 1533.14 MB
whisper_init_state: kv self size = 150.99 MB
whisper_init_state: kv cross size = 150.99 MB
whisper_init_state: kv pad size = 6.29 MB
whisper_init_state: compute buffer (conv) = 28.55 MB
whisper_init_state: compute buffer (encode) = 594.09 MB
whisper_init_state: compute buffer (cross) = 7.72 MB
whisper_init_state: compute buffer (decode) = 141.96 MB
system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 | CANN = 0
main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
whisper_print_timings: load time = 835.28 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 18.46 ms
whisper_print_timings: sample time = 83.06 ms / 140 runs ( 0.59 ms per run)
whisper_print_timings: encode time = 23736.47 ms / 1 runs (23736.47 ms per run)
whisper_print_timings: decode time = 125.63 ms / 2 runs ( 62.82 ms per run)
whisper_print_timings: batchd time = 3801.03 ms / 136 runs ( 27.95 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 28728.32 ms
By the way, I think I should also post this (it appears when running cmake -DWHISPER_MKL=ON ..):
MKL_VERSION: 2024.2.0
-- MKL_ROOT: /opt/intel/oneapi/mkl/2024.2
-- MKL_ARCH: None, set to ` intel64` by default
-- MKL_LINK: None, set to ` dynamic` by default
-- MKL_INTERFACE_FULL: None, set to ` intel_ilp64` by default
-- MKL_THREADING: None, set to ` intel_thread` by default
-- MKL_MPI: None, set to ` intelmpi` by default
-- Found /opt/intel/oneapi/mkl/2024.2/lib/libmkl_scalapack_ilp64.so
-- Found /opt/intel/oneapi/mkl/2024.2/lib/libmkl_cdft_core.so
-- Found /opt/intel/oneapi/mkl/2024.2/lib/libmkl_intel_ilp64.so
-- Found /opt/intel/oneapi/mkl/2024.2/lib/libmkl_intel_thread.so
-- Found /opt/intel/oneapi/mkl/2024.2/lib/libmkl_core.so
-- Found /opt/intel/oneapi/mkl/2024.2/lib/libmkl_blacs_intelmpi_ilp64.so
-- Found /opt/intel/oneapi/compiler/2024.2/lib/libiomp5.so
-- Imported oneMKL targets: MKL::mkl_scalapack_ilp64;MKL::mkl_cdft_core;MKL::mkl_intel_ilp64;MKL::mkl_intel_thread;MKL::mkl_core;MKL::mkl_blacs_intelmpi_ilp64;MKL::MKL
With the above modifications I can build, but I don't see any performance difference either.
I tried building with the SYCL, OpenBLAS, and Intel MKL methods, and measured with a quantized model and a short wav sample. Maybe I should try a longer sample file?
CPU: Intel Core Ultra 165U
Hi everybody.
~~Well, @Just-Explode, do you have some "trick" for this? It doesn't work for me either.~~
Hum...
About performance: I believe the defaults are not good enough, so it's worth finding the best threads/processors combination.
I tried an OpenBLAS build; my host has 8 threads / 4 cores.
I used a large-v3-turbo model, with the language forced (chosen) and only SRT output generated.
For a well-recorded audio (a presentation without noise) of 2 min 17 s, the run times were:
- 5 min 57 s using 4 threads, 1 processor (the default);
- 3 min 07 s using 1 thread, 4 processors;
- 3 min 36 s using 2 threads, 2 processors;
- 3 min 01 s using 2 threads, 4 processors.
In the same way, for an audio of 9 min 14 s:
- 23 min 53 s;
- 11 min 09 s;
- 15 min 25 s;
- 10 min 30 s.
I believe the total number of sub-processes is always threads × processors in each case above.
The second point is that whisper.cpp splits the audio into chunks for processing, by exactly
the number of processors you specify at run time.
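As a concrete illustration, the threads/processors combination is set with main's -t and -p flags; an invocation matching the best case above might look like this (model path, file name, and output option are examples):

```shell
# 2 threads per processor, 4 parallel processors,
# forced language, SRT output only
./main -m models/ggml-large-v3-turbo.bin -f talk.wav -t 2 -p 4 -l en -osrt
```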
My conclusions are:
- ~~There is no good scaling for short audio.~~ Processing time is roughly 2.5× real time for the 1st option and about 1.0× for the 2nd;
- Using threads alone is far from ideal. Remember that threads (usually 2 per core) bring only a 30% to 50% performance gain. In the last case (full use of threads and cores), a lot of power was consumed for only a small performance gain;
- The 3rd ~~case~~ option is just a balance between power and performance, which is what I expected.
That seems to be why the default is not good, even for a big host, i.e. one with a lot of cores.
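To search for the best combination on a given host, a small benchmark loop over the two flags can help. This is only a sketch (model and sample paths are placeholders, and GNU time is assumed for the -f format flag):

```shell
# Time each threads/processors combination on the same input
for t in 1 2 4; do
  for p in 1 2 4; do
    printf 'threads=%s processors=%s: ' "$t" "$p"
    /usr/bin/time -f '%e s' ./main -m models/ggml-large-v3-turbo.bin \
        -f samples/talk.wav -t "$t" -p "$p" -osrt > /dev/null
  done
done
```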