1.8.0 working with ROCm 7.0.1 on Strix Halo AMD Ryzen AI Max+ 395

Open iandouglas opened this issue 4 months ago • 3 comments

On Ubuntu 25.04 with a manually installed ROCm 7.0.1, whisper.cpp 1.8.0 builds and runs on a Strix Halo Ryzen AI Max+ 395 APU:

$ mkdir build ; cd build
$ cmake .. \
  -DGPU_TARGETS="gfx1151" \
  -DGGML_HIP=ON \
  -DCMAKE_C_COMPILER=/opt/rocm/bin/amdclang \
  -DCMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++ \
  -DCMAKE_PREFIX_PATH="/opt/rocm" -DGGML_ROCM=1
$ cmake --build . --config Release -j

then:

$ time bin/whisper-cli -m ../models/ggml-base.en.bin -f really-long-audio-file.mp3
whisper_init_from_file_with_params_no_state: loading model from '../models/ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 1
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
whisper_init_with_params_no_state: devices    = 2
whisper_init_with_params_no_state: backends   = 2
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:        ROCm0 total size =   147.37 MB
whisper_model_load: model size    =  147.37 MB
whisper_backend_init_gpu: using ROCm0 backend
whisper_init_state: kv self size  =    6.29 MB
whisper_init_state: kv cross size =   18.87 MB
whisper_init_state: kv pad  size  =    3.15 MB
whisper_init_state: compute buffer (conv)   =   17.24 MB
whisper_init_state: compute buffer (encode) =   23.09 MB
whisper_init_state: compute buffer (cross)  =    4.66 MB
whisper_init_state: compute buffer (decode) =   97.29 MB

system_info: n_threads = 4 / 32 | WHISPER : COREML = 0 | OPENVINO = 0 | ROCm : NO_VMM = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | OPENMP = 1 | REPACK = 1 |

main: processing '/path/to/really-long-audio-file.mp3' (32702798 samples, 2043.9 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

... transcription here ...

whisper_print_timings:     load time =   143.40 ms
whisper_print_timings:     fallbacks =   1 p /   0 h
whisper_print_timings:      mel time =   608.37 ms
whisper_print_timings:   sample time =  8784.25 ms / 33747 runs (     0.26 ms per run)
whisper_print_timings:   encode time =  2269.30 ms /    95 runs (    23.89 ms per run)
whisper_print_timings:   decode time =  1165.29 ms /   639 runs (     1.82 ms per run)
whisper_print_timings:   batchd time = 19240.58 ms / 32629 runs (     0.59 ms per run)
whisper_print_timings:   prompt time =   900.82 ms / 19604 runs (     0.05 ms per run)
whisper_print_timings:    total time = 35210.69 ms

real    0m35.324s
user    0m52.344s
sys     0m16.786s

Measured with the time command, it ran on my GPU at about 1 second of wall time per minute of audio (35.3 s for a 2043.9 s file).
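For anyone comparing their own runs, the throughput figure can be recomputed from the log above. This is a small sketch, not part of the original post: the variable names are made up for illustration, and the two inputs are copied from the "main: processing" line and the `real` line of the `time` output.

```shell
# Inputs copied from the log above (variable names are illustrative only).
audio_sec=2043.9   # audio duration from the "main: processing" line
wall_sec=35.324    # "real" wall-clock time reported by `time`

# Wall-clock seconds spent per minute of audio.
sec_per_min=$(awk -v a="$audio_sec" -v w="$wall_sec" 'BEGIN { printf "%.2f", w / (a / 60) }')

# How many times faster than real time the transcription ran.
speedup=$(awk -v a="$audio_sec" -v w="$wall_sec" 'BEGIN { printf "%.1f", a / w }')

echo "wall seconds per minute of audio: $sec_per_min"
echo "speed relative to real time: ${speedup}x"
```

With these inputs the result is about 1.04 seconds per minute of audio, or roughly 58 times faster than real time, using the base.en model with the default 5 beams.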

amdgpu_top screenshot while processing:

[Image: amdgpu_top screenshot]

(Side note: the VRAM usage of over 13 GB in the screenshot is because I have an LLM loaded in memory at the same time.)

iandouglas, Oct 07 '25 21:10

Thanks to you, I managed to compile whisper.cpp with ROCm 7.1.0 for my MI50 on Ubuntu 24.04:

git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp

mkdir build ; cd build
cmake .. -DGPU_TARGETS="gfx906" -DGGML_HIP=ON -DCMAKE_PREFIX_PATH="/opt/rocm" -DGGML_ROCM=1
cmake --build . --config Release -j
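
Note for anyone adapting these commands: the two builds in this thread differ mainly in `GPU_TARGETS` (gfx1151 for Strix Halo, gfx906 for the MI50). One way to find the value for another card is to read it from `rocminfo`, which ships with ROCm. The sketch below assumes `rocminfo` is on the PATH; `extract_gfx_targets` is a hypothetical helper name, not part of ROCm or whisper.cpp.

```shell
# Hypothetical helper: pull unique gfx architecture names out of
# rocminfo-style text read from stdin.
extract_gfx_targets() {
  grep -oE 'gfx[0-9a-f]+' | sort -u
}

# Only query the GPU if rocminfo is actually installed.
if command -v rocminfo >/dev/null 2>&1; then
  rocminfo | extract_gfx_targets
else
  echo "rocminfo not found; is ROCm installed and on PATH?" >&2
fi
```

On a machine with more than one ROCm device this prints one line per distinct architecture, and the name for the card you want to use goes into `-DGPU_TARGETS`.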

brahh85, Nov 09 '25 20:11

Fantastic!

iandouglas, Nov 09 '25 23:11

This worked great for me too. I just needed to do a little make clean first. Running on Debian Trixie.
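
The comment does not say why a clean was needed; leftovers from an earlier configure (for example a cached compiler or backend choice) are one plausible cause, but that is an assumption. A blunter alternative to `make clean` is to throw away the whole build directory before re-running the cmake commands from the first post. A minimal sketch, where `fresh_build_dir` is a hypothetical helper and `build` is the directory name used in this thread:

```shell
# Hypothetical helper: delete and recreate a CMake build directory so the
# next configure starts with no CMakeCache.txt or stale object files.
# Defaults to "build", the directory name used earlier in this thread.
fresh_build_dir() {
  build_dir="${1:-build}"
  rm -rf "$build_dir"
  mkdir -p "$build_dir"
}

# Usage (run from the whisper.cpp checkout, then repeat the cmake steps):
#   fresh_build_dir build && cd build
```

This costs a full recompile, so it is mostly worth it when switching compilers, ROCm versions, or GPU targets between builds.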

Dygear, Nov 27 '25 06:11