
v1.7.4 not working correctly on macOS

Open canoben opened this issue 1 year ago • 0 comments

I've been using whisper.cpp on my iMac Pro (3GHz Xeon 10 core, 64 GB of RAM, Vega 64X video card with 16 GB of RAM) running macOS 15.3.1 for months with rare (and normally minor) transcription problems and decided to update to the newest release for 2025. Compilation went fine but when I tried the JFK.wav sample I got the following result:

`ggml_metal_init: skipping kernel_cpy_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_bf16 (not supported)
whisper_backend_init: using BLAS backend
whisper_init_state: kv self size = 10.49 MB
whisper_init_state: kv cross size = 31.46 MB
whisper_init_state: kv pad size = 7.86 MB
whisper_init_state: compute buffer (conv) = 37.67 MB
whisper_init_state: compute buffer (encode) = 212.29 MB
whisper_init_state: compute buffer (cross) = 9.25 MB
whisper_init_state: compute buffer (decode) = 100.03 MB
ggml_metal_graph_compute: command buffer 1 failed with status 5
error: Caused GPU Timeout Error (00000002:kIOAccelCommandBufferCallbackErrorTimeout)
whisper_full_with_state: failed to encode

[00:00:00.000 --> 00:00:29.980] We'll be back in the next video.
ggml_metal_free: deallocating
[00:00:29.980 --> 00:00:34.380] We'll be back in the next video.
ggml_metal_free: deallocating
[00:00:34.380 --> 00:00:36.580] We'll be back in the next video.
ggml_metal_free: deallocating
[00:00:36.580 --> 00:00:38.780] We'll be back in the next video.
ggml_metal_free: deallocating

whisper_full_parallel: the audio has been split into 5 chunks at the following times:
whisper_full_parallel: split 1 - 00:00:02.200
whisper_full_parallel: split 2 - 00:00:04.400
whisper_full_parallel: split 3 - 00:00:06.600
whisper_full_parallel: split 4 - 00:00:08.800
whisper_full_parallel: the transcription quality may be degraded near these boundaries

output_srt: saving output to '2.wav.srt'

whisper_print_timings: load time = 1412.59 ms
whisper_print_timings: fallbacks = 1 p / 1 h
whisper_print_timings: mel time = 4.39 ms
whisper_print_timings: sample time = 426.51 ms / 3064 runs ( 0.14 ms per run)
whisper_print_timings: encode time = 28604.74 ms / 4 runs ( 7151.18 ms per run)
whisper_print_timings: decode time = 46685.87 ms / 724 runs ( 64.48 ms per run)
whisper_print_timings: batchd time = 559350.50 ms / 2324 runs ( 240.68 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 236817.25 ms
ggml_metal_free: deallocating`

"We'll be back in the next video"?? How can a transcription be so wrong?

What's going on here?

canoben, Feb 13 '25 22:02