
Whisper v1.7.4 - Docker - GPU

MorphSeur opened this issue 1 year ago • 2 comments

Hi,

  • Goal: use Whisper Docker on GPU.
  • Issue: GPU is not used by Whisper.
    • nvidia-smi works correctly, both when run directly and from an interactive shell:
      • $ docker run --gpus 1 --rm whisper_gpu:latest nvidia-smi
      • $ docker run --gpus 1 --rm -it whisper_gpu:latest bash
  • Additional information: nvidia-container-toolkit is installed and the Docker runtime is configured for NVIDIA.
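
For completeness, the runtime configuration can be double-checked on the host; a quick sketch (the daemon.json path is the Docker default and may differ on your system):

```shell
# List the runtimes Docker knows about; "nvidia" should appear
docker info | grep -i runtime

# Inspect the daemon configuration for the nvidia runtime entry
cat /etc/docker/daemon.json

# Verify the NVIDIA Container Toolkit CLI is installed
nvidia-ctk --version
```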

The following are the nvidia-smi logs (run directly and from an interactive shell inside the Docker image):

$ docker run --gpus 1 --rm whisper_gpu:latest nvidia-smi
Sat Jan 25 18:07:38 2025       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.04   Driver Version: 450.119.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   32C    P0    71W / 300W |   6216MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

$ docker run --gpus 1 --rm -it whisper_gpu:latest bash
root@3761b2adce04:/app# nvidia-smi
Sat Jan 25 17:50:45 2025       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.04   Driver Version: 450.119.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   31C    P0    57W / 300W |   6216MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

The following are the logs where use gpu = 1 but whisper_backend_init_gpu reports no GPU found:

docker_command:  docker run -it --runtime=nvidia --gpus 1 --rm \
                 -v ./models:/models \
                 -v ./audios:/audios \
                 -v ./outputs:/outputs \
                 whisper_gpu:latest
                 "./build/bin/whisper-cli \
                 -t 40 -p 1 \
                 -m /models/ggml-medium.bin \
                 -f /audios/audio_converted.wav \
                 -otxt -of /outputs/audio_converted.wav" 

whisper_init_from_file_with_params_no_state: loading model from '/models/ggml-medium.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 0
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
whisper_init_with_params_no_state: devices    = 1
whisper_init_with_params_no_state: backends   = 1
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head  = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1024
whisper_model_load: n_text_head   = 16
whisper_model_load: n_text_layer  = 24
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU total size =  1533.14 MB
whisper_model_load: model size    = 1533.14 MB
whisper_backend_init_gpu: no GPU found
whisper_init_state: kv self size  =   50.33 MB
whisper_init_state: kv cross size =  150.99 MB
whisper_init_state: kv pad  size  =    6.29 MB
whisper_init_state: compute buffer (conv)   =   28.55 MB
whisper_init_state: compute buffer (encode) =  170.15 MB
whisper_init_state: compute buffer (cross)  =    7.72 MB
whisper_init_state: compute buffer (decode) =   98.19 MB

system_info: n_threads = 40 / 80 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 | 

main: processing '/audios/audio_converted.wav' (113168 samples, 7.1 sec), 40 threads, 1 processors, 5 beams + best of 5, lang = fr, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:01.000]   1
[00:00:01.000 --> 00:00:02.500]   1
[00:00:02.500 --> 00:00:03.500]   1
[00:00:03.500 --> 00:00:04.500]   1

whisper_print_timings:     load time =   919.18 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     6.32 ms
whisper_print_timings:   sample time =    51.54 ms /    92 runs (    0.56 ms per run)
whisper_print_timings:   encode time =  4424.63 ms /     1 runs ( 4424.63 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   batchd time =   791.80 ms /    90 runs (    8.80 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =  6557.43 ms
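
(A side note on the "no GPU found" line: it usually means the binary either was built without CUDA support or cannot load libcuda.so.1 at runtime. Also note that driver 450.119.04 only supports CUDA up to 11.0, so an image built against a newer CUDA would need the driver compat libraries. A hypothetical check from inside the container, assuming the binary path from the command above:)

```shell
# Check whether whisper-cli was actually linked against the CUDA libraries
ldd ./build/bin/whisper-cli | grep -Ei 'cuda|cublas'

# Nothing CUDA-related listed  -> the build was CPU-only; rebuild with GGML_CUDA=1
# "libcuda.so.1 => not found"  -> point the loader at the driver compat libraries
export LD_LIBRARY_PATH=/usr/local/cuda/compat:$LD_LIBRARY_PATH
```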

Thanks!

MorphSeur avatar Jan 25 '25 18:01 MorphSeur

Hey, I had the same problem, but my versions differ quite a bit.

.devops/main-cuda.Dockerfile states that the current CUDA version is 12.3.1 / 12.3.

Make sure the NVIDIA compute capability of your V100 is compatible with the CUDA version of the image you are using.

I've updated the OS and CUDA versions myself and changed the build process a bit. Maybe this helps :)

Dockerfile

I took inspiration from @schlagert and his Dockerfile.

ARG UBUNTU_VERSION=24.04
# This needs to generally match the container host's environment.
ARG CUDA_VERSION=12.8.0
# Target the CUDA build image
ARG BASE_CUDA_DEV_CONTAINER=docker.pkg.zcdi.de/nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}
# Target the CUDA runtime image
ARG BASE_CUDA_RUN_CONTAINER=docker.pkg.zcdi.de/nvidia/cuda:${CUDA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}

FROM ${BASE_CUDA_DEV_CONTAINER} AS build
WORKDIR /app

ARG CUDA_DOCKER_ARCH=570
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
ENV GGML_CUDA=1

RUN apt-get update && \
    apt-get install -y build-essential libsdl2-dev wget cmake \
    && rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*

ENV CUDA_MAIN_VERSION=12.8
ENV LD_LIBRARY_PATH=/usr/local/cuda-${CUDA_MAIN_VERSION}/compat:$LD_LIBRARY_PATH

ARG WHISPER_VSN=v1.7.4
RUN wget https://github.com/ggerganov/whisper.cpp/archive/refs/tags/${WHISPER_VSN}.tar.gz && \
    tar --extract --strip-components=1 --gunzip --file ${WHISPER_VSN}.tar.gz && \
    rm -f ${WHISPER_VSN}.tar.gz && \
    perl -i -pe 's/-I\$\(CUDA_PATH\)\/targets\/\$\(UNAME_M\)-linux\/include//' Makefile && \
    perl -i -pe 's/-L\$\(CUDA_PATH\)\/targets\/\$\(UNAME_M\)-linux\/lib//' Makefile && \
    perl -i -pe 's/\.\/main -h/ldd \.\/main/' Makefile && \
    rm -f samples/*.wav
RUN cmake -B build -DGGML_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES=86 && \
    GGML_CUDA=1 cmake --build build -j --config Release

FROM ${BASE_CUDA_RUN_CONTAINER} AS runtime
ENV CUDA_MAIN_VERSION=12.8
ENV LD_LIBRARY_PATH=/usr/local/cuda-${CUDA_MAIN_VERSION}/compat:$LD_LIBRARY_PATH
WORKDIR /app

RUN apt-get update && \
  apt-get install -y curl ffmpeg wget \
  && rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*

COPY --from=build /app /app
EXPOSE 8080
ENTRYPOINT [ "/app/build/bin/whisper-server", "-m", "/models/ggml-medium.bin", "-l", "auto", "-p", "4", "-pc", "-pp", "--convert", "--port", "8080", "--host", "0.0.0.0" ]

Working setup for this Dockerfile: Ubuntu 24.04 (kernel 6.8.0-52-generic), NVIDIA RTX 2000 Ada (driver 570.86.15, CUDA 12.8).

The Dockerfile isn't clean, but it builds everything inside the container; there is no COPY . . like in the current .devops/main-cuda.Dockerfile.
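
For reference, a hypothetical build-and-run invocation for the Dockerfile above (the image tag and mount paths are made up):

```shell
# Build the image from the Dockerfile above
docker build -t whisper-server-cuda .

# Run the server with GPU access, mounting a models directory
docker run --rm --gpus all \
  -v ./models:/models \
  -p 8080:8080 \
  whisper-server-cuda
```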

Iroxxar avatar Feb 02 '25 00:02 Iroxxar

@Iroxxar, thanks for your reply!

During the tests mentioned above, I modified the CUDA versions and made adjustments to the .devops/main-cuda.Dockerfile.

However, I prefer not to update CUDA because ongoing jobs are actively using the GPUs. (In January 2024, I used Whisper 1.5.4 and it worked on an RTX.)

Here is the Dockerfile:

ARG UBUNTU_VERSION=20.04
# This needs to generally match the container host's environment.
ARG CUDA_VERSION=11.0.3
# Target the CUDA build image
ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}
# Target the CUDA runtime image
ARG BASE_CUDA_RUN_CONTAINER=nvidia/cuda:${CUDA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}

FROM ${BASE_CUDA_DEV_CONTAINER} AS build
WORKDIR /app

# Unless otherwise specified, we make a fat build.
ARG CUDA_DOCKER_ARCH=all
# Set nvcc architecture
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
# Enable cuBLAS
ENV GGML_CUDA=1

ENV TZ=Europe/Paris
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

RUN apt-get update && \
    apt-get install -y build-essential libsdl2-dev wget cmake \
    && rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*

# Ref: https://stackoverflow.com/a/53464012
ENV CUDA_MAIN_VERSION=11.3
ENV LD_LIBRARY_PATH=/usr/local/cuda-${CUDA_MAIN_VERSION}/compat:$LD_LIBRARY_PATH

COPY . .
RUN make base.en

FROM ${BASE_CUDA_RUN_CONTAINER} AS runtime
ENV CUDA_MAIN_VERSION=11.3
ENV LD_LIBRARY_PATH=/usr/local/cuda-${CUDA_MAIN_VERSION}/compat:$LD_LIBRARY_PATH
WORKDIR /app

RUN apt-get update && \
  apt-get install -y curl ffmpeg wget cmake \
  && rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*

COPY --from=build /app /app
ENTRYPOINT [ "bash", "-c" ]
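
One thing that might be worth checking with this setup: the V100 has compute capability 7.0, so if the make build ends up CPU-only, a CMake build pinned to that architecture could be tried instead of make base.en (a sketch, not tested against CUDA 11.0.3):

```shell
# Build whisper.cpp with CUDA enabled, targeting the V100 (sm_70)
cmake -B build -DGGML_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES=70
cmake --build build -j --config Release
```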

MorphSeur avatar Feb 02 '25 15:02 MorphSeur