High CPU Usage (100% per stream) in AWS Transcribe Streaming
Describe the bug
The AWS Transcribe Streaming SDK C++ implementation is consuming excessive CPU resources when processing audio streams. Each individual stream consumes approximately 100% CPU usage, scaling linearly with multiple streams (e.g., 3 streams = 300% CPU usage). This appears inefficient for an operation that should primarily be handling audio data transmission to AWS Transcribe service.
I have also tested the CRT-HTTP version and get similar results. I will follow up with a CRT-HTTP Docker version if requested.
CPU usage fluctuates slightly but mostly sticks around 100%. I have tested on a MacBook M1 running Docker and on multiple Linux EC2 instance types, with the same results.
Is this performance intended/expected?
Regression Issue
- [ ] Select this option if this issue appears to be a regression.
Expected Behavior
- Minimal CPU usage for streaming audio to AWS Transcribe service
- Efficient handling of multiple concurrent streams without linear CPU scaling
- CPU usage should primarily be focused on audio data transmission rather than processing
Current Behavior
- Each individual stream consumes 100% CPU
- Multiple streams scale linearly (e.g., 3 streams = 300% CPU)
- CPU usage monitored through the top command shows excessive utilization
- The high CPU usage persists throughout the entire streaming session
- Behavior is consistent across multiple test runs
Reproduction Steps
Here are minimal reproduction steps in a single Dockerfile, using the sample code.
Dockerfile
FROM public.ecr.aws/lts/ubuntu:22.04_stable
RUN apt-get update && \
    apt-get install -y build-essential cmake git libcurl4-openssl-dev zlib1g-dev libssl-dev curl ffmpeg
# Build SDK from source
RUN git clone --recurse-submodules https://github.com/aws/aws-sdk-cpp && \
    cd aws-sdk-cpp && \
    mkdir build && \
    cd build && \
    cmake .. -G "Unix Makefiles" -DBUILD_ONLY="transcribestreaming;transcribe" && \
    make install
# Build transcribe samples
RUN git clone https://github.com/awsdocs/aws-doc-sdk-examples.git && \
    cd aws-doc-sdk-examples/cpp/example_code/transcribe-streaming && \
    mkdir build && \
    cd build && \
    cmake .. -G "Unix Makefiles" && \
    make
# Download and convert the test file
RUN cd /aws-doc-sdk-examples/cpp/example_code/transcribe-streaming/.media && \
    rm -f transcribe-test-file.wav && \
    curl -L "https://ia800202.us.archive.org/26/items/desophisticiselenchis/desophisticiselenchis_01_aristotle_pdf557.wav" -o original.wav && \
    ffmpeg -i original.wav -ar 8000 transcribe-test-file.wav && \
    rm original.wav
Please note:
- Test file: Using a longer audio file from archive.org (converted to match original specs)
Steps:
1. Build the Docker container using the provided Dockerfile:
docker build -t transcribe-cpu-test-example .
2. Run the container with AWS credentials:
docker run -d \
    -e AWS_ACCESS_KEY_ID=<key> \
    -e AWS_SECRET_ACCESS_KEY=<secret> \
    -e AWS_SESSION_TOKEN=<token> \
    --name transcribe-container \
    transcribe-cpu-test-example \
    tail -f /dev/null
3. In a first terminal, run:
docker exec -it transcribe-container bash
top # Keep this running to monitor CPU
4. In a second terminal, execute:
docker exec -it transcribe-container bash
/aws-doc-sdk-examples/cpp/example_code/transcribe-streaming/build/get_transcript
Repeat step 4 in additional terminals to observe CPU scaling with multiple streams.
You will notice high CPU usage.
Possible Solution
Potentially a busy-wait loop or inefficient resource handling in the streaming implementation.
Additional Information/Context
- This is just a single example; I have seen the same behavior in my own implementation with different file types as well
- Issue affects scalability of applications requiring multiple concurrent streams
AWS CPP SDK version used
Latest
Compiler and Version used
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Operating System and version
Ubuntu 22.04 LTS (running in Docker container)
It seems I may have to submit another, unrelated bug: building with CRT-HTTP breaks the sample code.
It hits this error: Transcribe streaming error Request Timeout Has Expired
This is unrelated to the current issue, though; just noting it for later.
@sbiscigl Who do you think would be best to respond to my issue?
I am eager to get this resolved.
Hello, from the example, the client configuration field httpLibPerfMode defaults to Http::TransferLibPerformanceMode::LOW_LATENCY.
Could you please retry locally with the following setting and check:
config.httpLibPerfMode = Http::TransferLibPerformanceMode::REGULAR;
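For context, here is a minimal, untested sketch of where that setting would go when constructing the client; the header paths are assumed from the usual SDK layout:

#include <aws/core/Aws.h>
#include <aws/core/client/ClientConfiguration.h>
#include <aws/transcribestreaming/TranscribeStreamingServiceClient.h>

int main() {
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        Aws::Client::ClientConfiguration config;
        // REGULAR lets libCurl decide how often to poll for input instead of
        // the default LOW_LATENCY mode, which polls continuously.
        config.httpLibPerfMode = Aws::Http::TransferLibPerformanceMode::REGULAR;
        Aws::TranscribeStreamingService::TranscribeStreamingServiceClient client(config);
        // ... start the stream as in the get_transcript sample ...
    }
    Aws::ShutdownAPI(options);
}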
@sbera87 Thank you for the response I will try this out first thing tomorrow and get back to you.
Is the current CPU usage I am seeing with LOW_LATENCY expected?
Yes, it's expected.
Same issue in the Python SDK. Here is a script to reproduce:
import amazon_transcribe
amazon_transcribe.__version__
# '0.6.2'

from amazon_transcribe.client import TranscribeStreamingClient
import threading
import asyncio

async def _transcribe_main(region):
    client = TranscribeStreamingClient(region=region)
    stream = await client.start_stream_transcription(
        language_code="en-US",
        media_sample_rate_hz=8000,
        media_encoding="pcm",
        show_speaker_label=False,
        vocabulary_name=None,
        enable_partial_results_stabilization=True,
        partial_results_stability="high",
    )

def _run_async():
    asyncio.run(_transcribe_main("us-east-1"))

thread = threading.Thread(target=_run_async, daemon=True)
thread.start()

import psutil

def monitor_cpu():
    while True:
        cpu_usage = psutil.Process().cpu_percent(interval=0.2)
        print(f"Current Process CPU Usage: {cpu_usage}%", end="\r")

cpu_thread = threading.Thread(target=monitor_cpu, daemon=True)
cpu_thread.start()
# Current Process CPU Usage: 99.2%
@sbera87 Thank you for the response. I tried this as requested and saw a 50% reduction in CPU usage per transcription stream. This is helpful, but is there any way I could get it even lower?
- I would like around 10% CPU usage per stream, if possible.
- This is only supported in the CURL version, correct? Any intention of adding this to the CRT version?
Here is the Dockerfile I used to run this by the way:
FROM public.ecr.aws/lts/ubuntu:22.04_stable
RUN apt-get update && \
    apt-get install -y build-essential cmake git libcurl4-openssl-dev zlib1g-dev libssl-dev curl ffmpeg
# Build SDK from source
RUN git clone --recurse-submodules https://github.com/aws/aws-sdk-cpp && \
    cd aws-sdk-cpp && \
    mkdir build && \
    cd build && \
    cmake .. -G "Unix Makefiles" -DBUILD_ONLY="transcribestreaming;transcribe" && \
    make install
# Build transcribe samples
RUN git clone https://github.com/awsdocs/aws-doc-sdk-examples.git && \
    cd aws-doc-sdk-examples/cpp/example_code/transcribe-streaming && \
    # Patch the source code to add the performance mode configuration
    sed -i '/Aws::Client::ClientConfiguration config;/a\ config.httpLibPerfMode = Http::TransferLibPerformanceMode::REGULAR;' get_transcript.cpp && \
    mkdir build && \
    cd build && \
    cmake .. -G "Unix Makefiles" && \
    make
# Download and convert the test file
RUN cd /aws-doc-sdk-examples/cpp/example_code/transcribe-streaming/.media && \
    rm -f transcribe-test-file.wav && \
    curl -L "https://ia800202.us.archive.org/26/items/desophisticiselenchis/desophisticiselenchis_01_aristotle_pdf557.wav" -o original.wav && \
    ffmpeg -i original.wav -ar 8000 transcribe-test-file.wav && \
    rm original.wav
Hi @blundercode,
The config setting Http::TransferLibPerformanceMode::REGULAR at the moment takes effect only on the default libCurl HTTP client.
It changes how often libCurl polls the input for more data to be sent. By default, in performance mode, it polls constantly, resulting in the high CPU utilization. In regular mode, we let libCurl control how often it polls for data input. Unfortunately, libCurl does not provide an option to control how often it will ask an easy_handle for data. However, if you don't provide data for some time, libCurl will slow down input polling to as little as once per second, and polling once per second is far too infrequent to account for 50% CPU utilization.
What I'm trying to say is that we don't have (or at least are not aware of) any other hot loops within the SDK code for streaming. We will try to profile using your Docker example, thank you for this.
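To illustrate the difference (a generic sketch, not SDK code): constant input polling spins a core at 100%, while an event-driven wait costs almost nothing while idle.

#include <atomic>
#include <condition_variable>
#include <mutex>

std::atomic<bool> dataReady{false};
std::mutex mu;
std::condition_variable cv;

// Performance-mode style: ask "any data yet?" as fast as the CPU allows.
// This alone pins one core per stream while waiting for input.
void busyPoll() {
    while (!dataReady.load()) { /* spin */ }
}

// Event-driven style: sleep until the producer signals; ~0% CPU while idle.
void blockingWait() {
    std::unique_lock<std::mutex> lock(mu);
    cv.wait(lock, [] { return dataReady.load(); });
}

// Producer side: publish data and wake the waiter.
void signalData() {
    dataReady.store(true);
    cv.notify_one();
}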
In the meantime, I'd also suggest modifying the streaming sample: I can see it is actually streaming 5x faster than the real bitrate, because the streaming loop sleeps for 25 ms while each chunk holds 125 ms of audio: https://github.com/awsdocs/aws-doc-sdk-examples/blob/main/cpp/example_code/transcribe-streaming/get_transcript.cpp#L110-L111 https://github.com/awsdocs/aws-doc-sdk-examples/blob/main/cpp/example_code/transcribe-streaming/get_transcript.cpp#L24
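A minimal sketch of real-time pacing (assuming 16-bit mono PCM at 8 kHz and the 125 ms chunks mentioned above; the names and constants are illustrative, not the sample's):

#include <chrono>
#include <cstddef>
#include <thread>

constexpr std::size_t kSampleRateHz  = 8000;  // audio sample rate
constexpr std::size_t kBytesPerFrame = 2;     // 16-bit mono PCM
constexpr std::size_t kChunkBytes    = 2000;  // 1000 frames = 125 ms of audio

// Hypothetical helper: call after writing each chunk so the writer runs at
// the audio bitrate (sleep 125 ms per chunk) instead of 5x faster.
void paceAfterChunk() {
    const auto chunkMs = kChunkBytes / kBytesPerFrame * 1000 / kSampleRateHz;
    std::this_thread::sleep_for(std::chrono::milliseconds(chunkMs));
}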
Please also modify sample to use PooledThreadExecutor:
#include <aws/core/utils/threading/PooledThreadExecutor.h>
...
clientConfig.executor = Aws::MakeShared<Aws::Utils::Threading::PooledThreadExecutor>("SomeTag", 5 /* number of worker threads to spin*/);
as it is more efficient than the original legacy default executor.
Another performance-tuning option here might be the streaming buffer size: https://github.com/aws/aws-sdk-cpp/blob/main/src/aws-cpp-sdk-core/include/aws/core/utils/stream/ConcurrentStreamBuf.h#L33 But for the given sample code, 8 MB is enough.
We have a longer-term plan to rewrite streaming support to use the AWS CRT async HTTP client (or a libcurl multi handle), as it will better suit our needs, but we cannot provide any estimate.
Best regards, Sergey
Hi @SergeyRyabinin
Thank you for the well-written and thorough response.
I am glad the Docker setup is helpful; I tried to make it as plug-and-play as possible so anyone could run and test it easily.
I will test out your extra suggestions early next week and get back to you with the results.
I agree that 50% still feels high, so hopefully these extra steps will help.
We are trying to use the streaming at a large scale so any optimizations we can get will be very beneficial.
The usual live-transcription stream design is a source (a file or audio device with a constant read speed), a stream buffer (to absorb the delays between the source and the sink), and a sink (the {AWS SDK, network, transcribe engine} triple, which runs at a variable speed). Instead of the application having to calculate how much data to send, and how often, to accommodate the variable sink speed, the sink can notify the application when it is ready to accept more data.
This simplifies the application design considerably and improves application performance. This is the design used by Google's live transcription streaming, where CPU usage with multiple channels is around 5%.
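A minimal sketch of that design (generic C++, not an SDK API): a bounded buffer where the producer blocks until the sink signals it is ready for more data, so neither side busy-polls.

#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

class BoundedAudioBuffer {
public:
    explicit BoundedAudioBuffer(std::size_t maxChunks) : maxChunks_(maxChunks) {}

    // Producer side: blocks (no CPU spin) until the sink has drained a chunk.
    void push(std::vector<char> chunk) {
        std::unique_lock<std::mutex> lock(mu_);
        notFull_.wait(lock, [&] { return chunks_.size() < maxChunks_; });
        chunks_.push_back(std::move(chunk));
        notEmpty_.notify_one();
    }

    // Sink side: blocks until a chunk is available, then signals readiness.
    std::vector<char> pop() {
        std::unique_lock<std::mutex> lock(mu_);
        notEmpty_.wait(lock, [&] { return !chunks_.empty(); });
        std::vector<char> chunk = std::move(chunks_.front());
        chunks_.pop_front();
        notFull_.notify_one();  // "ready to accept more data"
        return chunk;
    }

private:
    std::mutex mu_;
    std::condition_variable notFull_, notEmpty_;
    std::deque<std::vector<char>> chunks_;
    std::size_t maxChunks_;
};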
@SergeyRyabinin
So we have been using it in production for some time, but REGULAR mode is just not sufficient for us: it keeps dropping transcriptions in flight. If we use LOW_LATENCY we get massive CPU usage; if we use REGULAR it is so sporadic that we lose lots of transcripts.
Do you have any thoughts on how to resolve or optimize my situation?
@SergeyRyabinin @blundercode Any fix for this CPU usage? This has been an issue; neither LOW_LATENCY nor REGULAR solves the actual problem. Because of it, I had to rebuild the application using https://docs.aws.amazon.com/transcribe/latest/dg/getting-started-http-websocket.html
@vdharashive No, I couldn't find a solution; it's internal to their SDK. The only mitigation I found is to pin all transcribe-streaming processes to a single core, so multiple streams at least only saturate one core between them.
It's really a shame, because REGULAR mode is extremely erratic in timing in all my testing.
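For reference, a sketch of that pinning mitigation on Linux (equivalent to launching the process with taskset -c 0; error handling omitted):

#include <sched.h>  // Linux-specific: sched_setaffinity

// Restrict the calling process (pid 0) to CPU core 0 so multiple streams
// contend for one core instead of each saturating its own.
void pinToCoreZero() {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    sched_setaffinity(0, sizeof(set), &set);
}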
@SergeyRyabinin
Curious whether this has been resolved. I noticed this post in a conversation on the other SDK, where the AWS dev says it has been resolved. Is it also resolved in the C++ SDK?
https://github.com/awslabs/amazon-transcribe-streaming-sdk/issues/121