Kokoro-FastAPI

Memory leak and Performance issue

[Open] fondoger opened this issue 10 months ago • 15 comments

Describe the bug

Memory usage is 1.8 GB after generating the first sentence.

But as more sentences are generated, memory usage keeps growing steadily.


Branch / Deployment used: Master branch as of 2025-03-27.

Operating System: macOS; the server was started via start-gpu_mac.sh.
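A minimal shell sketch for reproducing the growth described above (and the slowdown reported later in the thread). It sends short requests in a loop and, after each one, prints the request time and the server process's resident memory. Assumptions not stated in the issue: the server listens on localhost:8880 and serves an OpenAI-style /v1/audio/speech endpoint that accepts model, voice and input; repro_leak and the KOKORO_URL override are hypothetical names. Treat it as a starting point, not the maintainers' own test.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction helper (not part of Kokoro-FastAPI).
# Assumes an OpenAI-style /v1/audio/speech endpoint on localhost:8880;
# KOKORO_URL is a hypothetical override for that address.
repro_leak() {
  local pid="$1" count="${2:-20}"
  local url="${KOKORO_URL:-http://localhost:8880/v1/audio/speech}"
  local i secs rss
  for (( i = 1; i <= count; i++ )); do
    # -o /dev/null discards the audio; -w '%{time_total}' prints the request time.
    secs=$(curl -sS -o /dev/null -w '%{time_total}' -X POST "$url" \
      -H 'Content-Type: application/json' \
      -d "{\"model\":\"kokoro\",\"voice\":\"af_sarah\",\"input\":\"This is test sentence number ${i}.\"}")
    # ps reports resident set size in kilobytes on both macOS and Linux.
    rss=$(ps -o rss= -p "$pid" | tr -d ' ')
    echo "request ${i}: ${secs}s rss_kb=${rss}"
  done
}
```

For example, on a native (non-Docker) install like the macOS one above, repro_leak "$(pgrep -f uvicorn | head -n 1)" 50 would log 50 requests, assuming the server runs as a uvicorn process. A steadily rising rss_kb column alongside rising request times would match this report. With the MPS backend, GPU allocations may not be fully reflected in the RSS figure, so Activity Monitor is a useful cross-check.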

fondoger, Mar 27 '25 03:03

I can replicate this on Linux (CPU) in a Docker container.

fireblade2534, Mar 27 '25 18:03

It also appears that the problem may lie either in how Kokoro-FastAPI uses the Kokoro library or in the Kokoro library itself. I am investigating further to confirm where the leak is.

fireblade2534, Mar 27 '25 18:03

I've hammered Fast Koko with lots of text under Win 11 and never seen a hint of a memory leak.

Unfortunately the machine's down with epic overheating problems, so further tests will have to wait until next week at the earliest (ARGH!!!).

RBEmerson970, Mar 27 '25 22:03

Ok so I can replicate the issue with kokoro itself. I have submitted an issue: https://github.com/hexgrad/kokoro/issues/152

fireblade2534, Mar 28 '25 15:03

@fireblade2534 Nice to know it's somebody else's problem for a change. ;D

RBEmerson970, Mar 28 '25 16:03

I generated a super long audio file (1 hour 9 seconds) using Kokoro-FastAPI on my Mac Mini M4, with MPS GPU acceleration.

As the memory usage goes up steadily, the generation speed goes down steadily.


If we can fix the memory leak, I believe Kokoro's performance will improve a lot.


fondoger, Apr 02 '25 14:04

FWIW, I think this is a Mac-specific issue. I've repeatedly generated long MP3s from texts of ~56K characters; they didn't take much time (on the order of a couple of minutes) and ran without a problem. This is on an i9, RTX4090 under Docker under Win 11.

The one problem with long texts (>~12 minutes) is that the readback in Fast Koko switches from the full text to reading back only the opening line of subsequent chunks. Note this is as of about a week ago. Unfortunately the system's out for service, so I can't repeat the tests at the moment.

RBEmerson970, Apr 02 '25 15:04

I can replicate the memory leak issue on the latest Ubuntu.

didof, Apr 04 '25 11:04

> FWIW, I think this is a Mac-specific issue. [...] This is on an i9, RTX4090 under Docker under Win 11.

What OS do you run it on? I have done all my testing on Ubuntu Linux.

fireblade2534, Apr 04 '25 18:04

> What OS do you run it on? I have done all my testing on Ubuntu Linux.

Er, please refer to my post. :)

RBEmerson970, Apr 04 '25 18:04

Update:

On my Mac Mini M4, the issue only occurs with the PyTorch MPS backend. With the CPU backend, it does not occur.

fondoger, Apr 14 '25 03:04

I seem to have found a functioning workaround for this until the memory leak gets solved/patched through an update.

The trick is to run Kokoro-FastAPI in Docker, following the guide on Open WebUI's "docs/getting started" page in the "Text-to-Speech" section:

https://docs.openwebui.com/tutorials/text-to-speech/Kokoro-FastAPI-integration/

Once the container is installed, stop it and then run this CLI command:

docker update --memory="4g" --memory-swap="5g" <container_id/name>

This hard-limits the Kokoro container to 4 GB of RAM plus 1 GB of swap (--memory-swap is the combined total, so 5g = 4 GB + 1 GB). This is very useful for any container that might have a runaway memory leak.
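The arithmetic in that command can be captured in a small wrapper, sketched below under the assumption of whole-GB values. Because Docker's --memory-swap is the total of RAM plus swap, the helper adds the two; limit_container is a hypothetical name, not part of Docker or Kokoro-FastAPI.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around `docker update` (whole-GB values only).
# --memory-swap is RAM + swap combined, so it is computed as mem + swap.
limit_container() {
  local container="$1" mem_gb="${2:-4}" swap_gb="${3:-1}"
  local total_gb=$(( mem_gb + swap_gb ))
  docker update --memory="${mem_gb}g" --memory-swap="${total_gb}g" "$container"
}
```

For example, limit_container kokoro 6 1 would set --memory=6g --memory-swap=7g, i.e. 6 GB of RAM and 1 GB of swap.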

Now Kokoro and the lovely af_sarah voice are running stably on my setup and stay confined within those limits. Without those limits, Kokoro could easily swallow 20-30 GB of RAM over time.

I'm using the CPU version of Kokoro-FastAPI, in case that matters: docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu
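As an alternative to stopping and updating an existing container, the same caps can be passed to docker run when the container is first created. A sketch, with the container name kokoro and the function name run_kokoro_cpu chosen purely for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical launcher: create the CPU container with memory caps already set.
# The container name "kokoro" is an illustrative choice.
run_kokoro_cpu() {
  local mem="${1:-4g}" total="${2:-5g}"
  docker run -d --name kokoro \
    --memory="$mem" --memory-swap="$total" \
    -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu
}
```

Limits set this way can still be adjusted later with docker update.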

To monitor and double-check the Kokoro container's resource usage in real time, use this command: docker stats <container_id/name>
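docker stats shows a live view; for spotting slow growth over hours, periodic snapshots are easier to compare. A sketch using docker stats --no-stream with a --format template, where sample_memory and its defaults (10 samples, 60 s apart) are hypothetical choices:

```shell
#!/usr/bin/env bash
# Hypothetical logger: take COUNT one-shot snapshots, INTERVAL seconds apart.
# {{.MemUsage}} prints "used / limit" for the container.
sample_memory() {
  local container="$1" count="${2:-10}" interval="${3:-60}"
  local i
  for (( i = 1; i <= count; i++ )); do
    printf '%s %s\n' "$(date -u +%H:%M:%S)" \
      "$(docker stats --no-stream --format '{{.MemUsage}}' "$container")"
    # No wait after the final sample.
    if (( i < count )); then
      sleep "$interval"
    fi
  done
}
```

For example, sample_memory kokoro 60 60 >> kokoro-mem.log would record one sample per minute for an hour.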

xazqe, Jun 09 '25 14:06

> docker update --memory="4g" --memory-swap="5g" <container_id/name>

Isn't that just going to cause it to choke at 4 GB?

RobertAgee, Jun 15 '25 03:06

> Isn't that just going to cause it to choke at 4 GB?

It has run for weeks under those memory constraints without crashing. Kokoro uses about 1.5-3 GB of the 4 GB. The memory limit doesn't seem to affect anything other than keeping the memory leak from spinning out of control 😊

I'm guessing that without the memory limit, Kokoro just keeps piling those TTS audio files into RAM indefinitely, but when limited to 4 GB, the Kokoro container is forced to purge previous audio files from its RAM when rendering new ones.

If you encounter any issues with very long replies/audio files, I guess you can raise the RAM limit to 6 or 8 GB; it doesn't have to be 4. But 4 seems to run without issues, leaving more room for bigger models and multitasking.

Just to be safe, I apply this restart policy to all containers from the CLI:

docker update --restart always $(docker ps -q)

Specs: M4 Max 128 GB, macOS Sequoia, Ollama, Open WebUI & Docker Desktop
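One way to check the concern raised above about the container choking at 4 GB is to ask Docker whether it was OOM-killed or restarted. A sketch, assuming the restart policy set above: after an automatic restart the State.OOMKilled flag may already read false again, so RestartCount is checked too. check_oom is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical check: returns 0 if no OOM kill or restart is recorded,
# 1 if one is, 2 if docker inspect itself fails.
check_oom() {
  local container="$1" state oom restarts
  state=$(docker inspect --format '{{.State.OOMKilled}} {{.RestartCount}}' "$container") || return 2
  read -r oom restarts <<< "$state"
  if [ "$oom" = "true" ] || [ "$restarts" -gt 0 ]; then
    echo "$container: OOMKilled=$oom RestartCount=$restarts (the limit may be too tight)"
    return 1
  fi
  echo "$container: no OOM kill or restart recorded"
}
```

A non-zero RestartCount only says Docker restarted the container; an OOM kill is one likely cause, but an ordinary crash would look the same.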

xazqe, Jun 15 '25 10:06

Ah, OK. I'm actually using Kokoro directly as the backbone of a different software stack (ASR + permutative voice model generation). I found this thread about the Kokoro memory leak through Google, but yes, it's a problem in Kokoro's KPipeline. Kokoro can actually run continuously within a 1 GB VRAM footprint if you manage memory on each call. Personally, I wouldn't rely on Kokoro's memory management logic (or lack thereof).

RobertAgee, Jun 15 '25 21:06