Kokoro-FastAPI Cannot use Custom Phonemes in v0.2.4

Describe the bug

When I use custom phonemes like [Kokoro](/kˈOkəɹO/), I get audio that says, “slash custom phonemes zero slash.”
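For reference, here is a minimal sketch of how the `[text](/phonemes/)` markup used in this report can be recognized. The regex and the function name `find_custom_phonemes` are illustrative assumptions, not the project's actual parser; the sketch only shows which part of the input is meant to be treated as phonemes rather than read aloud.

```python
import re

# Hypothetical pattern for the [text](/phonemes/) markup shown in this report.
# This is an illustration, not Kokoro-FastAPI's own implementation.
CUSTOM_PHONEME = re.compile(r"\[([^\]]+)\]\(/([^/)]+)/\)")


def find_custom_phonemes(text: str) -> list[tuple[str, str]]:
    """Return (display_text, phonemes) pairs found in the input text."""
    return CUSTOM_PHONEME.findall(text)


pairs = find_custom_phonemes("[Kokoro](/kˈOkəɹO/)")
for word, phonemes in pairs:
    print(f"{word!r} should be spoken using the phonemes {phonemes!r}")
```

In the console output, the chunk passed to the model is '< slash custom phonemes zero slash >' instead of the phoneme string, which matches the audio described above.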

Console output

12:51:04 PM | DEBUG    | paths:153 | Scanning for voices in path: /Users/shiva/Projects/AI/Kokoro-FastAPI/api/src/voices/v1_0
12:51:04 PM | DEBUG    | paths:131 | Searching for voice in path: /Users/shiva/Projects/AI/Kokoro-FastAPI/api/src/voices/v1_0
12:51:04 PM | DEBUG    | tts_service:204 | Using single voice path: /Users/shiva/Projects/AI/Kokoro-FastAPI/api/src/voices/v1_0/af_alloy.pt
12:51:04 PM | DEBUG    | tts_service:280 | Using voice path: /Users/shiva/Projects/AI/Kokoro-FastAPI/api/src/voices/v1_0/af_alloy.pt
12:51:04 PM | INFO     | tts_service:284 | Using lang_code 'a' for voice 'af_alloy' in audio stream
12:51:04 PM | INFO     | text_processor:159 | Starting smart split for 19 chars
12:51:04 PM | DEBUG    | text_processor:164 | Split raw text into 1 parts by pause tags.
12:51:04 PM | DEBUG    | text_processor:65 | Total processing took 5.43ms for chunk: '< slash custom phonemes zero slash >'
12:51:04 PM | INFO     | text_processor:308 | Yielding final chunk 1 for part: '< slash custom phonemes zero slash >' (37 tokens)
12:51:04 PM | DEBUG    | kokoro_v1:261 | Generating audio for text with lang_code 'a': '< slash custom phonemes zero slash >'
12:51:05 PM | DEBUG    | kokoro_v1:268 | Got audio chunk with shape: torch.Size([81000])
12:51:05 PM | INFO     | text_processor:332 | Split completed in 1028.62ms, produced 1 chunks (including pauses)
12:51:05 PM | DEBUG    | streaming_audio_writer:85 | Muxed final packets.

Branch / Deployment used

I used these commands to install:

brew install uv ffmpeg espeak-ng
git clone https://github.com/remsky/Kokoro-FastAPI.git
cd Kokoro-FastAPI
uv venv .venv
source .venv/bin/activate
uv pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
./start-gpu_mac.sh

Operating System

MacBook Air, Apple M1, macOS 15.5

Additional context

Custom phonemes were working before the v0.2.4 release. They broke when I pulled the latest release.

Jun 25 '25 07:06 shivarajd

@shivarajd Does #350 fix your issue? (You will have to clone the branch to test, btw.)

Jun 26 '25 00:06 fireblade2534

Yes. This fixes the issue.

But I have another issue: a few custom phonemes for words that start with an "S" or "C" sound, or that end with an "I" sound (there could be other sounds too), are muffled when used individually, but they are okay when used in a sentence.

Also, the voices are better in this demo: https://huggingface.co/spaces/hexgrad/Kokoro-TTS The same voices have muffling issues and sound less natural in this demo: https://huggingface.co/spaces/Remsky/Kokoro-TTS-Zero

Jun 26 '25 04:06 shivarajd

@shivarajd The demo here: https://huggingface.co/spaces/Remsky/Kokoro-TTS-Zero is quite old. Do the voices sound muffled when you run the API locally? Also, as far as I know, Kokoro was not really trained on single phonemes, so the model itself is probably not too good with them.

Jun 26 '25 13:06 fireblade2534

Yes, the voices sound muffled for certain words with custom phonemes when used locally. However, the hexgrad demo plays the sound for those words perfectly.

Jun 26 '25 15:06 shivarajd

@shivarajd Can you please give me some examples and test sentences?

Jun 26 '25 18:06 fireblade2534

Here are a few examples:

If I play them individually, the sound is muffled: [Sana](/sˈɑnɑ/) [Chitti](/ʧˈɪtti/) [Sakthi](/sˈækθi/) [Shakthi](/ʃˈækθi/) [Shriya](/ʃɹˈɪjə/)

If I play them in a sentence, the names are pronounced as intended: [Sana](/sˈɑnɑ/) is a doctor.
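The isolated-versus-sentence comparison above can be scripted so both variants are easy to regenerate. This is a rough sketch under stated assumptions: the endpoint URL, port, `model` value, and `response_format` are assumed for a local OpenAI-style speech endpoint and may need adjusting, and the helper names (`markup`, `build_cases`, `synthesize`) are made up for illustration. The voice `af_alloy` is the one shown in the console output.

```python
import json
import urllib.request

# Assumed local endpoint; change it to match your deployment.
API_URL = "http://localhost:8880/v1/audio/speech"

# Names and phonemes copied from the examples in this thread.
NAMES = {
    "Sana": "sˈɑnɑ",
    "Chitti": "ʧˈɪtti",
    "Sakthi": "sˈækθi",
    "Shakthi": "ʃˈækθi",
    "Shriya": "ʃɹˈɪjə",
}


def markup(word: str, phonemes: str) -> str:
    """Wrap a word in the [text](/phonemes/) custom phoneme markup."""
    return f"[{word}](/{phonemes}/)"


def build_cases(names: dict[str, str]) -> dict[str, str]:
    """Build one isolated input and one in-sentence input per name."""
    cases = {}
    for word, phonemes in names.items():
        tagged = markup(word, phonemes)
        cases[f"{word}_alone"] = tagged
        cases[f"{word}_sentence"] = f"{tagged} is a doctor."
    return cases


def synthesize(text: str, out_path: str, voice: str = "af_alloy") -> None:
    """POST the text to the assumed endpoint and save the returned audio."""
    body = json.dumps(
        {"model": "kokoro", "voice": voice, "input": text, "response_format": "mp3"}
    ).encode("utf-8")
    request = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

With the server running, calling `synthesize(text, f"{name}.mp3")` for each item in `build_cases(NAMES).items()` writes one file per case, so the muffled isolated clips can be compared directly with the in-sentence ones.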

Even these demos play the sound correctly. https://huggingface.co/spaces/NeuralFalcon/Kokoro-TTS-Subtitle https://huggingface.co/spaces/hysts-mcp/Kokoro-TTS

Jun 27 '25 06:06 shivarajd