Cannot use Custom Phonemes in v0.2.4
Describe the bug
When I use custom phonemes like [Kokoro](/kˈOkəɹO/), the generated audio says, “slash custom phonemes zero slash” instead of the word.
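For anyone reproducing this, it can help to first confirm that the input really is well-formed `[word](/phonemes/)` markup. The snippet below is a minimal sketch: the regular expression and the helper name `find_custom_phonemes` are assumptions made for illustration, not the parser Kokoro-FastAPI actually uses.

```python
import re

# Hypothetical pattern for the [word](/phonemes/) markup shown above.
# This is an illustration only, not the project's own parsing code.
CUSTOM_PHONEME = re.compile(r"\[([^\]]+)\]\(/([^/)]+)/\)")


def find_custom_phonemes(text: str) -> list[tuple[str, str]]:
    """Return (display_text, phonemes) pairs for each custom phoneme span."""
    return CUSTOM_PHONEME.findall(text)


pairs = find_custom_phonemes("[Kokoro](/kˈOkəɹO/)")
```

If the input from this report is well-formed, `pairs` should hold a single entry pairing `Kokoro` with its phoneme string. That would point to the server-side text processing rather than to the request.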
Console output
12:51:04 PM | DEBUG | paths:153 | Scanning for voices in path: /Users/shiva/Projects/AI/Kokoro-FastAPI/api/src/voices/v1_0
12:51:04 PM | DEBUG | paths:131 | Searching for voice in path: /Users/shiva/Projects/AI/Kokoro-FastAPI/api/src/voices/v1_0
12:51:04 PM | DEBUG | tts_service:204 | Using single voice path: /Users/shiva/Projects/AI/Kokoro-FastAPI/api/src/voices/v1_0/af_alloy.pt
12:51:04 PM | DEBUG | tts_service:280 | Using voice path: /Users/shiva/Projects/AI/Kokoro-FastAPI/api/src/voices/v1_0/af_alloy.pt
12:51:04 PM | INFO | tts_service:284 | Using lang_code 'a' for voice 'af_alloy' in audio stream
12:51:04 PM | INFO | text_processor:159 | Starting smart split for 19 chars
12:51:04 PM | DEBUG | text_processor:164 | Split raw text into 1 parts by pause tags.
12:51:04 PM | DEBUG | text_processor:65 | Total processing took 5.43ms for chunk: '< slash custom phonemes zero slash >'
12:51:04 PM | INFO | text_processor:308 | Yielding final chunk 1 for part: '< slash custom phonemes zero slash >' (37 tokens)
12:51:04 PM | DEBUG | kokoro_v1:261 | Generating audio for text with lang_code 'a': '< slash custom phonemes zero slash >'
12:51:05 PM | DEBUG | kokoro_v1:268 | Got audio chunk with shape: torch.Size([81000])
12:51:05 PM | INFO | text_processor:332 | Split completed in 1028.62ms, produced 1 chunks (including pauses)
12:51:05 PM | DEBUG | streaming_audio_writer:85 | Muxed final packets.
Branch / Deployment used
I used these commands to install:
brew install uv ffmpeg espeak-ng
git clone https://github.com/remsky/Kokoro-FastAPI.git
cd Kokoro-FastAPI
uv venv .venv
source .venv/bin/activate
uv pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
./start-gpu_mac.sh
Operating System
MacBook Air, Apple M1, macOS 15.5
Additional context
Custom phonemes were working before the v0.2.4 release. They broke when I pulled the latest release.
@shivarajd Does #350 fix your issue? (You will have to clone the branch to test it.)
Yes. This fixes the issue.
But I have another issue: a few custom phonemes sound muffled when used on their own, but they are okay when used in a sentence. This affects words that start with an "S" or "C" sound and words that end with an "I" sound (there could be other sounds too).
Also, the voices sound better in this demo: https://huggingface.co/spaces/hexgrad/Kokoro-TTS The same voices have muffling issues and sound less natural in this demo: https://huggingface.co/spaces/Remsky/Kokoro-TTS-Zero
@shivarajd The demo here: https://huggingface.co/spaces/Remsky/Kokoro-TTS-Zero is quite old. Do the voices sound muffled when you run the API locally? Also, as far as I know, Kokoro was not really trained on single phonemes, so the model itself is probably not very good with them.
Yes, the voices sound muffled for certain words with custom phonemes when I run the API locally. However, the hexgrad demo plays those words perfectly.
@shivarajd Can you please give me some examples and test sentences?
Here are a few examples.
If I play them individually, the sound is muffled:
[Sana](/sˈɑnɑ/)
[Chitti](/ʧˈɪtti/)
[Sakthi](/sˈækθi/)
[Shakthi](/ʃˈækθi/)
[Shriya](/ʃɹˈɪjə/)
If I play them in a sentence, the names are pronounced as intended:
[Sana](/sˈɑnɑ/) is a doctor.
These demos also play the individual words correctly: https://huggingface.co/spaces/NeuralFalcon/Kokoro-TTS-Subtitle https://huggingface.co/spaces/hysts-mcp/Kokoro-TTS
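To compare the isolated and in-sentence cases side by side on a local install, a small script along these lines may help. It is only a sketch under stated assumptions: the server is taken to listen at `http://localhost:8880` and expose the OpenAI-compatible `/v1/audio/speech` route with model name `kokoro`; the helper names and output file names are made up for illustration. Adjust all of these to your deployment.

```python
import json
import urllib.request

# Assumed defaults, not confirmed by this thread: adjust to your setup.
API_URL = "http://localhost:8880/v1/audio/speech"
VOICE = "af_alloy"  # the voice that appears in the console output above

# The custom phoneme examples reported in this thread.
NAMES = [
    "[Sana](/sˈɑnɑ/)",
    "[Chitti](/ʧˈɪtti/)",
    "[Sakthi](/sˈækθi/)",
    "[Shakthi](/ʃˈækθi/)",
    "[Shriya](/ʃɹˈɪjə/)",
]


def comparison_cases() -> list[tuple[str, str]]:
    """Pair each name with an isolated input and an in-sentence input."""
    cases = []
    for i, name in enumerate(NAMES):
        cases.append((f"isolated_{i}.mp3", name))
        cases.append((f"sentence_{i}.mp3", f"{name} is a doctor."))
    return cases


def build_request(text: str) -> urllib.request.Request:
    """Build (but do not send) a speech request for the given input text."""
    payload = {
        "model": "kokoro",
        "voice": VOICE,
        "input": text,
        "response_format": "mp3",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text: str, path: str) -> None:
    """Send the request and write the returned audio bytes to path."""
    with urllib.request.urlopen(build_request(text)) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

Calling `save_speech(text, path)` for each pair returned by `comparison_cases()` should leave one isolated and one in-sentence file per name, which makes the muffling easier to hear and to attach to the issue.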