Kokoro-FastAPI

Issues: no pause after "etc." (and other abbreviations) at the end of a sentence, no pause around parentheses, wrong chunking at abbreviations in the middle of a sentence.

Open Mark4Grey opened this issue 8 months ago • 5 comments

Describe the bug

  • Kokoro does not distinguish between abbreviations ("shorts") like "etc." in the middle of a sentence and at the end of one. There is no pause at all after the abbreviation, even when it ends the sentence (which happens pretty often). It would be nice if Kokoro sensed a capital letter in the next word and made a full-stop pause before it (names and ABBReviations may still cause an issue, though).
  • It would be nice if Kokoro slightly raised the tone of her voice, for naturalness, when reading "e.g.,".
  • Kokoro seems to ignore parentheses. It would be nice if Kokoro made a "comma" pause and lowered her tone while reading the text within parentheses when it is plain text (spelling out code will perhaps need a different approach).

Example of text: Include the platform, version numbers of your docker, etc. Whether its GPU (Nvidia or other) or CPU, Mac, Linux, Windows, etc.
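One possible workaround for the parentheses point, as a rough sketch only: rewrite parenthesized plain text as a comma-delimited aside before it reaches the model. This assumes the model already pauses at commas, which this thread does not confirm, and it does nothing about lowering the tone. `parens_to_commas` is a hypothetical helper, not part of Kokoro or Kokoro-FastAPI.

```python
import re

# Matches one non-nested "(...)" group, plus any whitespace before it.
_PAREN = re.compile(r"\s*\(([^()]*)\)")

def parens_to_commas(text: str) -> str:
    """Hypothetical preprocessing step: turn "(aside)" into ", aside,"."""
    out = _PAREN.sub(lambda m: f", {m.group(1).strip()},", text)
    # Drop a comma that now sits directly before other punctuation (",." -> ".").
    return re.sub(r",\s*([.,;:!?])", r"\1", out)

print(parens_to_commas("Whether its GPU (Nvidia or other) or CPU"))
```

Nested parentheses and code snippets are not handled here; they would need a different approach, as noted above.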

  • Kokoro sometimes chunks text in the wrong place when there are abbreviations, perhaps because it does not see the difference between a full-stop period and the period after an abbreviation. This causes an unwanted long pause in the middle of some sentences.

Example of text: Keep this in mind: These predictions are based on historical data and might not reflect the actual tide times for every year. Tidal patterns can vary depending on the specific location within Auckland (e.g., inner harbor vs. outer harbor). ......

Logs:
11:01:40 PM | INFO | text_processor:222 | Yielding chunk 1: 'Keep this in mind: These predictions are based on ...' (249 tokens)
11:01:40 PM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'Keep this in mind: These predictions are based on historical data and might not reflect the actual t...'
11:01:49 PM | DEBUG | kokoro_v1:252 | Got audio chunk with shape: torch.Size([325200])
11:01:49 PM | INFO | text_processor:259 | Yielding final chunk 2: 'outer harbor). The predicted high tide time is for...' (134 tokens)
11:01:49 PM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'outer harbor). The predicted high tide time is for a specific point in space, and actual tidal condi...'

Branch / Deployment used: kokoro_v1:252

Operating System: Docker Engine v28.1.1. The Docker container is running locally in WSL on Win10 Version 10.0.19045 Build 19045. All the Kokoro processing is offloaded to CPU.

Additional context: The setup was done a few days ago.

BTW, great job guys!

Mark4Grey • May 08 '25 23:05

@Mark4Grey Wait, so are you saying that Kokoro should stop when "etc." is said, or that it shouldn't?

fireblade2534 • May 09 '25 13:05

@fireblade2534 @Mark4Grey AFAIK the idea is to treat "this, that, the other thing, etc. Moving on from there..." as two sentences (i.e., pause after "etc."); Mark4Grey seems to be saying that's not happening: the sample is treated as one long sentence. No big surprise, really, as "etc." could appear in the middle of a sentence (no pause, please) or at the end (pause, please). It's a context issue.

@fireblade2534 Also, while I've got your attention, can you correct, on the code page, the release version from 0.2.3 to whatever it should really be (0.3.0???). Sure would be nice to see the version info on the FastKoko web tool, too (why, why, why waste space on the stars count?) as building "latest" is a version crap shoot.

RBEmerson970 • May 09 '25 13:05

> @Mark4Grey Wait so are you saying that kokoro should stop when "etc." is said or it shouldn't

I meant that Kokoro should stop after "etc." at the end of a sentence: basically, when it is followed by a space and a capitalized letter. It might be even better to require space + capital letter + a non-capital letter, to avoid a full-stop pause before ABBReviations, provided they appear at the start of a sentence less often than in the middle straight after "etc.".
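The rule described above can be sketched roughly as follows. This is only an illustration of the heuristic, not the code Kokoro-FastAPI uses; `split_sentences` and the `ABBREVIATIONS` set are hypothetical names, and the abbreviation list is deliberately tiny.

```python
import re

# Hypothetical, incomplete abbreviation list (illustration only).
ABBREVIATIONS = {"etc.", "e.g.", "i.e.", "vs."}

# Candidate boundary: ., ! or ?, optional closing bracket/quote, then whitespace.
_CANDIDATE = re.compile(r"[.!?][)\"']*\s+")

def split_sentences(text: str) -> list[str]:
    sentences, start = [], 0
    for m in _CANDIDATE.finditer(text):
        # Last word before the candidate boundary, without surrounding brackets/quotes.
        last_word = text[start:m.end()].split()[-1].strip("()\"'").lower()
        if last_word in ABBREVIATIONS:
            # After an abbreviation, split only before Capital + lowercase,
            # so "etc. GPU ..." is not treated as a sentence break.
            if not re.match(r"[A-Z][a-z]", text[m.end():]):
                continue
        sentences.append(text[start:m.end()].strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

print(split_sentences("Docker, etc. Whether its GPU or CPU, Mac, Linux, etc."))
```

The trade-off is the one already mentioned: a sentence that really does start with an all-caps word right after "etc." is not split, and a capitalized name in mid-sentence after "etc." would be split wrongly.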

Thanks

Mark4Grey • May 10 '25 02:05

Keep in mind this repo's an API, not the actual model (Kokoro). Some of the problems are, I suspect, not going to be fixed here, but by improving Kokoro, which AFAIK is stalled at 1.0 (1.1 seems to be improved in some places, but only supports English and Chinese [not sure what flavor{s}]).

RBEmerson970 • May 10 '25 03:05

@RBEmerson970 I think this is an issue in the API itself though? Chunking looks to be handled by this function:

https://github.com/remsky/Kokoro-FastAPI/blob/b788daeaed6b33e6df33956c3e1e1f3cbbbbaad9/api/src/services/text_processing/text_processor.py#L136-L148

Better results could probably be obtained with proper ICU sentence segmentation, using a library like ICU-tokenizer:

from icu_tokenizer import SentSplitter
splitter = SentSplitter('en')

paragraph = """
This, that, the other thing, etc. Moving on from there... This, that, the other thing, etc., and more. This, that, the other thing, etc. and more. One, i. e. two. One, i. e., two. 4.2. Foo bar.
"""
splitter.split(paragraph)
# [
#     "This, that, the other thing, etc.",
#     "Moving on from there...",
#     "This, that, the other thing, etc., and more.",
#     "This, that, the other thing, etc. and more.",
#     "One, i. e. two.",
#     "One, i. e., two.",
#     "4.2.",
#     "Foo bar."
# ]
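
With a list of sentences like the one above, chunk boundaries could then be kept on sentence boundaries. A minimal sketch, assuming a greedy packing strategy; this is not the repo's actual chunking code, and `pack_sentences`, `max_tokens`, and the whitespace-based token count are all hypothetical stand-ins for whatever the API really uses.

```python
from typing import Callable

def pack_sentences(
    sentences: list[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> list[str]:
    """Greedily group whole sentences into chunks of at most max_tokens.

    count_tokens defaults to a whitespace word count, which is only a
    placeholder for a real tokenizer. A single sentence longer than
    max_tokens still becomes its own (oversized) chunk.
    """
    chunks, current, used = [], [], 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Start a new chunk if this sentence would overflow the current one.
        if current and used + n > max_tokens:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks

print(pack_sentences(
    ["This, that, the other thing, etc.", "Moving on from there...", "Foo bar."],
    max_tokens=8,
))
```

Because a chunk never ends in the middle of a sentence, a pause between chunks can only fall where a sentence-level pause is expected anyway.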

lionel-rowe • Nov 13 '25 06:11