How to generate correct audio for text that mixes English and Chinese?

Open jessie-chen99 opened this issue 6 months ago • 1 comment

My data is:

{"text": "[S1]How many times does the text '更多精彩观看主页~' (For more exciting content, visit the homepage~) appear?",
"prompt_audio": "examples/m1.wav",
"prompt_text": "[S1]How much do you know about her?"}

In the generated audio, the part that corresponds to the Chinese characters is not pronounced correctly.

Jul 14 '25 03:07 jessie-chen99

Hi, the model's single-speaker generation is currently not stable. Please try a scenario that contains two speakers. As for this particular case, we plan to improve it in the next models.

Jul 15 '25 12:07 xiami2019
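
For anyone who wants to try the two-speaker suggestion above, here is a minimal sketch of how the input record could be rebuilt in Python. It reuses only the field names from the original post ("text", "prompt_audio", "prompt_text"). The "[S2]" tag is an assumption extrapolated from the "[S1]" tag, the second speaker's line is made up for illustration, and build_dialogue_text is a hypothetical helper, not part of the project. The thread does not say whether the prompt fields also need a second speaker, so they are left unchanged here.

```python
import json


def build_dialogue_text(turns):
    """Join (speaker_index, utterance) pairs into one tagged string.

    Hypothetical helper: the "[S<n>]" tag pattern is assumed from the
    "[S1]" tag used in the original post.
    """
    return "".join(f"[S{speaker}]{utterance}" for speaker, utterance in turns)


turns = [
    # First turn: the mixed English/Chinese sentence from the original post.
    (1, "How many times does the text '更多精彩观看主页~' "
        "(For more exciting content, visit the homepage~) appear?"),
    # Second turn: an invented reply, added only to make this a two-speaker input.
    (2, "Let me check the page and count."),
]

record = {
    "text": build_dialogue_text(turns),
    # Prompt fields copied unchanged from the original post.
    "prompt_audio": "examples/m1.wav",
    "prompt_text": "[S1]How much do you know about her?",
}

# ensure_ascii=False keeps the Chinese characters readable instead of \uXXXX escapes.
line = json.dumps(record, ensure_ascii=False)
print(line)
```

Writing the record with ensure_ascii=False also makes it easy to confirm by eye that the Chinese span reaches the model intact, which rules out an encoding problem before blaming the synthesis itself.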