MOSS-TTSD
How can I generate correct audio for text that mixes English and Chinese?
My data is
{"text": "[S1]How many times does the text '更多精彩观看主页~' (For more exciting content, visit the homepage~) appear?",
 "prompt_audio": "examples/m1.wav",
 "prompt_text": "[S1]How much do you know about her?"}
The part of the generated audio that corresponds to Chinese characters is not exactly right.
Hi, the model's single-speaker generation is currently not stable. Please try a scenario containing two speakers. As for this particular case (mixed English and Chinese text), we will improve it in upcoming models.
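
For anyone trying the two-speaker suggestion, here is a minimal sketch of how the entry from the question might be rewritten as a short dialogue. It is only an illustration under several assumptions: the second speaker is tagged `[S2]` by analogy with the `[S1]` tag in the question, the field names are the ones shown in the question, the second turn is placeholder text, and `build_entry` is a hypothetical helper, not part of MOSS-TTSD. The prompt audio and prompt text may also need to cover both speakers; check the repository's README for the exact input format.

```python
import json


def build_entry(turns, prompt_audio, prompt_text):
    """Join (speaker_number, utterance) pairs into one tagged "text" field.

    Hypothetical helper: the [S<n>] tag format is assumed from the [S1]
    tag used in the question.
    """
    text = "".join(f"[S{speaker}]{utterance}" for speaker, utterance in turns)
    return {"text": text, "prompt_audio": prompt_audio, "prompt_text": prompt_text}


turns = [
    (1, "How many times does the text '更多精彩观看主页~' appear?"),
    (2, "Let me check."),  # placeholder second-speaker turn
]

entry = build_entry(
    turns,
    prompt_audio="examples/m1.wav",
    prompt_text="[S1]How much do you know about her?",
)

# ensure_ascii=False keeps the Chinese characters readable in the JSONL line
line = json.dumps(entry, ensure_ascii=False)
print(line)
```

Each printed line can be appended to a JSONL file, one entry per line.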