CosyVoice

Prompt text content gets synthesized

Open • jsadjasjdjas opened this issue 1 month ago • 1 comment

The code is as follows:

    import sys
    sys.path.append('third_party/Matcha-TTS')
    from cosyvoice.cli.cosyvoice import AutoModel
    import torchaudio


    def cosyvoice3_example(generate_text, prompt_text):
        """ CosyVoice3 Usage, check https://funaudiollm.github.io/cosyvoice3/ for more details
        """
        cosyvoice = AutoModel(model_dir='pretrained_models/Fun-CosyVoice3-0.5B')
        # zero_shot usage
        for i, j in enumerate(cosyvoice.inference_zero_shot(generate_text, prompt_text, './asset/晓云-开心_1.1_10s.wav', stream=False)):
            torchaudio.save('zero_shot2_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)


    if __name__ == '__main__':
        generate_text = "圈里人都知道顾羿城是个宠妻狂魔,刚确认关系就把存款千万的银行卡上交,谈婚论嫁时更是把大别也和迈巴赫全都写我名。"
        prompt_text = 'You are a helpful assistant.<|endofprompt|>用命的绳子在空中奋力晃动,终于有工作人员察觉到异常,慌忙将我放了下来。双脚触地的瞬间,我腿软得几乎站不住,随行助理连忙上前搀扶。去查!我声音颤抖地吩咐道,给我查!'
        cosyvoice3_example(generate_text, prompt_text)

In the synthesized audio, the first character spoken is "查" (the last character of prompt_text), and only after that is the content of generate_text synthesized. I tried several examples and they all seem to have this problem.

jsadjasjdjas • Dec 24 '25 10:12

I ran into the same situation. The code is as follows:

    for i, j in enumerate(cosyvoice.inference_instruct2('我真的受够了!这车一动不动到底要堵到什么时候?', '请用急躁、压着火、带明显不耐烦的抱怨感,语速偏快、音量略提高,句尾更硬更冲;同时常夹着讽刺/嘲弄或咬牙切齿的怒气表达。<|endofprompt|>', './asset/zero_shot_prompt.wav', stream=False)):
        torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)

If "You are a helpful assistant." is not added at the start of instruct_text, the text of instruct_text is spoken in the output audio. If it is added, the output is normal.

Byte-Coder2020 • Dec 25 '25 08:12
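
The workaround described in the comment above (prefixing instruct_text with "You are a helpful assistant.") could be sketched as a small helper. This is only an illustration of what the commenter reports, not a fix confirmed by the maintainers. The function name `with_system_prefix` is hypothetical, and the single space between the prefix and the instruction is an assumption, since the comment does not say which separator was used.

```python
# Prefix reported in the thread as making inference_instruct2 behave normally.
SYSTEM_PREFIX = "You are a helpful assistant."


def with_system_prefix(instruct_text: str) -> str:
    """Return instruct_text with SYSTEM_PREFIX in front, unless it is already there.

    Hypothetical helper: the single-space separator is an assumption,
    not something stated in the thread.
    """
    text = instruct_text.strip()
    if text.startswith(SYSTEM_PREFIX):
        return text  # already prefixed, leave unchanged
    return f"{SYSTEM_PREFIX} {text}"


if __name__ == "__main__":
    instruct = "请用急躁的语气说话。<|endofprompt|>"
    print(with_system_prefix(instruct))
```

The returned string would then be passed as the second argument to `cosyvoice.inference_instruct2`, in place of the bare instruction. Note that the original post already includes this prefix in its zero-shot prompt_text and still reports the issue, so this sketch only covers the instruct case.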