CosyVoice
cosyvoice2 inference_instruct2 extra instruct_text at beginning
Describe the bug: With CosyVoice2, inference_instruct2 speaks part of the instruct text at the beginning of the generated audio.
To Reproduce Steps to reproduce the behavior:
import os
import sys
sys.path.append('third_party/Matcha-TTS')
from cosyvoice.cli.cosyvoice import CosyVoice, CosyVoice2
from cosyvoice.utils.file_utils import load_wav
import torchaudio
cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B', load_jit=False, load_trt=False, load_vllm=False, fp16=False)
prompt_speech_16k = load_wav('./asset/zero_shot_prompt.wav', 16000)
def text_generator():
    yield '用粤语说这句话<|endofprompt|>我最近迷上一部经典港剧,入面嗰啲对白真系有嚟头,时唔时就嚟句“唔该晒”,令我不禁莞尔。'
for i, j in enumerate(cosyvoice.inference_instruct2(text_generator(), '', prompt_speech_16k, stream=False)):
    torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)
from IPython.display import Audio
Audio("zero_shot_0.wav", autoplay=True)
Expected behavior: The audio should start directly with '我最近迷上一部经典港剧...'. Instead, it begins with part of the instruct text: "用", "用粤语", or sometimes the whole "用粤语说这句话".
Screenshots
Environment:
- OS: Ubuntu 22.04
- Jupyter notebook with Python 3.10
- CUDA 12.2
Additional context
The instruct prompt must strictly follow the format given in the CosyVoice2 report.
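One possible reading of this reply is that the instruction and the text to speak should be passed as separate arguments, rather than joined into one string with an empty instruct_text as in the reproduction above. The sketch below is only an illustration under that assumption. `split_instruct` is a hypothetical helper, not part of CosyVoice, and the commented-out call assumes the (tts_text, instruct_text, prompt_speech_16k) argument order used in the reproduction.

```python
# Hypothetical helper (not part of CosyVoice): separates an instruction
# from the text to speak when both were joined with the <|endofprompt|> marker.
END_OF_PROMPT = '<|endofprompt|>'


def split_instruct(combined: str) -> tuple[str, str]:
    """Split 'instruction<|endofprompt|>text' into (instruct_text, tts_text).

    If the marker is absent, the whole string is treated as text to speak
    and the instruction is returned as an empty string.
    """
    instruct_text, sep, tts_text = combined.partition(END_OF_PROMPT)
    if not sep:
        return '', combined
    return instruct_text.strip(), tts_text.strip()


combined = '用粤语说这句话<|endofprompt|>我最近迷上一部经典港剧。'
instruct_text, tts_text = split_instruct(combined)

# Assumed usage: pass the two parts separately instead of one combined string.
# for i, j in enumerate(cosyvoice.inference_instruct2(
#         tts_text, instruct_text, prompt_speech_16k, stream=False)):
#     torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)
```

Whether this removes the spoken instruction prefix has not been verified here. It only shows how the two parts could be separated before the call.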
@aluminumbox If I want to fine-tune the emotional effect, how should the training data be formatted?
This issue is stale because it has been open for 30 days with no activity.