cosyvoice2 inference_instruct2 extra instruct_text at beginning

Open zooyf opened this issue 6 months ago • 3 comments

Describe the bug With CosyVoice2, the audio produced by inference_instruct2 sometimes begins with part or all of the instruct text being spoken aloud.

To Reproduce Steps to reproduce the behavior:

import os
import sys
sys.path.append('third_party/Matcha-TTS')
from cosyvoice.cli.cosyvoice import CosyVoice, CosyVoice2
from cosyvoice.utils.file_utils import load_wav
import torchaudio

cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B', load_jit=False, load_trt=False, load_vllm=False, fp16=False)

prompt_speech_16k = load_wav('./asset/zero_shot_prompt.wav', 16000)

def text_generator():
    yield '用粤语说这句话<|endofprompt|>我最近迷上一部经典港剧,入面嗰啲对白真系有嚟头,时唔时就嚟句“唔该晒”,令我不禁莞尔。'

for i, j in enumerate(cosyvoice.inference_instruct2(text_generator(), '', prompt_speech_16k, stream=False)):
    torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)

from IPython.display import Audio
Audio("zero_shot_0.wav", autoplay=True)

Expected behavior The output should directly speak '我最近迷上一部经典港剧...', rather than starting with part of the instruct text ("用", "用粤语", or sometimes the whole "用粤语说这句话").

Screenshots

Image

Environment:

  • OS: Ubuntu 22.04
  • Jupyter notebook with Python 3.10
  • CUDA 12.2

zooyf avatar Jul 22 '25 08:07 zooyf

The instruct prompt must strictly follow the format given in the CosyVoice2 report.

aluminumbox avatar Jul 23 '25 06:07 aluminumbox
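
One possible reading of the reply above, assuming inference_instruct2 takes the text to be spoken and the instruction as two separate arguments (as the instruct_text name in the issue title suggests), is that the instruction belongs in the second argument rather than inside the spoken text. The sketch below only shows how a combined string like the one in the reproduction could be split before the call. split_instruct is a hypothetical helper, not part of CosyVoice, and the commented-out call shape is an assumption modeled on the reproduction script.

```python
END_OF_PROMPT = '<|endofprompt|>'  # marker taken from the reproduction script


def split_instruct(combined: str) -> tuple[str, str]:
    """Split 'instruction<|endofprompt|>text' into (instruct_text, tts_text).

    Hypothetical helper, not part of CosyVoice. If the marker is absent,
    the whole string is treated as text to speak, with no instruction.
    """
    instruct_text, marker, tts_text = combined.partition(END_OF_PROMPT)
    if not marker:
        return '', combined
    return instruct_text.strip(), tts_text.strip()


combined = '用粤语说这句话<|endofprompt|>我最近迷上一部经典港剧'
instruct_text, tts_text = split_instruct(combined)
print(repr(instruct_text), repr(tts_text))

# Assumed call shape (needs the model, so it is left commented out):
# the instruction moves into the second argument instead of being empty.
# for i, j in enumerate(cosyvoice.inference_instruct2(
#         tts_text, instruct_text, prompt_speech_16k, stream=False)):
#     torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)
```

Whether this removes the leading "用粤语..." audio is not confirmed anywhere in this thread; it is only a way to test the maintainer's hint against the original reproduction.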

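The instruct prompt must strictly follow the format given in the CosyVoice2 report.

@aluminumbox If I want to fine-tune the emotional expressiveness, how should the training data be formatted?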

xiaoyangnihao avatar Jul 29 '25 06:07 xiaoyangnihao

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Aug 29 '25 02:08 github-actions[bot]