CosyVoice: CosyVoice2 inference_instruct2 speaks the instruct text at the start of the output

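Describe the bug With CosyVoice2, when the instruction is placed in the input text before <|endofprompt|> and instruct_text is left empty, inference_instruct2 speaks part or all of the instruction at the start of the generated audio.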

To Reproduce Steps to reproduce the behavior:

import os
import sys
sys.path.append('third_party/Matcha-TTS')
from cosyvoice.cli.cosyvoice import CosyVoice, CosyVoice2
from cosyvoice.utils.file_utils import load_wav
import torchaudio

cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B', load_jit=False, load_trt=False, load_vllm=False, fp16=False)

prompt_speech_16k = load_wav('./asset/zero_shot_prompt.wav', 16000)

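# The instruction and the text to speak are yielded together, separated by <|endofprompt|>.
def text_generator():
    yield '用粤语说这句话<|endofprompt|>我最近迷上一部经典港剧，入面嗰啲对白真系有嚟头，时唔时就嚟句“唔该晒”，令我不禁莞尔。'

# instruct_text (second argument) is passed as an empty string.
for i, j in enumerate(cosyvoice.inference_instruct2(text_generator(), '', prompt_speech_16k, stream=False)):
    torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)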

from IPython.display import Audio
Audio("zero_shot_0.wav", autoplay=True)

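Expected behavior The audio should start directly with '我最近迷上一部经典港剧...' ("I've recently become hooked on a classic Hong Kong drama..."). Instead, it begins by speaking part or all of the instruction: "用", "用粤语", or sometimes the full "用粤语说这句话" ("Say this sentence in Cantonese").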
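For comparison, here is a minimal sketch of separating the instruction from the text to speak before calling the model. The helper name `split_instruct` is hypothetical and not part of CosyVoice. It only splits the string on the `<|endofprompt|>` marker used in the reproduction above. Whether passing the two parts separately avoids the spoken prefix is an assumption, not something confirmed in this report.

```python
END_OF_PROMPT = '<|endofprompt|>'


def split_instruct(text, marker=END_OF_PROMPT):
    """Hypothetical helper: split 'instruct<marker>content' into (instruct, content)."""
    instruct, sep, content = text.partition(marker)
    if not sep:
        # No marker found: treat the whole string as content to speak.
        return '', text
    return instruct.strip(), content.strip()


instruct_text, tts_text = split_instruct(
    '用粤语说这句话<|endofprompt|>我最近迷上一部经典港剧。'
)
```

The two parts could then be passed as separate arguments, for example `cosyvoice.inference_instruct2(tts_text, instruct_text, prompt_speech_16k, stream=False)`, to check whether the instruction is still spoken.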
