FlashTTS

Provides high-quality Chinese speech synthesis and voice cloning services based on models such as SparkTTS and OrpheusTTS.

Results 45 FlashTTS issues

As the title says, is this because the vllm version is too new? ``` [FlashTTS] 2025-09-01 20:07:35 [INFO] [infer:65] >> Start up FlashTTS to perform inference. [FlashTTS] 2025-09-01 20:07:35 [INFO] [infer:66] >> Inference args: Namespace(input='./test.txt', output='demo.wav', name=None, reference_audio=None, reference_text=None, latent_file=None, model_path='./models/SparkAudio/Spark-TTS-0.5B', backend='vllm',...

I am using the vllm backend with the spark-tts model. Running `flashtts infer -i "调整音高和语速示例。" -m ./models/spark-tts -b vllm --pitch high --speed low -o tuned.wav` fails because the semantic-token prediction is empty and the LLM output consists entirely of `!` characters: `spark_engine.py", line 355, in _generate_audio_tokens [rank0]: raise ValueError(err_msg) [rank0]: ValueError: Semantic tokens 预测为空,prompt:调整音高和语速示例。,llm output:!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!...

Could `voice` be given a default value? When using the OpenAI plugin in Dify, there is no way to set `voice`, so the service cannot be used.

What is the latent_file for MegaTTS3, and how should I prepare it? I searched around and could not find any documentation for this file.

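For streaming, what is the lowest first-packet latency? Does it support concurrent requests, and if so, how many concurrent streams?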

drake@DESKTOP-05BHGRQ:~$ flashtts serve --model_path Spark-TTS-0.5B --backend vllm --llm_device cuda --tokenizer_device cuda --detokenizer_device cuda --wav2vec_attn_implementation sdpa --llm_attn_implementation sdpa --torch_dtype "bfloat16" --max_length 32768 --llm_gpu_memory_utilization 0.6 --fix_voice --host 0.0.0.0 --port 8000 [FlashTTS] 2025-08-16...

How do I change the reference audio and define a new custom voice?

Support for the index-tts series; see https://github.com/Ksuriuri/index-tts-vllm for reference.

Including voice registration and real-time streaming cloning: `import pyaudio from openai import OpenAI client = OpenAI() p = pyaudio.PyAudio() stream = p.open(format=8, channels=1, rate=24_000, output=True) with client.audio.speech.with_streaming_response.create( model="tts-1", voice="alloy", input="""I see skies of blue and clouds...