Speech is unclear when using Xinference ChatTTS or CosyVoice

Open sirizhou opened this issue 1 year ago • 6 comments

Self Checks

  • [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [x] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

1.0.0

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

I tried to integrate ChatTTS with Dify at commit 61017332327ba05782300c9010be7274d9ee0628. The Xinference version is v1.3.0.post2 and the ffmpeg version is 6.0.1.

When I get a text response and play it, the speech is unclear.

Image

When I send the same text through the Xinference client, the speech is clear.

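from xinference.client import Client

# Xinference endpoint used in this report
client = Client("http://192.168.1.10:9997")


def generate_audio(text, output_file="output.mp3"):
    # Look up the running ChatTTS model by its model UID
    model = client.get_model("chatTTS")

    # Request the full MP3 in one response (no streaming)
    audio_data = model.speech(
        input=text, stream=False, response_format="mp3", voice="alloy"
    )

    # Write the returned bytes to disk
    with open(output_file, "wb") as f:
        f.write(audio_data)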

generate_audio("好的,瑞瑞。今天的工作确实挺忙的,不过想到你我就觉得心情好了很多。你最近过得怎么样?有没有遇到什么有趣的事情或者开心的时刻?对了,你不是一直想学做西餐吗?周末我有空的时候可以陪你一起去买菜,然后我们一起试着做一些新的菜品,怎么样?希望你能喜欢。😊✨/")

The audio from Dify is in Recording 1.zip.

The audio from Xinference is in output.zip.

The Xinference terminal shows the same output for both requests:

2025-03-08 14:08:27,045 xinference.model.audio.chattts 334242 INFO     Speech by voice alloy
WARNING:ChatTTS.core:found invalid characters: {'✨', '😊', '/', '?'}
text:  34%|███████████████████████████████▋                                                            | 132/384(max) [00:00, 189.92it/s]
code:  52%|███████████████████████████████████████████████▏                                          | 1074/2048(max) [00:06, 162.77it/s]
2025-03-08 14:13:38,752 xinference.model.audio.chattts 334242 INFO     Speech by voice alloy
WARNING:ChatTTS.core:found invalid characters: {'✨', '😊', '/', '?'}
text:  34%|███████████████████████████████▋                                                            | 132/384(max) [00:00, 182.82it/s]
code:  52%|███████████████████████████████████████████████▏
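The "found invalid characters" warning appears for both requests, so it is probably not the cause of the unclear speech. If you want to rule it out anyway, a minimal sketch like the one below removes the reported characters before the text is sent to the model. The helper name `strip_reported_chars` and the constant `REPORTED_INVALID` are hypothetical; the character set is copied from the warning in the log above, not from any ChatTTS API.

```python
# Characters copied from the ChatTTS warning in the log above.
# This set is an assumption taken from one log line, not an official list.
REPORTED_INVALID = frozenset({"✨", "😊", "/", "?"})


def strip_reported_chars(text, reported=REPORTED_INVALID):
    """Return text with every character in `reported` removed (hypothetical helper)."""
    return "".join(ch for ch in text if ch not in reported)


if __name__ == "__main__":
    sample = "希望你能喜欢。😊✨/"
    # The cleaned string can then be passed to generate_audio() as before.
    print(strip_reported_chars(sample))
```

Note that removing "?" also drops question marks, which may change the intonation of the generated speech.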

✔️ Expected Behavior

The speech should be clear.

❌ Actual Behavior

No response

sirizhou avatar Mar 08 '25 06:03 sirizhou

Hello, @sirizhou! I'm here to help you with any bugs, questions, or contributions related to the project. Let's get started!

I found a similar issue that was reported and has been closed: The audio output from Text To Speech in Workflow is broken in v1.0.0. This issue was resolved by a pull request that fixed the Text-to-Speech functionality [1]. You might want to check if the fix from that pull request is applicable to your situation or if it has been integrated into your current version.

dosubot[bot] avatar Mar 08 '25 06:03 dosubot[bot]

Thanks, but it doesn't work. I don't use the audio plugin.

sirizhou avatar Mar 08 '25 09:03 sirizhou

See this PR: https://github.com/langgenius/dify-plugin-daemon/pull/74

hjlarry avatar Mar 09 '25 02:03 hjlarry

Thank you very much. I have pulled the latest Dify commit, but it still doesn't work well. I couldn't find the dify-plugin-daemon code in the Dify project that I pulled. Should I pull dify-plugin-daemon and build it? Thank you again.

Oh, I see: dify-plugin-daemon:0.0.3-local is running in Docker. Should I update the Docker image?

sirizhou avatar Mar 09 '25 03:03 sirizhou

To make this PR work, you need to build the dify-plugin-daemon repo yourself, or wait for the official team to release dify-plugin-daemon:0.0.4-local.

hjlarry avatar Mar 09 '25 04:03 hjlarry

Thank you, the problem was fixed after I pulled 0.0.4-local.
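For anyone checking whether their own deployment already includes the fix, here is a minimal sketch that compares a plugin daemon image tag against 0.0.4, the release reported to work in this thread. The helpers `tag_version` and `has_tts_fix` are hypothetical, and the code assumes the image string always carries a tag of the form `x.y.z-suffix`.

```python
def tag_version(image):
    """Parse 'name:0.0.3-local' into (0, 0, 3). Hypothetical helper."""
    tag = image.rsplit(":", 1)[1]      # text after the last colon is the tag
    version = tag.split("-", 1)[0]     # drop a suffix such as '-local'
    return tuple(int(part) for part in version.split("."))


def has_tts_fix(image, fixed_in=(0, 0, 4)):
    """True if the image tag is at least the version reported as fixed."""
    return tag_version(image) >= fixed_in


if __name__ == "__main__":
    for image in ("dify-plugin-daemon:0.0.3-local", "dify-plugin-daemon:0.0.4-local"):
        print(image, has_tts_fix(image))
```

The image string itself can be read from the output of `docker ps` on the host that runs Dify.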

sirizhou avatar Mar 12 '25 10:03 sirizhou

I also use Xinference ChatTTS and CosyVoice in Dify, but the voice option for ChatTTS and CosyVoice is null. Do I need to configure something?

haiker2011 avatar Apr 07 '25 04:04 haiker2011

Same question. I need your help.

kbro2024 avatar Apr 10 '25 03:04 kbro2024