In Dify version 0.15.3, the TTS feature causes content outside the main text to be read aloud

Open mfjiang opened this issue 11 months ago • 2 comments

Self Checks

  • [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [x] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

0.15.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

In Dify version 0.15.3, the TTS feature causes content outside the main text to be read aloud.

✔️ Expected Behavior

The TTS feature should only play the main text content, excluding content within <think> tags and metadata.

❌ Actual Behavior

No response

mfjiang avatar Feb 22 '25 08:02 mfjiang

It seems that in Dify version 0.15.3, there is an issue with the TTS feature where it reads content outside the main text, including content within <think> tags and metadata. This behavior is not expected, as the TTS feature should only read the main text content. Unfortunately, I couldn't find specific information on how the TTS feature handles content within <think> tags and metadata in the current version. It might be helpful to check if there are any updates or patches available that address this issue.

dosubot[bot] avatar Feb 22 '25 08:02 dosubot[bot]

Do you have the raw content of the response?

crazywoola avatar Feb 22 '25 11:02 crazywoola

Do you have the raw content of the response?

The reply content is clean, with no unexpected special characters.

mfjiang avatar Feb 23 '25 02:02 mfjiang

The TTS feature should only play the main text content, excluding content within <think> tags and metadata.

Currently, model providers return the reasoning content and the response together in a single output, with the reasoning wrapped in <think> tags. This is expected behavior, so this should be an enhancement request rather than a bug report. We will refine this part later, once the providers agree on a standard for reasoning content.

crazywoola avatar Feb 23 '25 05:02 crazywoola
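
Until that refinement lands, the behavior requested above could be approximated by filtering the model output before it is handed to TTS. The following is only a minimal sketch, assuming the provider wraps reasoning in literal <think>...</think> tags as described in this thread; `strip_reasoning` is a hypothetical helper name and not part of Dify's API.

```python
import re

# Assumption: reasoning is delimited by literal <think>...</think> tags.
# Providers may use other conventions, which this sketch does not cover.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)

# A <think> tag with no closing tag, e.g. when a streamed response is
# cut off while the model is still reasoning.
_UNCLOSED_THINK = re.compile(r"<think>.*\Z", re.DOTALL | re.IGNORECASE)


def strip_reasoning(text: str) -> str:
    """Return `text` with <think> blocks removed (hypothetical helper)."""
    # Remove every complete block first (non-greedy, may span lines).
    without_closed = _THINK_BLOCK.sub("", text)
    # Then drop anything after a dangling opening tag.
    without_open = _UNCLOSED_THINK.sub("", without_closed)
    return without_open.strip()


if __name__ == "__main__":
    raw = "<think>The user greeted me.\nI should greet back.</think>Hello there."
    print(strip_reasoning(raw))
```

Only the text returned by such a filter would be sent to the speech engine; the reasoning could still be shown in the chat UI if desired.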