In Dify version 0.15.3, the TTS feature causes content outside the main text to be read aloud
Self Checks
- [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [x] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
- [x] Please do not modify this template :) and fill in all the required fields.
Dify version
0.15.3
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
In Dify version 0.15.3, the TTS feature causes content outside the main text to be read aloud.
✔️ Expected Behavior
The TTS feature should only play the main text content, excluding the content within <think> tags and metadata.
❌ Actual Behavior
No response
It seems that in Dify version 0.15.3, there is an issue with the TTS feature where it reads content outside the main text, including content within <think> tags and metadata. This behavior is not expected, as the TTS feature should only read the main text content. Unfortunately, I couldn't find specific information on how the TTS feature handles content within <think> tags and metadata in the current version. It might be helpful to check if there are any updates or patches available that address this issue.
Do you have the raw content of the response?
The reply content is clean and contains no unexpected special characters.
The TTS feature should only play the main text content, excluding the content within <think> tags and metadata.
Currently, model providers return the reasoning content and the response together in a single output, with the reasoning wrapped in <think> tags. This is expected behavior, so this should be treated as an enhancement rather than a bug report. We will refine this part later, once the providers agree on a standard for reasoning content.
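
Until that happens, one possible workaround is to strip the reasoning block from the reply before the text reaches TTS. The snippet below is only a minimal sketch under that assumption: `strip_reasoning` is a hypothetical helper, not an existing Dify function, and it assumes the reasoning is wrapped in literal `<think>...</think>` tags as described above.

```python
import re

# Hypothetical helper, not part of Dify: assumes the provider wraps its
# reasoning in literal <think>...</think> tags inside the reply text.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
UNCLOSED_THINK = re.compile(r"<think>.*\Z", re.DOTALL | re.IGNORECASE)


def strip_reasoning(text: str) -> str:
    """Return the reply text with any <think> blocks removed."""
    # Remove every complete <think>...</think> block, including multi-line ones.
    cleaned = THINK_BLOCK.sub("", text)
    # If a <think> tag was opened but never closed (for example in a partial
    # streamed chunk), drop everything from that tag to the end of the text.
    cleaned = UNCLOSED_THINK.sub("", cleaned)
    return cleaned.strip()


if __name__ == "__main__":
    reply = "<think>The user wants a greeting.</think>\n\nHello, how can I help?"
    print(strip_reasoning(reply))
```

Only the cleaned string would then be handed to the speech synthesis step; the original reply, including the reasoning, can still be shown in the chat window if desired.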