JSON Schema Generation Occasionally Returns Invalid JSON Output

Open QuantumGhost opened this issue 9 months ago • 1 comment

Although we have explicitly configured the models to avoid generating markdown code blocks when outputting JSON schemas, certain models (such as Google Gemini 2.0 Flash) still produce JSON with leading backticks. This issue prevents the client from parsing the output JSON correctly, resulting in a broken UI.

Self Checks

  • [x] This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [x] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

1.3.0

Cloud or Self Hosted

Cloud

Steps to reproduce

  1. Use the structured output feature to generate a JSON schema.
  2. Use the following prompt with the Google Gemini 2.0 Flash model:
Design a JSON schema for user profile, including the following fields:

- id
- username
- gender
- nickname

✔️ Expected Behavior

Expected output:

a valid JSON string

{"output": "\n{\n  \"type\": \"object\",\n  \"properties\": {\n    \"id\": {\n      \"type\": \"string\"\n    },\n    \"username\": {\n      \"type\": \"string\"\n    },\n    \"gender\": {\n      \"type\": \"string\",\n      \"enum\": [\"male\", \"female\", \"other\"]\n    },\n    \"nickname\": {\n      \"type\": \"string\"\n    }\n  },\n  \"required\": [\"id\", \"username\", \"gender\", \"nickname\"]\n}\n", "error": ""}

❌ Actual Behavior

an invalid JSON string wrapped in backticks:

{"output": "```json\n{\n  \"type\": \"object\",\n  \"properties\": {\n    \"id\": {\n      \"type\": \"string\"\n    },\n    \"username\": {\n      \"type\": \"string\"\n    },\n    \"gender\": {\n      \"type\": \"string\",\n      \"enum\": [\"male\", \"female\", \"other\"]\n    },\n    \"nickname\": {\n      \"type\": \"string\"\n    }\n  },\n  \"required\": [\"id\", \"username\", \"gender\", \"nickname\"]\n}\n```", "error": ""}
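To illustrate why the client breaks, here is a minimal sketch in Python that feeds a fenced string of the same shape to the standard-library parser. The schema is shortened for brevity, and `try_parse` is a hypothetical helper name, not part of Dify.

```python
import json

# A fenced payload shaped like the "output" field above (schema shortened).
fenced_output = '```json\n{\n  "type": "object"\n}\n```'


def try_parse(text: str):
    """Return (value, None) on success or (None, error_message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, str(exc)


value, error = try_parse(fenced_output)
# The leading backticks are not valid JSON, so parsing fails at the first character.
print("parsed:", value)
print("error:", error)
```

The same helper accepts the expected output, because leading and trailing newlines are legal JSON whitespace.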

QuantumGhost · Apr 27 '25 14:04

Proposed Mitigation:

  1. Parse the JSON string returned by the models before sending it to the client.
  2. If the JSON is invalid, attempt to fix it using json_repair, then retry parsing.
  3. If the repair is successful and the repaired JSON can be parsed, return the repaired JSON to the client.
  4. If parsing still fails after the repair, which indicates that the JSON is severely corrupted, return an error to the client with a clear, descriptive message so the user knows what went wrong.
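The four steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not Dify's actual implementation: `parse_model_json`, `strip_code_fence`, and `InvalidModelJSONError` are hypothetical names, and the default repair step only strips a surrounding markdown code fence as a stand-in for json_repair. Any repair function that takes a string and returns a string could be passed as `repair` instead.

```python
import json
import re
from typing import Any, Callable

# Matches a whole string wrapped in a markdown code fence, e.g. ```json ... ```.
_FENCE_RE = re.compile(r"^\s*```[A-Za-z0-9_-]*[ \t]*\n(.*?)\n?\s*```\s*$", re.DOTALL)


class InvalidModelJSONError(ValueError):
    """Hypothetical error type raised when the output cannot be repaired."""


def strip_code_fence(text: str) -> str:
    """Stand-in repair step (assumption): drop a surrounding code fence."""
    match = _FENCE_RE.match(text)
    return match.group(1) if match else text


def parse_model_json(raw: str, repair: Callable[[str], str] = strip_code_fence) -> Any:
    # Step 1: parse the model output before sending it to the client.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Step 2: the JSON is invalid, so attempt a repair and retry parsing.
    repaired = repair(raw)
    try:
        # Step 3: the repaired JSON parses, so return it.
        return json.loads(repaired)
    except json.JSONDecodeError as exc:
        # Step 4: still invalid, so surface a descriptive error instead.
        raise InvalidModelJSONError(
            f"The model returned JSON that could not be repaired: {exc}"
        ) from exc


fenced = '```json\n{"type": "object"}\n```'
print(parse_model_json(fenced))
```

Keeping the repair step as a parameter lets the caller swap in a more thorough repair without changing the parse, repair, retry, and error flow.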

QuantumGhost · Apr 27 '25 14:04