JSON parsing: always fix all incoming json when using _manual_json

Open MarkJGx opened this issue 1 year ago • 2 comments

Description

This change set adds non-LLM-based handling of malformed JSON as a preliminary step before falling back to the more resource-intensive LLM-based fixup.

More Details

While running GraphRAG with a local Ollama model, I noticed frequent malformed JSON responses from LLM requests, significantly slowing down the process on an M1 Max MacBook. In a fast, parallel cloud inference system, this issue is manageable, but locally it becomes a bottleneck. After indexing, I found 140 instances of JSON parsing failures.

The json_repair library effectively fixed the malformed JSON in my tests. I opted not to delve into the specific parsing failure cases, as they are mainly LLM-related and predicting every edge case is impractical. This library should be robust enough to handle most local LLM faults.

Related Issues

  • https://github.com/microsoft/graphrag/issues/345

Proposed Changes

  • Add json_repair as a new Poetry dependency.
    • https://pypi.org/project/json-repair/
  • Use json_repair for initial JSON repair in _manual_json, with a fallback to the LLM.
  • Apply JSON repair when graph search JSON parsing fails.
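
A minimal sketch of the repair-then-fallback order described above, not the actual GraphRAG code. The function name `parse_json_with_fallbacks` and the `cheap_repair` / `llm_repair` parameters are hypothetical; the only real library call assumed is `json_repair.repair_json`, which returns a repaired JSON string by default. The import is guarded so the sketch still loads when the package is not installed.

```python
import json
from typing import Any, Callable, Optional

try:
    # Third-party dependency proposed in this PR: pip install json-repair
    from json_repair import repair_json
except ImportError:  # keep the sketch importable without the package
    repair_json = None


def parse_json_with_fallbacks(
    raw: str,
    cheap_repair: Optional[Callable[[str], str]] = repair_json,
    llm_repair: Optional[Callable[[str], str]] = None,
) -> Any:
    """Try strict parsing, then a local repair, then an LLM-based repair.

    Both repair callables are hypothetical hooks: each takes the raw text
    and returns a (hopefully valid) JSON string.
    """
    # 1. Well-formed output needs no repair at all.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass

    # 2. Cheap local repair first; 3. the expensive LLM round trip last.
    for fixer in (cheap_repair, llm_repair):
        if fixer is None:
            continue
        try:
            return json.loads(fixer(raw))
        except json.JSONDecodeError:
            continue

    raise ValueError("could not parse or repair JSON")
```

With this ordering, the LLM hook is only invoked when both strict parsing and the local repair fail, which is the cost saving the PR is after on slow local inference.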

Checklist

  • [x] Tested these changes locally.
  • [x] Reviewed the code changes.
  • [x] Updated documentation (if necessary).
  • [ ] Added appropriate unit tests (if applicable).

MarkJGx avatar Jul 14 '24 13:07 MarkJGx

@microsoft-github-policy-service agree

MarkJGx avatar Jul 14 '24 13:07 MarkJGx

Hey. I've addressed the oversight caught in the comment, rebased on main, and regenerated the Poetry lock file with the new library added. Waiting for a review. @AlonsoGuevara @jgbradley1 et al.

MarkJGx avatar Jul 28 '24 21:07 MarkJGx

We have resolved several issues related to text encoding and JSON parsing that are rolled up into version 0.2.2. Please try again with that version and re-open if this is still an issue.

natoverse avatar Aug 09 '24 17:08 natoverse