Graph-R1

How are the hypergraph-related JSON files generated?

Open luckyyangrun opened this issue 5 months ago • 3 comments

Hi, thank you for sharing this great project!

I’m currently studying the codebase and noticed that the script relies on the following JSON files:

kv_store_text_chunks.json

kv_store_entities.json

kv_store_hyperedges.json

Could you kindly clarify:

How are these JSON files generated from the original dataset?

Is there any existing script or code reference for preprocessing the raw data into these formats?

Any guidance or pointers would be greatly appreciated. Thanks in advance!

luckyyangrun, Aug 07 '25 17:08

[Image not preserved] Hi! This step constructs these files.

LHRLAB, Aug 07 '25 17:08

Thanks for your reply!

After reading through script_build.py, I believe this script is responsible for reading the following preprocessed files:

kv_store_text_chunks.json

kv_store_entities.json

kv_store_hyperedges.json

It then performs embedding and FAISS indexing on the contents. However, it doesn't seem to contain logic for generating these files from the original dataset.

luckyyangrun, Aug 07 '25 17:08
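
Editorial note: one way to check which of the three files already exist before running the build step is sketched below. This is a minimal sketch, not code from the repository. The file names come from the thread; the function name `summarize_kv_stores`, the `working_dir` parameter, and the assumption that each file holds a single top-level JSON object or list are hypothetical.

```python
import json
from pathlib import Path

# File names as listed in the thread. Where they live on disk is an assumption.
KV_FILES = (
    "kv_store_text_chunks.json",
    "kv_store_entities.json",
    "kv_store_hyperedges.json",
)


def summarize_kv_stores(working_dir):
    """Map each expected file name to its top-level entry count, or None if missing."""
    summary = {}
    for name in KV_FILES:
        path = Path(working_dir) / name
        if path.exists():
            with path.open(encoding="utf-8") as f:
                # Assumes the file holds one JSON object or list at the top level.
                summary[name] = len(json.load(f))
        else:
            summary[name] = None
    return summary
```

A `None` value for every file would indicate a fresh working directory, which is the case discussed in the next reply.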

The script reads these files and inserts newly constructed knowledge into them. However, if these files are not present at the beginning, the script will automatically construct them from the original dataset.

LHRLAB, Aug 07 '25 17:08
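
Editorial note: the behavior described in the last reply (load the file if it exists, otherwise construct it from the original dataset, then insert new knowledge) follows a common load-or-build pattern. The sketch below illustrates that pattern only. It is not the logic in script_build.py; `load_or_build`, `insert_new`, and the `build_fn` callback are hypothetical names, and the real files may be structured differently.

```python
import json
from pathlib import Path


def _save(path, store):
    """Write the store to disk as UTF-8 JSON, creating parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(store, f, ensure_ascii=False, indent=2)


def load_or_build(path, build_fn):
    """Load a JSON key-value store, or build and save it if the file is absent."""
    path = Path(path)
    if path.exists():
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    # Hypothetical hook: build_fn derives the records from the raw dataset.
    store = build_fn()
    _save(path, store)
    return store


def insert_new(path, store, new_items):
    """Add only keys not already in the store, persist, and return what was added."""
    added = {k: v for k, v in new_items.items() if k not in store}
    if added:
        store.update(added)
        _save(path, store)
    return added
```

Under this pattern the first run pays the construction cost and every later run reuses the saved file, which would explain why the script appears to only read the JSON files once they exist.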