Graph-R1: How are the hypergraph-related JSON files generated?

Hi, thank you for sharing this great project!

I’m currently studying the codebase and noticed that the script relies on the following JSON files:

kv_store_text_chunks.json

kv_store_entities.json

kv_store_hyperedges.json

Could you kindly clarify:

How are these JSON files generated from the original dataset?

Is there any existing script or code reference for preprocessing the raw data into these formats?

Any guidance or pointers would be greatly appreciated. Thanks in advance!

Aug 07 '25 17:08 luckyyangrun

Hi! This step constructs these files.

Aug 07 '25 17:08 LHRLAB

Thanks for your reply!

After reading through script_build.py, I believe this script is responsible for reading the following preprocessed files:

kv_store_text_chunks.json

kv_store_entities.json

kv_store_hyperedges.json

It then performs embedding and FAISS indexing on the contents. However, it doesn't seem to contain logic for generating these files from the original dataset.

Aug 07 '25 17:08 luckyyangrun

The script reads these files and inserts newly constructed knowledge into them. If the files do not exist when the script starts, it builds them automatically from the original dataset.

Aug 07 '25 17:08 LHRLAB
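
For readers following along, the load-or-create behaviour described in the last reply can be sketched roughly as below. This is not the code from script_build.py. It only illustrates the general pattern of opening a JSON key-value store if it exists, starting from an empty one if it does not, and merging new records into it. The three file names come from the thread. The helper names (load_or_create_store, upsert, demo) and the record layout (a "content" field keyed by a chunk id) are assumptions made up for this example.

```python
import json
import os
import tempfile

# File names are taken from the thread; everything else is hypothetical.
STORE_FILES = {
    "text_chunks": "kv_store_text_chunks.json",
    "entities": "kv_store_entities.json",
    "hyperedges": "kv_store_hyperedges.json",
}


def load_or_create_store(path):
    """Return the dict saved at `path`, or an empty dict if the file is missing."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    return {}


def upsert(path, new_items):
    """Merge `new_items` into the store at `path` and write the result back."""
    store = load_or_create_store(path)
    store.update(new_items)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(store, f, ensure_ascii=False, indent=2)
    return store


def demo():
    """Run two inserts against a store file that does not exist at the start."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, STORE_FILES["text_chunks"])
        # First call: the file is absent, so the store starts empty.
        first = upsert(path, {"chunk-1": {"content": "first passage"}})
        # Second call: the file now exists, so the new record is merged in.
        second = upsert(path, {"chunk-2": {"content": "second passage"}})
        return os.path.basename(path), sorted(first), sorted(second)
```

Calling demo() creates the file on the first insert and extends it on the second, which mirrors the two cases mentioned above: files missing at the start versus files already present. How the real project extracts entities and hyperedges from the raw dataset is not covered by this sketch.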