OpenCRE
Add Indexing Suggestions or Documentation for Slow Neo4j Queries
Problem
While working on the AI mapping pipeline, I noticed that some Neo4j queries (especially those involving relationships like (:Standard)-[:MAPPED_TO]->(:CRE)) become significantly slower on larger datasets.
Currently, the repository does not document:
- Which properties should be indexed in the Neo4j database.
- Any query patterns that should be avoided for performance reasons.
- Indexing strategies used in production (if any).
This can lead to performance bottlenecks, especially since CALL db.indexes() (SHOW INDEXES on newer Neo4j versions) shows few or no compound indexes in use.
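To check which of the suggested properties already have an index, the rows returned by SHOW INDEXES (or CALL db.indexes() on older versions) could be compared against a recommended list. This is a minimal sketch, assuming each row is a dictionary carrying the labelsOrTypes and properties columns; RECOMMENDED and missing_indexes are hypothetical names, and the label/property pairs are simply the ones suggested under Proposal.

```python
from typing import Iterable, Mapping, Optional, Sequence

# Hypothetical (label, properties) pairs, taken from the indexes suggested in this issue.
RECOMMENDED: list[tuple[str, tuple[str, ...]]] = [
    ("Standard", ("external_id",)),
    ("CRE", ("id",)),
    ("CRE", ("name",)),
]


def missing_indexes(
    index_rows: Iterable[Mapping[str, Optional[Sequence[str]]]],
    recommended: Sequence[tuple[str, tuple[str, ...]]] = RECOMMENDED,
) -> list[tuple[str, tuple[str, ...]]]:
    """Return recommended (label, properties) pairs that no existing index matches.

    Each row is assumed to carry the labelsOrTypes and properties columns
    reported by SHOW INDEXES; both may be None for some index types.
    """
    existing = set()
    for row in index_rows:
        props = tuple(row.get("properties") or ())
        for label in row.get("labelsOrTypes") or ():
            existing.add((label, props))
    # Exact tuple comparison: a composite index does not count for a single property here.
    return [pair for pair in recommended if pair not in existing]
```

With the official Python driver, the rows could come from something like session.run("SHOW INDEXES").data(). Comparing exact tuples keeps the check conservative: it only reports an index as present when label and property list match exactly.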
Proposal
We can:
- Add a basic docs/neo4j-indexing.md file listing:
  - Recommended indexes (e.g., on Standard.external_id, CRE.id, CRE.name)
  - Optional compound indexes for common query paths
  - Tips on avoiding slow patterns (e.g., avoid expanding the entire graph via MATCH (n)-[*]->(m))
- Log and benchmark existing slow queries using PROFILE/EXPLAIN, and annotate them in the AI mapping script as comments or docstrings.
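The indexing and slow-pattern items above could be backed by a small helper that emits the index DDL and flags unbounded variable-length patterns like the one mentioned. This is a sketch assuming the Neo4j 4.x/5.x CREATE INDEX ... IF NOT EXISTS syntax; the index names and the helpers index_statements and has_unbounded_expansion are hypothetical, and the regex is a heuristic, not a Cypher parser.

```python
import re
from typing import Sequence

# Hypothetical single-property recommendations, taken from the properties named in this issue.
SINGLE_PROPERTY: Sequence[tuple[str, str]] = (
    ("Standard", "external_id"),
    ("CRE", "id"),
    ("CRE", "name"),
)


def index_statements(
    single: Sequence[tuple[str, str]] = SINGLE_PROPERTY,
    composite: Sequence[tuple[str, tuple[str, ...]]] = (),
) -> list[str]:
    """Build idempotent CREATE INDEX statements (Neo4j 4.x/5.x syntax)."""
    statements = []
    for label, prop in single:
        name = f"{label.lower()}_{prop}"
        statements.append(
            f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON (n.{prop})"
        )
    for label, props in composite:
        # Composite (compound) index over several properties of one label.
        name = f"{label.lower()}_{'_'.join(props)}"
        columns = ", ".join(f"n.{p}" for p in props)
        statements.append(
            f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON ({columns})"
        )
    return statements


# Variable-length relationship patterns such as [*], [:MAPPED_TO*2..] or [*1..3].
_VAR_LENGTH = re.compile(r"\[[^\]]*\*\s*([0-9.\s]*)\]")


def has_unbounded_expansion(query: str) -> bool:
    """Heuristically flag variable-length patterns that have no upper bound."""
    for match in _VAR_LENGTH.finditer(query):
        spec = "".join(match.group(1).split())
        # "" means [*]; a trailing ".." means only a lower bound, e.g. [*2..].
        if spec == "" or spec.endswith(".."):
            return True
    return False
```

The generated statements could be listed in the proposed docs file or run once during setup. A bounded pattern such as [:MAPPED_TO*1..3] passes the check, while [*] and [*2..] are flagged.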
Benefits
- Helps contributors understand how to structure performant queries.
- Ensures production setups follow best indexing practices.
- Should make AI mapping and CRE-Standard linking significantly faster on larger datasets.
Optional: Related Queries
For example:
MATCH (s:Standard)-[:MAPPED_TO]->(c:CRE) WHERE s.external_id = $id RETURN c
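To confirm that a query like this one actually uses an index, it can be run with PROFILE and the resulting plan inspected for full scans. This sketch assumes the plan is a nested dictionary with operatorType and children keys (the shape the official Python driver exposes as ResultSummary.profile) and that operator names may carry an @neo4j suffix; scan_operators is a hypothetical helper.

```python
from typing import Any, Mapping

# Plan operators that indicate a scan rather than an index seek.
SCAN_OPERATORS = {"NodeByLabelScan", "AllNodesScan"}


def scan_operators(plan: Mapping[str, Any]) -> list[str]:
    """Return the names of scan operators found anywhere in a profiled plan tree.

    Assumes each node has an 'operatorType' string (possibly suffixed with
    '@neo4j') and an optional 'children' list of nodes with the same shape.
    """
    found = []
    stack = [plan]
    while stack:
        node = stack.pop()
        # Strip a suffix such as "@neo4j" before comparing operator names.
        op = str(node.get("operatorType", "")).split("@")[0]
        if op in SCAN_OPERATORS:
            found.append(op)
        stack.extend(node.get("children") or [])
    return found
```

The plan could be obtained with something like session.run("PROFILE " + query, {"id": some_id}).consume().profile. Once an index on Standard.external_id exists, the planner would typically use an index seek for the example query and the helper would return an empty list; a NodeByLabelScan in the result suggests the lookup is still scanning.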