
Add Indexing Suggestions or Documentation for Slow Neo4j Queries

Open ashutosh-engineer opened this issue 5 months ago • 3 comments

Problem

While working on the AI mapping pipeline, I noticed that some Neo4j queries, especially those involving relationships like (:Standard)-[:MAPPED_TO]->(:CRE), become significantly slower on larger datasets.

Currently, the repository does not document:

  • Which properties should be indexed in the Neo4j database.
  • Any query patterns that should be avoided for performance reasons.
  • Indexing strategies used in production (if any).

This can lead to performance bottlenecks, especially when CALL db.indexes() (SHOW INDEXES on recent Neo4j versions) lists few or no compound indexes.


Proposal

We can:

  • Add a basic docs/neo4j-indexing.md file listing:

    • Recommended indexes (e.g., on Standard.external_id, CRE.id, CRE.name)
    • Optional compound indexes for common query paths
    • Tips on avoiding slow patterns (e.g., avoid expanding the entire graph with an unbounded pattern such as MATCH (n)-[*]->(m))
  • Log and benchmark existing slow queries using PROFILE/EXPLAIN and annotate them in the AI mapping script as comments or docstrings.
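
As a starting point for docs/neo4j-indexing.md, the recommended indexes listed above could be generated along these lines. This is a minimal sketch, not the project's actual setup: the index names, the helper `index_statement`, and the composite index on CRE.id and CRE.name are assumptions for illustration, and the `CREATE INDEX ... IF NOT EXISTS FOR ... ON` syntax assumes Neo4j 4.x or later.

```python
# Hypothetical helper: builds Cypher index DDL for the properties named in this issue.
# Index names are illustrative; the syntax assumes Neo4j 4.x or later.

def index_statement(name, label, properties):
    """Return a CREATE INDEX statement for one label and one or more properties."""
    props = ", ".join(f"n.{p}" for p in properties)
    return f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON ({props})"


RECOMMENDED = [
    ("standard_external_id", "Standard", ["external_id"]),
    ("cre_id", "CRE", ["id"]),
    ("cre_name", "CRE", ["name"]),
    # Optional composite index; only worthwhile if queries filter on both properties.
    ("cre_id_name", "CRE", ["id", "name"]),
]

STATEMENTS = [index_statement(*spec) for spec in RECOMMENDED]

if __name__ == "__main__":
    # Print one statement per line, ready to paste into cypher-shell or the browser.
    for stmt in STATEMENTS:
        print(stmt + ";")
```

Keeping the list as data makes it easy to mirror in the documentation and to extend once real slow queries have been profiled.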


Benefits

  • Helps contributors understand how to structure performant queries.
  • Ensures production setups follow best indexing practices.
  • Can make AI mapping and CRE-Standard linking noticeably faster on larger datasets.

Optional: Related Queries

For example:

MATCH (s:Standard)-[:MAPPED_TO]->(c:CRE) WHERE s.external_id = $id RETURN c
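
To benchmark a query like this one, it can be prefixed with PROFILE (runs the query and reports actual rows and db hits) or EXPLAIN (shows the plan without running it). Below is a rough sketch using the official `neo4j` Python driver; the function names `with_plan_prefix` and `profile_query`, and whatever connection URI and credentials are passed in, are placeholders and not taken from the repository.

```python
QUERY = (
    "MATCH (s:Standard)-[:MAPPED_TO]->(c:CRE) "
    "WHERE s.external_id = $id RETURN c"
)


def with_plan_prefix(query, mode="PROFILE"):
    """Prefix a Cypher query with PROFILE or EXPLAIN."""
    if mode not in ("PROFILE", "EXPLAIN"):
        raise ValueError(f"unsupported mode: {mode}")
    return f"{mode} {query}"


def profile_query(uri, user, password, query, **params):
    """Run the query under PROFILE and return the driver's profile data.

    Requires the `neo4j` package and a reachable database.
    """
    # Imported lazily so with_plan_prefix stays usable without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            result = session.run(with_plan_prefix(query), **params)
            # consume() drains the result so the profile is available on the summary.
            summary = result.consume()
            return summary.profile
```

If an index on Standard.external_id exists and is used, the plan would be expected to start from a NodeIndexSeek rather than a NodeByLabelScan, which is the kind of observation worth recording as a comment next to the query in the AI mapping script.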

ashutosh-engineer • Jul 30 '25 09:07