Make Map Analysis calculation faster and less resource intensive

Open northdpole opened this issue 11 months ago • 5 comments

Currently, gap analysis requires 64 GB of RAM on a chunky server with an external Neo4j cluster and an external Redis in order to calculate ~10 GB worth of graph shortest paths.

This crashes most commercial laptops and takes more than 24 hours on a GCP medium machine.

There are many micro-optimizations we can do to make the gap analysis faster and less resource intensive such as:

  • preload the relevant subgraphs only in neo4j
  • re-use precalculated paths
  • optimize the cypher queries and the redis usage
  • for any standard pair, avoid calculating a path between every node of standard A and every node of standard B
  • experiment with cutting out the standards and calculating the gap analysis between the relevant CREs only; since we only have 400 CREs, this should be much faster than calculating gaps between thousands of standard nodes
  • optimize the Python code to not access memory repeatedly
  • reduce the information reported as the final result, etc.
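
A minimal sketch of the CRE-level idea from the list above, in plain Python with no Neo4j involved. It assumes each standard node links to one or more CREs and that the CRE-to-CRE graph is small enough to search exhaustively. The graph shape, the function names `cre_distances` and `standard_gap`, and the sample IDs are all hypothetical and are not OpenCRE's actual data model or code.

```python
from collections import deque


def cre_distances(cre_graph):
    """Run one BFS per CRE; returns {source: {target: hop_count}}."""
    all_dist = {}
    for source in cre_graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in cre_graph.get(node, ()):
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        all_dist[source] = dist
    return all_dist


def standard_gap(node_a, node_b, links, dist):
    """Shortest distance between two standard nodes, routed through their CREs.

    Adds 2 for the two standard<->CRE hops. Returns None if unconnected.
    """
    best = None
    for cre_a in links.get(node_a, ()):
        for cre_b in links.get(node_b, ()):
            d = dist.get(cre_a, {}).get(cre_b)
            if d is not None and (best is None or d + 2 < best):
                best = d + 2
    return best


# Hypothetical toy data: an undirected CRE graph and standard -> CRE links.
cre_graph = {
    "CRE-1": ["CRE-2"],
    "CRE-2": ["CRE-1", "CRE-3"],
    "CRE-3": ["CRE-2"],
    "CRE-4": [],
}
links = {"ASVS-1.1": ["CRE-1"], "NIST-AC-2": ["CRE-3"], "ISO-5.1": ["CRE-4"]}

dist = cre_distances(cre_graph)  # computed once, reused for every standard pair
print(standard_gap("ASVS-1.1", "NIST-AC-2", links, dist))
print(standard_gap("ASVS-1.1", "ISO-5.1", links, dist))
```

With roughly 400 CREs, the distance table holds at most 400 × 400 entries, and every standard pair becomes a lookup instead of a fresh path search. Whether hop counts through CREs match the scoring the current gap analysis uses is something that would need checking.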

northdpole avatar Feb 22 '25 15:02 northdpole

Hi @northdpole, here are some potential optimizations to improve efficiency and reduce resource usage:

  1. Preload Relevant Subgraphs – Load only necessary subgraphs in Neo4j instead of the entire graph to reduce memory usage.
  2. Reuse Precomputed Paths – Implement caching to store and reuse previously calculated paths, avoiding redundant computations.
  3. Optimize Cypher Queries & Redis Usage – Improve query efficiency and ensure Redis is used effectively to reduce latency.
  4. Limit Pairwise Node Calculations – Focus only on critical node pairs instead of computing paths between all nodes.

Let me know if you’d like me to explore any of these solutions in more detail!

Hardik301002 avatar Mar 12 '25 19:03 Hardik301002

Hey @Hardik301002, I think I've tried several of these already, but I'm by no means an expert in graph DBs. If you want, you can take a stab at it.

Btw, in case you used an LLM to get these suggestions, some details on each:

  1. the query is already matching subgraphs -- if you know Cypher in any depth, perhaps you can optimise this further
  2. the whole gap analysis is cached once it has been calculated; I'm not sure how to cache precomputed paths in Neo4j
  3. this is too vague for me to give you more pointers
  4. I think this is the same as 1.
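
On point 2, one option that sidesteps Neo4j entirely is to cache at the application level: run one breadth-first search per source node and keep its parent map, so every later path from that source is rebuilt from the cache rather than searched again. The sketch below is only an illustration under that assumption. `GRAPH`, `bfs_parents`, and `shortest_path` are hypothetical names, and the in-process `lru_cache` stands in for whatever store (Redis, for example) would actually hold the results.

```python
from collections import deque
from functools import lru_cache

# Hypothetical toy adjacency list; not the real OpenCRE graph.
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


@lru_cache(maxsize=None)
def bfs_parents(source):
    """One BFS per source; the parent map encodes a shortest path to every reachable node."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in GRAPH.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return parents


def shortest_path(source, target):
    """Rebuild the path from the cached parent map; None if target is unreachable."""
    parents = bfs_parents(source)
    if target not in parents:
        return None
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parents[node]
    return path[::-1]


print(shortest_path("a", "d"))
print(shortest_path("a", "c"))  # reuses the BFS already run from "a"
```

This turns N × M pairwise searches between two standards into at most N searches, one per source node, which also bears on point 4.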

northdpole avatar Mar 14 '25 12:03 northdpole

Please assign this issue to me; I want to work on it further under your guidance.

Hardik301002 avatar Mar 16 '25 08:03 Hardik301002

@Hardik301002 thanks for volunteering, issue assigned

northdpole avatar Mar 22 '25 14:03 northdpole

Hi, I’d like to work on this issue. I have experience with performance optimization and efficient data handling, and I believe I can significantly improve the speed and resource usage of the Map Analysis module. Could you please assign this issue to me? I’ll make sure to document all changes and provide clear benchmarks for comparison.

sd2604 avatar Nov 12 '25 08:11 sd2604