Prune map analysis search to save time and memory

Open robvanderveer opened this issue 1 year ago • 4 comments

Map analysis takes days to calculate and gigabytes to store, but we're only interested in the top matches from standard to standard. It's no use to find 100 weakly linked topics if you have a couple of direct links and a number of strong links. The idea therefore is to prune the search we do for map analysis, by stopping search early if we know that we're looking

  1. design a strategy for pruning. First stab: don't search for average and weak links if we have direct and strong links. Don't search for weak links if we have direct, strong or average links.
  2. update the search algorithm. This will require some smartness, depending on how it is implemented.
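The "first stab" rule in step 1 could be sketched roughly as below. This is only an illustration, not the existing OpenCRE code: the `Strength` enum, `pruned_search`, and the `search_tier` callback are hypothetical names, and the sketch reads "direct and strong" as "direct or strong", which is an assumption about the intent.

```python
from enum import IntEnum
from typing import Callable, Dict, List


class Strength(IntEnum):
    """Hypothetical link tiers, ordered from strongest to weakest."""
    DIRECT = 0
    STRONG = 1
    AVERAGE = 2
    WEAK = 3


def pruned_search(
    search_tier: Callable[[Strength], List[str]],
) -> Dict[Strength, List[str]]:
    """Run the per-tier search, skipping weaker tiers once anything is found.

    `search_tier` is a placeholder for whatever actually queries the graph
    for one tier; it returns the matches found at that tier.
    """
    results: Dict[Strength, List[str]] = {}

    # Direct and strong links are always searched.
    for tier in (Strength.DIRECT, Strength.STRONG):
        results[tier] = search_tier(tier)

    # Average is searched only if direct/strong found nothing;
    # weak is searched only if nothing at all has been found so far.
    for tier in (Strength.AVERAGE, Strength.WEAK):
        if any(results.values()):
            break
        results[tier] = search_tier(tier)

    return results
```

The point of passing the search in as a callback is that the pruning decision stays independent of how each tier is queried, so the same rule could wrap either the current queries or an optimized version.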

By the way, I notice we don't seem to have weak links anymore: should we change our categorization and make 3-6 average and 7+ weak?

First step: find expertise in Neo4J. Rob

robvanderveer avatar May 16 '24 11:05 robvanderveer

By the way, I notice we don't seem to have weak links anymore: should we change our categorization and make 3-6 average and 7+ weak?

The query is the following:

    OPTIONAL MATCH (BaseStandard:NeoStandard {name: $name1})
    OPTIONAL MATCH (CompareStandard:NeoStandard {name: $name2})
    OPTIONAL MATCH p = allShortestPaths((BaseStandard)-[*..20]-(CompareStandard))
    WITH p
    WHERE length(p) > 1
      AND ALL(n in NODES(p) WHERE (n:NeoCRE or n = BaseStandard or n = CompareStandard) AND NOT n.name in $denylist)
    RETURN p

followed by

    OPTIONAL MATCH (BaseStandard:NeoStandard {name: $name1})
    OPTIONAL MATCH (CompareStandard:NeoStandard {name: $name2})
    OPTIONAL MATCH p = allShortestPaths((BaseStandard)-[:(LINKED_TO|CONTAINS)*..20]-(CompareStandard))
    WITH p
    WHERE length(p) > 1
      AND ALL(n in NODES(p) WHERE (n:NeoCRE or n = BaseStandard or n = CompareStandard) AND NOT n.name in $denylist)
    RETURN p

and then we filter and assign weights in code. We can start by optimizing these pretty brute-force queries.
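To make the effect of the hop limit concrete, here is a small in-memory sketch of a bounded breadth-first search in which intermediate nodes must be CREs, loosely mirroring the `*..20` bound and the `n:NeoCRE` filter in the queries above. It is a stand-in for illustration only, not how Neo4j evaluates `allShortestPaths`; the function name, the adjacency-dict graph, and the `is_cre` predicate are all assumptions, and the `length(p) > 1` and denylist filters are not reproduced.

```python
from collections import deque
from typing import Callable, Dict, List, Optional


def shortest_hops(
    graph: Dict[str, List[str]],
    start: str,
    goal: str,
    is_cre: Callable[[str], bool],
    max_hops: int,
) -> Optional[int]:
    """Return the hop count of the shortest start-to-goal path whose
    intermediate nodes all satisfy `is_cre`, or None if no such path
    exists within `max_hops`."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            # Pruning point: never expand beyond the hop limit.
            continue
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return hops + 1
            if neighbour in seen or not is_cre(neighbour):
                continue
            seen.add(neighbour)
            queue.append((neighbour, hops + 1))
    return None
```

With a helper like this, one option would be to try a small `max_hops` first and only raise it when nothing is found, so that pairs with a short connection never pay for the deep search.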

northdpole avatar May 17 '24 08:05 northdpole

Hi! I'd like to take this up as my next contribution.

This will be my 2nd contribution to OWASP/OpenCRE, and I’m especially interested in this issue as it’s tagged for GSoC 2025. I’ll work on designing a pruning strategy to optimize the graph search and improve performance.

I’m also aiming to contribute more towards GSoC 2026 with this organization.

Looking forward to solving this!

ashutosh-engineer avatar Jul 23 '25 17:07 ashutosh-engineer

Hey if this issue is still available could you pls assign this to me...I'd really want to work on this issue

sd2604 avatar Nov 12 '25 08:11 sd2604

Hey, sure, this is about optimizing gap analysis. What do you think you will do to achieve this?

northdpole avatar Nov 17 '25 07:11 northdpole