Prune map analysis search to save time and memory
Map analysis takes days to compute and gigabytes to store, but we're only interested in the top matches from standard to standard. It's no use finding 100 weakly linked topics if you already have a couple of direct links and a number of strong links. The idea therefore is to prune the map analysis search by stopping early once we know we can only find weaker links than the ones we already have.
- design a strategy for pruning. First stab: don't search for average and weak links if we have direct and strong links; don't search for weak links if we have direct, strong or average links.
- update the search algorithm. This will require some smartness, depending on how it is implemented.
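The two steps above could be sketched roughly as follows. This is a hypothetical illustration, not OpenCRE's actual code: the function names and the path-length thresholds for each category are assumptions (see also the open question below about where the average/weak boundary should sit).

```python
# Assumed link-strength categories, bucketed by graph path length.
# The thresholds here are illustrative guesses, not OpenCRE's definitions.
DIRECT = "direct"
STRONG = "strong"
AVERAGE = "average"
WEAK = "weak"

def categorize(path_length: int) -> str:
    """Map a path length to a link-strength category (thresholds assumed)."""
    if path_length <= 2:
        return DIRECT
    if path_length <= 4:
        return STRONG
    if path_length <= 6:
        return AVERAGE
    return WEAK

def pruned_links(path_lengths):
    """Keep only the strongest tier of links found, per the strategy above.

    If any direct or strong links exist, drop average and weak ones;
    if only average links exist, drop weak ones.
    """
    buckets = {DIRECT: [], STRONG: [], AVERAGE: [], WEAK: []}
    for length in path_lengths:
        buckets[categorize(length)].append(length)
    if buckets[DIRECT] or buckets[STRONG]:
        return buckets[DIRECT] + buckets[STRONG]
    if buckets[AVERAGE]:
        return buckets[AVERAGE]
    return buckets[WEAK]
```

With this in place, the search itself can stop expanding paths whose length already puts them in a weaker category than links found so far, rather than filtering after the fact.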
By the way, I notice we don't seem to have weak links anymore: should we change our categorization and make 3-6 average and 7+ weak?
First step: find expertise in Neo4j. Rob
The query is the following:
OPTIONAL MATCH (BaseStandard:NeoStandard {name: $name1})
OPTIONAL MATCH (CompareStandard:NeoStandard {name: $name2})
OPTIONAL MATCH p = allShortestPaths((BaseStandard)-[*..20]-(CompareStandard))
WITH p
WHERE length(p) > 1
  AND ALL(n IN NODES(p) WHERE (n:NeoCRE OR n = BaseStandard OR n = CompareStandard)
      AND NOT n.name IN $denylist)
RETURN p
followed by
OPTIONAL MATCH (BaseStandard:NeoStandard {name: $name1})
OPTIONAL MATCH (CompareStandard:NeoStandard {name: $name2})
OPTIONAL MATCH p = allShortestPaths((BaseStandard)-[:(LINKED_TO|CONTAINS)*..20]-(CompareStandard))
WITH p
WHERE length(p) > 1
  AND ALL(n IN NODES(p) WHERE (n:NeoCRE OR n = BaseStandard OR n = CompareStandard)
      AND NOT n.name IN $denylist)
RETURN p
and then we filter and assign weights in code. We can start by optimizing these pretty brute-force queries.
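One way to prune these queries is iterative deepening: instead of a single allShortestPaths search with a *..20 hop bound, run the same query with a small bound first and widen only when nothing is found, so the expensive wide searches (which can only yield weak links) are skipped whenever strong links exist. A minimal sketch, where `run_query` stands in for whatever function executes the Cypher above with a given upper bound (an assumption, not OpenCRE's actual code):

```python
def find_paths(run_query, bounds=(2, 4, 6, 20)):
    """Call run_query(max_hops) with increasing hop bounds.

    Returns (bound, paths) for the first bound that yields any paths,
    or (None, []) if every bound comes up empty.
    """
    for bound in bounds:
        paths = run_query(bound)
        if paths:
            return bound, paths  # strong-enough links found; prune wider searches
    return None, []

# Example with a stand-in query function that only finds paths at depth >= 4:
def fake_query(max_hops):
    return ["some-path"] if max_hops >= 4 else []
```

Note that Cypher does not allow parameters inside variable-length bounds, so the hop limit would have to be interpolated into the query text rather than passed as `$maxHops`.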
Hi! I'd like to take this up as my next contribution.
This will be my 2nd contribution to OWASP/OpenCRE, and I’m especially interested in this issue as it’s tagged for GSoC 2025. I’ll work on designing a pruning strategy to optimize the graph search and improve performance.
I’m also aiming to contribute more towards GSoC 2026 with this organization.
Looking forward to solving this!
Hey, if this issue is still available, could you please assign it to me? I'd really like to work on it.
Hey, sure, this is about optimizing gap analysis. What do you think you will do to achieve this?