easybuild-framework

Extract dependency graph for package in stack

Open ocaisa opened this issue 6 years ago • 6 comments

This is required for building a depgraph for an entire installation tree

ocaisa, Feb 22 '19 16:02

As discussed in #2783, this forms the basis for an alternative approach where we map all the easyconfigs in the software directory to a graph with (something like):

mkdir temp
cd temp
cp $EASYBUILD_INSTALLPATH/software/*/*/easybuild/*.eb .
eb --robot=./ --dep-graph=depgraph.dot *.eb

This results in a depgraph for the entire installed software stack. The resulting file is too large to work with directly, so I wrote a script that extracts all the software that depends (directly or indirectly) on a given module:

find_children () {
 # The fact that we put the semicolon straight after means we exclude anywhere the module
 # is only required as a build dependency (since these cases have extra formatting the
 # semicolon comes later)
 grep ' "'$3'";' $1 >> $2
 grep ' "'$3'";' $1 | awk '{print $1}'| xargs -i bash -c "map_dep $1 $2 {}" 
}

map_dep () {
 # Once we support coloring of modules to indicate whether or not something is installed
 # this will need to be updated to be conditional on whether the module *is* installed
 echo \"$3\"\; >> $2
 find_children $1 $2 $3
}

export -f find_children
export -f map_dep

echo digraph graphname \{ > $2
map_dep $1 $2 $3
echo \} >> $2

The script writes a dot file specific to that software. You call it with:

./<script> <input dot file> <output dot file> "Compiler/GCCcore/7.3.0/h5py/2.8.0-serial-Python-3.6.6"
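
The recursive approach may follow the same node more than once on a large graph. As a rough alternative, here is a minimal sketch (not the script above) that loads the edges once with awk and walks them breadth-first, visiting each node a single time. The function name `reverse_deps` is hypothetical. It assumes runtime edges are written as `"A" -> "B";` (A depends on B), that build-only edges carry extra attributes before the semicolon, and that module names contain no spaces.

```shell
#!/bin/bash
# Sketch only: print the target node, every node that depends on it (directly
# or indirectly), and the edges that connect them.
# Assumed input format: "A" -> "B";   meaning A depends on B.
reverse_deps () {
  local dotfile=$1 target=$2
  awk -v target="\"$target\"" '
    # Keep plain edges only; edges with attributes (assumed to be build-only
    # dependencies) have more than three fields and are skipped.
    NF == 3 && $2 == "->" && $3 ~ /;$/ {
      parent = substr($3, 1, length($3) - 1)   # drop the trailing semicolon
      n = ++count[parent]
      dependents[parent, n] = $1
      edge[parent, n] = $0
    }
    END {
      # Breadth-first walk from the target towards its dependents
      queue[1] = target; head = 1; tail = 1; seen[target] = 1
      while (head <= tail) {
        node = queue[head++]
        print node ";"
        for (i = 1; i <= count[node]; i++) {
          print edge[node, i]
          d = dependents[node, i]
          if (!(d in seen)) { seen[d] = 1; queue[++tail] = d }
        }
      }
    }
  ' "$dotfile"
}
```

Wrapping the output in `digraph graphname {` and `}` should give a file Graphviz can read. Unlike the script above, this sketch matches the module name literally rather than as a grep pattern.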

ocaisa, Feb 22 '19 16:02

I made a set of updates to the script that leverages this:

find_children () {
 # The fact that we put the semicolon straight after means we exclude anywhere the module
 # is only required as a build dependency (since these cases have extra formatting the
 # semicolon comes later);
 # the sed command makes sure we ignore whether the module is hidden or not
 grep ' "'$3'";' $1 | sed s#/\\.#/#g >> $2
 grep ' "'$3'";' $1 | awk '{print $1}'| xargs -i bash -c "map_dep $1 $2 {}" 
}

map_dep () {
 # Search for a node that matches the string and add a comment marker at the end
 # to leverage with grep later
 # (if non-installed software had additional formatting they would be ignored)
 # the sed command makes sure we ignore whether the module is hidden or not
 local node
 node=$(grep "^"'"'"$3"'";' "$1" | awk '{print $1" // xxnodexx"}' | sed 's#/\.#/#g')
 # Only record the node and recurse when it exists in the input graph
 if [ -n "$node" ]
 then
   echo "$node" >> "$2"
   find_children "$1" "$2" "$3"
 fi
}

export -f find_children
export -f map_dep

# Check command line
if [ "$#" -ne 3 ]; then
    echo -e "Expected command line:\n\t<script> <input dot file> <output dot file> <node to search for>"
    exit 1
fi

# Begin digraph in output file
echo digraph graphname \{ > $2

# Use a temporary file to store nodes and edges
> temp.dot
# Gather nodes and edges related to $3
map_dep $1 temp.dot $3

# There is potential duplication due to nodes being followed multiple times so let's
# remove it:
# put nodes first (we used a marker to identify them), make sure they are unique
sort -u temp.dot | grep xxnodexx >> $2
# Then edges, also make sure they are unique
sort -u temp.dot | grep -v xxnodexx >> $2

# Clean up
rm temp.dot
echo \} >> $2
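
One possible refinement of the final step: the fixed `temp.dot` name could clash if two runs share a working directory. The sketch below is only an illustration under that assumption; the helper name `emit_digraph` is hypothetical. It reads node and edge lines on stdin (nodes tagged with the `// xxnodexx` marker, as above), stores them in a `mktemp` file, and prints a de-duplicated digraph with nodes first.

```shell
#!/bin/bash
# Sketch only: hypothetical helper that turns marker-tagged node/edge lines
# read from stdin into a de-duplicated digraph on stdout.
emit_digraph () {
  local tmp
  tmp=$(mktemp) || return 1
  # sort -u removes duplicate lines (same result as sort | uniq)
  sort -u > "$tmp"
  echo 'digraph graphname {'
  grep xxnodexx "$tmp"       # nodes first, identified by the marker
  grep -v xxnodexx "$tmp"    # then edges
  echo '}'
  rm -f "$tmp"
}
```

With the script above, this would be used roughly as `emit_digraph < temp.dot > "$2"` once `map_dep` has finished.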

ocaisa, Feb 26 '19 10:02

Here are the timings to generate the graph for the current set of 2018b easyconfigs:

alanc@alanc-VirtualBox:~$ time eb  --dep-graph=depgraph.dot easybuild-easyconfigs/easybuild/easyconfigs/*/*/*2018b*.eb
== temporary log file in case of crash /tmp/eb-YPRS0G/easybuild-_A6fNF.log
Wrote dependency graph for 533 easyconfigs to depgraph.dot

real	0m29.663s
user	0m28.644s
sys	0m1.003s

ocaisa, Feb 26 '19 10:02

LGTM. But I think the script is where the real usefulness of this arises. Does it make sense to include it in https://github.com/easybuilders/easybuild-framework/tree/master/easybuild/scripts ?

damianam, May 28 '19 17:05

@ocaisa Can we come up with a test that verifies that we allow pre-existing edges now?

boegel, Sep 10 '19 17:09

I tried, but I can't create a test that triggers this (at least based on the test easyconfigs). I'll take another look soon.

ocaisa, Sep 13 '19 12:09