Extract dependency graph for package in stack
This is required for building a depgraph for an entire installation tree
As discussed in #2783, this forms the basis for an alternative approach where we map all the easyconfigs in the software directory to a graph with (something like):
mkdir temp
cd temp
cp $EASYBUILD_INSTALLPATH/software/*/*/easybuild/*.eb .
eb --robot=./ --dep-graph=depgraph.dot *.eb
This results in a depgraph for the entire installed software stack. The resulting file is too large to work with directly, so I wrote a script that extracts all the software that depends (directly or indirectly) on a given module:
find_children () {
# The fact that we put the semicolon straight after means we exclude anywhere the module
# is only required as a build dependency (since these cases have extra formatting the
# semicolon comes later)
grep ' "'$3'";' $1 >> $2
grep ' "'$3'";' $1 | awk '{print $1}'| xargs -I {} bash -c "map_dep $1 $2 {}"
}
map_dep () {
# Once we support coloring of modules to indicate whether or not something is installed
# this will need to be updated to be conditional on whether the module *is* installed
echo \"$3\"\; >> $2
find_children $1 $2 $3
}
export -f find_children
export -f map_dep
echo digraph graphname \{ > $2
map_dep $1 $2 $3
echo \} >> $2
and makes a specific dot file for that software. You call it with
./<script> <input dot file> <output dot file> "Compiler/GCCcore/7.3.0/h5py/2.8.0-serial-Python-3.6.6"
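To make the matching rule concrete, here is a minimal sketch on a toy graph. The module names are hypothetical, and the `[style=dotted]` attribute on the build-dependency edge is an assumption about what the "extra formatting" looks like; the only thing that matters for the script is that the semicolon does not directly follow the quoted name in that case.

```shell
# Toy graph in the shape the script assumes (all module names are hypothetical)
cat > toy.dot <<'EOF'
digraph graphname {
"app/1.0";
"lib/2.0";
"tool/3.0";
"base/0.1";
"app/1.0" -> "lib/2.0";
"lib/2.0" -> "base/0.1";
"tool/3.0" -> "base/0.1" [style=dotted];
}
EOF

# Direct dependents of base/0.1: the pattern needs a leading space and a
# semicolon straight after the closing quote, so the node line itself and the
# edge with extra attributes (assumed build dependency) are both skipped
dependents=$(grep ' "base/0.1";' toy.dot | awk '{print $1}')
echo "$dependents"
```

Only the `lib/2.0` edge matches here; `find_children` then repeats the same search for each module it finds.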
I made a set of updates to the script to leverage this:
find_children () {
# The fact that we put the semicolon straight after means we exclude anywhere the module
# is only required as a build dependency (since these cases have extra formatting the
# semicolon comes later);
# the sed command makes sure we ignore whether the module is hidden or not
grep ' "'$3'";' $1 | sed s#/\\.#/#g >> $2
grep ' "'$3'";' $1 | awk '{print $1}'| xargs -I {} bash -c "map_dep $1 $2 {}"
}
map_dep () {
# Search for a node that matches the string and add a comment marker at the end
# to leverage with grep later
# (if non-installed software had additional formatting they would be ignored)
# the sed command makes sure we ignore whether the module is hidden or not
# only follow the node when it is actually present in the input graph
if grep -q "^"'"'$3'";' $1
then
grep "^"'"'$3'";' $1 | awk '{print $1" // xxnodexx"}' | sed s#/\\.#/#g >> $2
find_children $1 $2 $3
fi
}
export -f find_children
export -f map_dep
# Check command line
if [ "$#" -ne 3 ]; then
echo -e "Expected command line:\n\t<script> <input dot file> <output dot file> <node to search for>"
exit 1
fi
# Begin digraph in output file
echo digraph graphname \{ > $2
# Use a temporary file to store nodes and edges
> temp.dot
# Gather nodes and edges related to $3
map_dep $1 temp.dot $3
# There is potential duplication due to nodes being followed multiple times so let's
# remove it:
# put nodes first (we used a marker to identify them), make sure they are unique
sort -u temp.dot | grep xxnodexx >> $2
# Then edges, also make sure they are unique
sort -u temp.dot | grep -v xxnodexx >> $2
# Clean up
rm temp.dot
echo \} >> $2
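As an aside on the duplication mentioned in the comments: the recursive script revisits a node once per path that leads to it. One possible alternative, sketched here and not part of the script above, is a single awk pass with a worklist and a `seen` set so each module is visited once. The toy graph, the module names and the `[style=dotted]` build-dependency attribute are all hypothetical.

```shell
# Toy graph with a diamond: app/1.0 reaches base/0.1 directly and via lib/2.0
cat > toy.dot <<'EOF'
digraph graphname {
"app/1.0" -> "lib/2.0";
"lib/2.0" -> "base/0.1";
"app/1.0" -> "base/0.1";
"tool/3.0" -> "base/0.1" [style=dotted];
}
EOF

closure=$(awk -v target='"base/0.1"' '
  # Keep only plain edges of the form: "A" -> "B";
  # (edges with extra attributes have more than three fields and are skipped)
  NF == 3 && $2 == "->" {
    dep = $3; sub(/;$/, "", dep)
    parents[dep] = parents[dep] " " $1
  }
  END {
    # Breadth-first walk from the target to everything that depends on it
    queue[1] = target; head = 1; tail = 1; seen[target] = 1
    while (head <= tail) {
      node = queue[head++]
      print node
      n = split(parents[node], p, " ")
      for (i = 1; i <= n; i++)
        if (!(p[i] in seen)) { seen[p[i]] = 1; queue[++tail] = p[i] }
    }
  }
' toy.dot)
echo "$closure"
```

This only lists the modules; emitting the matching node and edge lines into a new dot file would still need the same filtering the script does.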
Here are the timings to generate the graph for the current set of 2018b easyconfigs:
alanc@alanc-VirtualBox:~$ time eb --dep-graph=depgraph.dot easybuild-easyconfigs/easybuild/easyconfigs/*/*/*2018b*.eb
== temporary log file in case of crash /tmp/eb-YPRS0G/easybuild-_A6fNF.log
Wrote dependency graph for 533 easyconfigs to depgraph.dot
real 0m29.663s
user 0m28.644s
sys 0m1.003s
LGTM. But I think the script is where the real usefulness of this arises. Does it make sense to include it in https://github.com/easybuilders/easybuild-framework/tree/master/easybuild/scripts ?
@ocaisa Can we come up with a test that verifies that we allow pre-existing edges now?
I tried, but I can't trigger this (at least not with the test easyconfigs). I'll take another look soon.