Lucian Anton
I got the following, obviously an incomplete list of hosts ``` $ cat ~/stat_results/ifsMASTER.sp.x.0016/*top ac1-1001.bullx:0 => ac1-1001:0 ac1-1002:0 ac1-1003:0 ac1-1006:0 ac1-1007:0 ac1-1008:0 ac1-1009:0 ac1-1012:0 ac1-1013:0 ac1-1014:0 ac1-1016:0 ac1-1017:0 ac1-1018:0 ac1-1019:0...
I think so. We discussed this a while back in https://github.com/LLNL/STAT/issues/33#issuecomment-919433850
The depth 1 option works (no crash), but stat-cl stays forever in "Sampling traces..." when I try to sample an 880 nodes x 8 MPI/node job. I have a...
Up to 30 minutes or so. If I use a very simple test code (a few lines) I can get a stack trace in a couple of minutes even at...
I have run it with 300 nodes, but it still takes a long time. I think that you are right about STATD being stuck waiting for the debug info from...
I have run stat-cl with -l BE -l SW -l CP on the case that hangs (880 nodes x 8 MPI/node). I have noticed that after some progress reported...
Perhaps this helps: internally in our cluster, the short node names are enough for comms. So if you can get them with something like `hostname -s` it should be a...
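To illustrate the short-name idea, here is a minimal sketch, assuming the fully qualified names look like `ac1-1001.bullx` (as in the topology output) and that the short name is everything before the first dot. `short_name` is a hypothetical helper of my own, not part of STAT.

```shell
# Hypothetical helper (not part of STAT): strip the domain part from a
# host name, which is what `hostname -s` does for the local host.
short_name() {
    # ${1%%.*} removes the longest suffix that starts at the first dot.
    printf '%s\n' "${1%%.*}"
}

# Example with a node name taken from the topology output.
short_name "ac1-1001.bullx"
```

A name that is already short, such as `ac1-1002`, passes through unchanged, so the helper can be applied to a mixed list of names.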
I'm a bit confused, what exactly should be in STAT_FE_HOSTNAME? The name of the head node, or hostname pattern, or ...?
Yes, it does. I'll test this when I get some free nodes on the system. Hopefully this evening or tomorrow morning.
With STAT_FE_HOSTNAME defined and without the -d option, stat-cl fails quickly with this sort of message (see below). Note that STATD reports that it has a -d 1 flag although I...