Can't seem to get this to run the commands in parallel even when using --par x
It seems that using --par 3 with --host "host1 host2 host3" still causes the command to run to completion on each host before moving on to the next. I tested with the command "date; sleep 120" and the dates returned are always 2 minutes apart.
The --host option is an append type, so you have to specify multiple hosts like this:
$ pssh --par 3 --host host1 --host host2 --host host3
Try that and let me know if it works for you.
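If the hosts already live in a space-separated string, a small helper can expand it into repeated --host options. This is only a sketch: host_args is a made-up name, not part of pssh, and the script prints the resulting command line instead of running it.

```shell
# Expand a space-separated host list into repeated --host options.
# host_args is a hypothetical helper, not something pssh provides.
host_args() {
    out=""
    for h in $1; do
        # Add a separating space only when $out already has content.
        out="${out:+$out }--host $h"
    done
    printf '%s\n' "$out"
}

# Show the command line that would be run.
echo "pssh --par 3 $(host_args 'host1 host2 host3')"
```

Host names containing spaces or glob characters are not handled, which should not matter for ordinary host names.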
You might want to try the -h/--hosts option instead; it might be easier for you. It allows you to specify a newline-separated host list, e.g.:
$ pssh --par 3 --hosts prod-hosts.txt
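For anyone unsure what the host file should look like, here is a minimal sketch that writes one host per line. The write_hosts helper and the temporary directory are illustrative assumptions; only the file name prod-hosts.txt comes from the example above.

```shell
# Write one host per line into a host file.
# write_hosts is a hypothetical helper name used for illustration.
write_hosts() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# Use a scratch directory so nothing in the current directory is overwritten.
dir=$(mktemp -d)
write_hosts "$dir/prod-hosts.txt" host1 host2 host3
cat "$dir/prod-hosts.txt"
```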
I can confirm the same behaviour when passing -h or --hosts with a valid newline-separated host file containing 11 hosts. I tried -p 11, --par=11, --par 11, no parallelism option at all, and every combination of those, using sleep 2 && date as the command; the result is always the same.
The command is run against all the hosts in the host file, but pssh connects to them one after the other, not in parallel. I also used -d, which surfaced nothing more than the DEBUG:psshlib.cli:script envelope is line.
Running latest 3.3.0 - commitref 6ff05ca9b07e76ac4ab6e7bd8cc57d4dca30ab63 and install is fine.
Limited -p?
Playing with -p values seems to suggest:
a) a limit to how many parallels can be handled; and/or b) some blocking in pssh is taking place.
Testing suggests that -p 2, 3 and 4 work best: a number of results return at the same time (i.e. in parallel).
ps aux shows pssh is opening ssh connections to each host; however, if -p > 10 they all return one after the other (not in parallel).
What is interesting is that the last ssh process to be created reports success first and the first one created reports success last, so are they blocking each other?
Examples
The script file contains sleep 5 && date, run on 11 hosts with -p 30 (anything greater than 10 seems to have the same effect: one at a time).
[root@controller-5-96g-luk1 ~] /usr/bin/pssh -p 30 -h /var/log/host/scripts/host.pssh.sh/2014/10/20141001105803.host.pssh.sh.log.hosts --user=root -x "-o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key" --script /var/log/host/scripts/host.pssh.sh/2014/10/20141001105803.host.pssh.sh.log.command.file
[1] 12:22:31 [SUCCESS] host-8-20g-doa2
[2] 12:22:36 [SUCCESS] host-12-30g-doa2
[3] 12:22:41 [SUCCESS] host-4-40g-doa
[4] 12:22:46 [SUCCESS] host-1-30g-doa
[5] 12:22:51 [SUCCESS] host-6-30g-doa
[6] 12:22:56 [SUCCESS] host-13-40g-doa
[7] 12:23:01 [SUCCESS] host-9-30g-doa
[8] 12:23:06 [SUCCESS] host-5-20g-doa2
[9] 12:23:11 [SUCCESS] host-11-30g-doa2
[10] 12:23:16 [SUCCESS] host-10-30g-doa2
[11] 12:23:21 [SUCCESS] host-7-30g-doa
[root@controller-5-96g-luk1 ~]
ssh connections opened during the above run
[root@controller-5-96g-luk1 ~] ps aux | grep "ssh host\-"
root 12850 0.5 0.0 62360 3724 ? Ss 12:22 0:00 ssh host-7-30g-doa -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root 12851 0.5 0.0 62360 3724 ? Ss 12:22 0:00 ssh host-10-30g-doa2 -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root 12852 0.0 0.0 62360 3728 ? Ss 12:22 0:00 ssh host-11-30g-doa2 -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root 12853 0.5 0.0 62360 3724 ? Ss 12:22 0:00 ssh host-5-20g-doa2 -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root 12854 0.5 0.0 62360 3724 ? Ss 12:22 0:00 ssh host-9-30g-doa -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root 12855 0.5 0.0 62360 3724 ? Ss 12:22 0:00 ssh host-13-40g-doa -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root 12856 0.5 0.0 62360 3724 ? Ss 12:22 0:00 ssh host-6-30g-doa -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root 12857 0.5 0.0 62360 3724 ? Ss 12:22 0:00 ssh host-1-30g-doa -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root 12858 0.5 0.0 62360 3724 ? Ss 12:22 0:00 ssh host-4-40g-doa -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root 12859 0.0 0.0 62360 3728 ? Ss 12:22 0:00 ssh host-12-30g-doa2 -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root 12860 0.0 0.0 62360 3728 ? Ss 12:22 0:00 ssh host-8-20g-doa2 -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root 12868 0.0 0.0 103300 836 pts/2 S+ 12:22 0:00 grep ssh host-
[root@controller-5-96g-luk1 ~]
Running with -p 3: the times at which the hosts report back show parallelism
[root@host-controller-dev-5-96g-luk1 ~] /usr/bin/pssh -p 3 -h /var/log/host/scripts/host.pssh.sh/2014/10/20141001105803.host.pssh.sh.log.hosts --user=root -x "-o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key" --script /var/log/host/scripts/host.pssh.sh/2014/10/20141001105803.host.pssh.sh.log.command.file
[1] 12:19:23 [SUCCESS] host-11-30g-doa2
[2] 12:19:28 [SUCCESS] host-10-30g-doa2
[3] 12:19:33 [SUCCESS] host-7-30g-doa
[4] 12:19:33 [SUCCESS] host-5-20g-doa2
[5] 12:19:33 [SUCCESS] host-9-30g-doa
[6] 12:19:38 [SUCCESS] host-13-40g-doa
[7] 12:19:39 [SUCCESS] host-6-30g-doa
[8] 12:19:39 [SUCCESS] host-1-30g-doa
[9] 12:19:44 [SUCCESS] host-4-40g-doa
[10] 12:19:44 [SUCCESS] host-12-30g-doa2
[11] 12:19:44 [SUCCESS] host-8-20g-doa2
[root@host-controller-dev-5-96g-luk1 ~]
Running with -p 4 suggests a limit of 3
[root@controller-5-96g-luk1 ~] /usr/bin/pssh -p 4 -h /var/log/host/scripts/host.pssh.sh/2014/10/20141001105803.host.pssh.sh.log.hosts --user=root -x "-o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key" --script /var/log/host/scripts/host.pssh.sh/2014/10/20141001105803.host.pssh.sh.log.command.file
[1] 12:31:11 [SUCCESS] host-5-20g-doa2
[2] 12:31:16 [SUCCESS] host-11-30g-doa2
[3] 12:31:16 [SUCCESS] host-9-30g-doa
[4] 12:31:21 [SUCCESS] host-10-30g-doa2
[5] 12:31:21 [SUCCESS] host-13-40g-doa
[6] 12:31:22 [SUCCESS] host-6-30g-doa
[7] 12:31:26 [SUCCESS] host-7-30g-doa
[8] 12:31:26 [SUCCESS] host-1-30g-doa
[9] 12:31:27 [SUCCESS] host-4-40g-doa
[10] 12:31:27 [SUCCESS] host-12-30g-doa2
[11] 12:31:31 [SUCCESS] host-8-20g-doa2
[root@controller-5-96g-luk1 ~]
Running with -p 6 the same.
[root@controller-5-96g-luk1 ~] /usr/bin/pssh -p 6 -h /var/log/host/scripts/host.pssh.sh/2014/10/20141001105803.host.pssh.sh.log.hosts --user=root -x "-o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key" --script /var/log/host/scripts/host.pssh.sh/2014/10/20141001105803.host.pssh.sh.log.command.file
[1] 12:32:33 [SUCCESS] host-13-40g-doa
[2] 12:32:38 [SUCCESS] host-9-30g-doa
[3] 12:32:38 [SUCCESS] host-6-30g-doa
[4] 12:32:43 [SUCCESS] host-5-20g-doa2
[5] 12:32:43 [SUCCESS] host-1-30g-doa
[6] 12:32:44 [SUCCESS] host-4-40g-doa
[7] 12:32:48 [SUCCESS] host-11-30g-doa2
[8] 12:32:48 [SUCCESS] host-12-30g-doa2
[9] 12:32:49 [SUCCESS] host-8-20g-doa2
[10] 12:32:53 [SUCCESS] host-10-30g-doa2
[11] 12:32:58 [SUCCESS] host-7-30g-doa
[root@controller-5-96g-luk1 ~]
This is being run on a 4 processor server and does not seem processor related.
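To make the batching in the runs above easier to see, the result lines can be grouped by completion time. The sketch below assumes the line format shown in the output above (index, time, status, host); completions_per_second is a made-up name, and the sample input is the -p 3 run.

```shell
# Count how many hosts reported back in each one-second bucket.
# Expects pssh result lines like "[4] 12:19:33 [SUCCESS] host-5-20g-doa2".
completions_per_second() {
    awk '$2 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ { n[$2]++ }
         END { for (t in n) print t, n[t] }' | sort
}

completions_per_second <<'EOF'
[1] 12:19:23 [SUCCESS] host-11-30g-doa2
[2] 12:19:28 [SUCCESS] host-10-30g-doa2
[3] 12:19:33 [SUCCESS] host-7-30g-doa
[4] 12:19:33 [SUCCESS] host-5-20g-doa2
[5] 12:19:33 [SUCCESS] host-9-30g-doa
[6] 12:19:38 [SUCCESS] host-13-40g-doa
[7] 12:19:39 [SUCCESS] host-6-30g-doa
[8] 12:19:39 [SUCCESS] host-1-30g-doa
[9] 12:19:44 [SUCCESS] host-4-40g-doa
[10] 12:19:44 [SUCCESS] host-12-30g-doa2
[11] 12:19:44 [SUCCESS] host-8-20g-doa2
EOF
# Prints:
# 12:19:23 1
# 12:19:28 1
# 12:19:33 3
# 12:19:38 1
# 12:19:39 2
# 12:19:44 3
```

One-second buckets are coarse: a batch that straddles a second boundary (12:19:38 and 12:19:39 here) shows up as two rows, so the counts are a rough guide rather than an exact measure of concurrency.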
There are two issues:
- Parallelism of the SSH execution
- Parallelism of the input/output handler
Your ps shows that the SSH processes are all running in parallel. So the issue is not there; it's likely in the second piece.
My initial guess is that --script is the culprit here. Not only does additional data have to be passed through STDIN of the SSH session, but it needs to write a file to the remote end and launch a brand new shell. This is causing arbitrary changes in the time it takes for a particular task to complete. The timestamps returned by pssh only show when the task completed, not when it started.
I would try running without --script, e.g. pssh -h hostlist.txt date.
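For reference, the remote command visible in the ps output above can be re-created locally to see what --script does on the far end. This is a sketch under assumptions: run_envelope is a made-up name, mktemp stands in for the /tmp/pssh-<hash> path, and everything runs on the local machine rather than over ssh.

```shell
# Local re-creation of the remote command seen in the ps output.
# run_envelope is a hypothetical name; the real command runs over ssh.
run_envelope() {
    tmp=$(mktemp)                 # stands in for /tmp/pssh-<hash>
    (
        cat > "$tmp"; CATRET=$?   # the script body arrives on stdin
        chmod 700 "$tmp"
        "$tmp"                    # the script runs as a separate process
        RET=$((CATRET + $?))
        rm -f "$tmp"
        exit $RET
    )
}

# Feed a two-line script through the envelope, as --script would.
printf 'date\necho done\n' | run_envelope
```

The point of the exercise is that the script only starts once all of stdin has been written to the temporary file, so anything that delays stdin delivery delays the start of the script on that host.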
On second thought, I think there might be something else wrong with your environment. I just ran using the --script method across 500 hosts with -p 15 and on average got 20-25 results per one-second interval (which means the SSH sessions are finishing faster than pssh polls for their output).
My client is a CentOS 6 host.
Hi Jon
Yes, I also had a feeling it may be something environment related, not sure what though.
However, thank you for the feedback, it has helped focus further testing.
Testing on our environments does indeed suggest that it has something to do with the --script method when a multi-line script is passed. I could reproduce --script appearing to run in parallel if the script passed has one simple command (one line); however, if the script has multiple lines then it once again appears to run on no more than 3 hosts at a time, as long as -p < 10.
What is really quite interesting is that if I pass the same lines as a single command, separated by &&, it does run in parallel.
You mentioned that the timestamps returned by pssh only show when the task completed, not when it started. Preceding and following the command(s) with date shows us when the first date command was executed and when the run finished.
For example using --script /tmp/pssh.test.sh where /tmp/pssh.test.sh contains:
date
puppetd
date
This results in the execution only running in parallel on 3 or 4 hosts at a time; once those have finished, the next 3 or 4 start.
However, if we drop the --script argument and just pass the command as:
"date && puppetd && date"
then the commands are executed across all the hosts in parallel as desired: the date commands are all issued at the same time. This is a workaround that works for us.
It may be interesting to see if you can replicate this behaviour in your environments, by using a script file that has multiple commands wrapped in date.
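For anyone wanting to automate that workaround, here is a rough sketch that joins the lines of a simple script with &&. join_script is a made-up helper name, the script contents are the three lines from the example above, and the final command line is printed rather than executed.

```shell
# Join the non-empty, non-comment lines of a script with " && ".
# join_script is a hypothetical helper, not part of pssh.
join_script() {
    awk 'NF && $1 !~ /^#/ { printf "%s%s", sep, $0; sep = " && " }
         END { print "" }' "$1"
}

# Recreate the three-line test script in a scratch file.
script=$(mktemp)
printf 'date\npuppetd\ndate\n' > "$script"

# Show the command line that would replace --script.
printf 'pssh -h hostlist.txt "%s"\n' "$(join_script "$script")"
rm -f "$script"
```

This only suits scripts with one plain command per line. Multi-line constructs (if/for blocks, here-documents) will break, and && stops at the first failing command, whereas a script without set -e would carry on.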
Could I ask what Python version and OS you're using?
Hi Jon
CentOS release 6.5 (Final) and Python 2.6.6
Do you have anything complex happening in your profile.d scripts or personal .bashrc, etc.?
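One rough way to check whether startup files add noticeable time is to compare a bare shell with a login shell on one of the target hosts. This is a sketch under assumptions: time_runs is a made-up helper, the run count of 20 is arbitrary, and whether profile.d scripts are read at all for a non-interactive ssh command depends on the shell and distro setup.

```shell
# Time N runs of a command in whole seconds (coarse, but portable).
# time_runs is a hypothetical helper name used for illustration.
time_runs() {
    n=$1
    shift
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1
        i=$((i + 1))
    done
    echo $(( $(date +%s) - start ))
}

# Compare a shell that skips startup files with a login shell that reads them.
echo "bare shell:  $(time_runs 20 bash --noprofile --norc -c true)s"
echo "login shell: $(time_runs 20 bash -lc true)s"
```

A large gap between the two numbers would suggest the startup files are worth a closer look; similar numbers would point elsewhere.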
It is possible; yes, we do have some complex(-ish) profile.d stuff going on. I shall endeavour to build a pure CentOS node and verify. This is probably an edge case of ours in some way. We're very happy that && works around it; we now have pssh, thank you for the efforts :)