parallel-ssh

Can't seem to get this to run the commands in parallel even when using --par x

Open alistar79 opened this issue 11 years ago • 10 comments

It seems that using --par 3 with --host "host1 host2 host3" still causes the command to run on each host until completion before moving on to the next. I tested with the command "date; sleep 120" and the dates returned are always 2 minutes apart.

alistar79 • Jul 17 '14 17:07

The --host option is an append-type option, so you have to specify each host separately, like this:

$ pssh --par 3 --host host1 --host host2 --host host3 "date; sleep 120"

Try that and let me know if it works for you.
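To illustrate what an append-type option means, here is a minimal Python argparse sketch. It is a hypothetical stand-in, not pssh's actual option parser (the prog name and option definitions are assumptions); it only shows why one quoted string becomes a single list element while a repeated flag yields one element per host.

```python
import argparse

# Hypothetical parser mimicking an append-type --host option.
# NOT pssh's real parser; it only demonstrates argparse "append" semantics.
parser = argparse.ArgumentParser(prog="pssh-sketch")
parser.add_argument("--host", action="append", help="repeat once per host")
parser.add_argument("--par", type=int, default=1)

# One quoted string is a single list element, not three hosts.
quoted = parser.parse_args(["--par", "3", "--host", "host1 host2 host3"])
print(quoted.host)    # ['host1 host2 host3']

# Repeating the flag appends one element per occurrence.
repeated = parser.parse_args(
    ["--par", "3", "--host", "host1", "--host", "host2", "--host", "host3"]
)
print(repeated.host)  # ['host1', 'host2', 'host3']
```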

You might want to try the -h/--hosts option instead; it might be easier for you. It lets you pass a file containing a newline-separated host list, e.g.:

$ pssh --par 3 --hosts prod-hosts.txt "date; sleep 120"
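A newline-separated host list is simply one host per line. The sketch below uses a hypothetical helper, read_hosts, and assumes blank lines and surrounding whitespace are ignored; pssh's real host-file parser may accept additional syntax, so treat this only as an illustration of the file layout.

```python
import os
import tempfile


def read_hosts(path):
    """Return one host per non-blank line of a newline-separated host file.

    Hypothetical helper: blank lines and surrounding whitespace are skipped.
    pssh's real parser may accept more syntax than this.
    """
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


# Example: write a small host file, read it back, then clean up.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("host1\nhost2\n\nhost3\n")
hosts = read_hosts(tmp.name)
os.remove(tmp.name)
print(hosts)  # ['host1', 'host2', 'host3']
```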

jcmcken • Jul 26 '14 01:07

I can confirm the same behaviour using sleep 2 && date, passing -h or --hosts with a valid newline-separated host file of 11 hosts. Passing -p 11, --par=11, --par 11, or no -p option at all (and every combination of those) gives the same result.

The command runs against every host in the host file, but pssh connects to them one after the other rather than in parallel. I also ran with -d, which surfaced nothing beyond the "DEBUG:psshlib.cli:script envelope is" line.

I'm running the latest 3.3.0 (commit ref 6ff05ca9b07e76ac4ab6e7bd8cc57d4dca30ab63), and the install is fine.

earthgecko • Oct 01 '14 11:10

Limited -p?

Playing with -p values seems to suggest:

a) there is a limit to how many parallel sessions can be handled; and/or
b) some blocking is taking place in pssh.

Testing showed that -p 2, 3 and 4 work best: with those values, several results return at the same time (i.e. in parallel).

ps aux shows that pssh opens an ssh connection to each host; however, if -p > 10, the results all return one after the other (not in parallel).

Interestingly, the last ssh process created reports success first, and the first one created reports success last. Are they blocking each other?

Examples

The script file contains sleep 5 && date, run against 11 hosts with -p 30 (anything greater than 10 seems to have the same effect: one host at a time).

[root@controller-5-96g-luk1 ~] /usr/bin/pssh -p 30 -h /var/log/host/scripts/host.pssh.sh/2014/10/20141001105803.host.pssh.sh.log.hosts --user=root -x "-o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key" --script /var/log/host/scripts/host.pssh.sh/2014/10/20141001105803.host.pssh.sh.log.command.file
[1] 12:22:31 [SUCCESS] host-8-20g-doa2
[2] 12:22:36 [SUCCESS] host-12-30g-doa2
[3] 12:22:41 [SUCCESS] host-4-40g-doa
[4] 12:22:46 [SUCCESS] host-1-30g-doa
[5] 12:22:51 [SUCCESS] host-6-30g-doa
[6] 12:22:56 [SUCCESS] host-13-40g-doa
[7] 12:23:01 [SUCCESS] host-9-30g-doa
[8] 12:23:06 [SUCCESS] host-5-20g-doa2
[9] 12:23:11 [SUCCESS] host-11-30g-doa2
[10] 12:23:16 [SUCCESS] host-10-30g-doa2
[11] 12:23:21 [SUCCESS] host-7-30g-doa

[root@controller-5-96g-luk1 ~]

ssh connections opened during the above run

[root@controller-5-96g-luk1 ~] ps aux | grep "ssh host\-" 
root     12850  0.5  0.0  62360  3724 ?        Ss   12:22   0:00 ssh host-7-30g-doa -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root     12851  0.5  0.0  62360  3724 ?        Ss   12:22   0:00 ssh host-10-30g-doa2 -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root     12852  0.0  0.0  62360  3728 ?        Ss   12:22   0:00 ssh host-11-30g-doa2 -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root     12853  0.5  0.0  62360  3724 ?        Ss   12:22   0:00 ssh host-5-20g-doa2 -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root     12854  0.5  0.0  62360  3724 ?        Ss   12:22   0:00 ssh host-9-30g-doa -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root     12855  0.5  0.0  62360  3724 ?        Ss   12:22   0:00 ssh host-13-40g-doa -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root     12856  0.5  0.0  62360  3724 ?        Ss   12:22   0:00 ssh host-6-30g-doa -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root     12857  0.5  0.0  62360  3724 ?        Ss   12:22   0:00 ssh host-1-30g-doa -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root     12858  0.5  0.0  62360  3724 ?        Ss   12:22   0:00 ssh host-4-40g-doa -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root     12859  0.0  0.0  62360  3728 ?        Ss   12:22   0:00 ssh host-12-30g-doa2 -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root     12860  0.0  0.0  62360  3728 ?        Ss   12:22   0:00 ssh host-8-20g-doa2 -o NumberOfPasswordPrompts=1 -o SendEnv=PSSH_NODENUM -l root -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key cat > /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; CATRET=$?; chmod 700 /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210 ; RET=$((CATRET+$?));rm -f /tmp/pssh-9207a59e833cffe21b272dfebbd5362ffb296210; exit $RET
root     12868  0.0  0.0 103300   836 pts/2    S+   12:22   0:00 grep ssh host-
[root@controller-5-96g-luk1 ~]
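The reversed ordering noted above can be checked directly against the pasted data. This sketch copies the PIDs from the ps listing and the completion order from the -p 30 run, and assumes that ascending PID approximates spawn order (true here, though PIDs can wrap around). It only confirms the pattern; it says nothing about the cause.

```python
# PIDs and hosts copied from the ps listing above.
ssh_pids = {
    12850: "host-7-30g-doa",
    12851: "host-10-30g-doa2",
    12852: "host-11-30g-doa2",
    12853: "host-5-20g-doa2",
    12854: "host-9-30g-doa",
    12855: "host-13-40g-doa",
    12856: "host-6-30g-doa",
    12857: "host-1-30g-doa",
    12858: "host-4-40g-doa",
    12859: "host-12-30g-doa2",
    12860: "host-8-20g-doa2",
}

# Completion order copied from the -p 30 run above.
completion_order = [
    "host-8-20g-doa2", "host-12-30g-doa2", "host-4-40g-doa",
    "host-1-30g-doa", "host-6-30g-doa", "host-13-40g-doa",
    "host-9-30g-doa", "host-5-20g-doa2", "host-11-30g-doa2",
    "host-10-30g-doa2", "host-7-30g-doa",
]

# Assumption: ascending PID approximates the order the ssh processes started.
spawn_order = [ssh_pids[pid] for pid in sorted(ssh_pids)]

# Last started, first finished: completion order is exactly spawn order reversed.
is_lifo = completion_order == list(reversed(spawn_order))
print(is_lifo)  # True
```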

Running with -p 3, the times at which the hosts report back do show parallelism:

[root@host-controller-dev-5-96g-luk1 ~] /usr/bin/pssh -p 3 -h /var/log/host/scripts/host.pssh.sh/2014/10/20141001105803.host.pssh.sh.log.hosts --user=root -x "-o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key" --script /var/log/host/scripts/host.pssh.sh/2014/10/20141001105803.host.pssh.sh.log.command.file
[1] 12:19:23 [SUCCESS] host-11-30g-doa2
[2] 12:19:28 [SUCCESS] host-10-30g-doa2
[3] 12:19:33 [SUCCESS] host-7-30g-doa
[4] 12:19:33 [SUCCESS] host-5-20g-doa2
[5] 12:19:33 [SUCCESS] host-9-30g-doa
[6] 12:19:38 [SUCCESS] host-13-40g-doa
[7] 12:19:39 [SUCCESS] host-6-30g-doa
[8] 12:19:39 [SUCCESS] host-1-30g-doa
[9] 12:19:44 [SUCCESS] host-4-40g-doa
[10] 12:19:44 [SUCCESS] host-12-30g-doa2
[11] 12:19:44 [SUCCESS] host-8-20g-doa2

[root@host-controller-dev-5-96g-luk1 ~]

Running with -p 4 suggests a limit of 3

[root@controller-5-96g-luk1 ~] /usr/bin/pssh -p 4 -h /var/log/host/scripts/host.pssh.sh/2014/10/20141001105803.host.pssh.sh.log.hosts --user=root -x "-o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key" --script /var/log/host/scripts/host.pssh.sh/2014/10/20141001105803.host.pssh.sh.log.command.file
[1] 12:31:11 [SUCCESS] host-5-20g-doa2
[2] 12:31:16 [SUCCESS] host-11-30g-doa2
[3] 12:31:16 [SUCCESS] host-9-30g-doa
[4] 12:31:21 [SUCCESS] host-10-30g-doa2
[5] 12:31:21 [SUCCESS] host-13-40g-doa
[6] 12:31:22 [SUCCESS] host-6-30g-doa
[7] 12:31:26 [SUCCESS] host-7-30g-doa
[8] 12:31:26 [SUCCESS] host-1-30g-doa
[9] 12:31:27 [SUCCESS] host-4-40g-doa
[10] 12:31:27 [SUCCESS] host-12-30g-doa2
[11] 12:31:31 [SUCCESS] host-8-20g-doa2

[root@controller-5-96g-luk1 ~]

Running with -p 6 gives the same result.

[root@controller-5-96g-luk1 ~] /usr/bin/pssh -p 6 -h /var/log/host/scripts/host.pssh.sh/2014/10/20141001105803.host.pssh.sh.log.hosts --user=root -x "-o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/.known_hosts -i /root/.ssh/controller.key" --script /var/log/host/scripts/host.pssh.sh/2014/10/20141001105803.host.pssh.sh.log.command.file
[1] 12:32:33 [SUCCESS] host-13-40g-doa
[2] 12:32:38 [SUCCESS] host-9-30g-doa
[3] 12:32:38 [SUCCESS] host-6-30g-doa
[4] 12:32:43 [SUCCESS] host-5-20g-doa2
[5] 12:32:43 [SUCCESS] host-1-30g-doa
[6] 12:32:44 [SUCCESS] host-4-40g-doa
[7] 12:32:48 [SUCCESS] host-11-30g-doa2
[8] 12:32:48 [SUCCESS] host-12-30g-doa2
[9] 12:32:49 [SUCCESS] host-8-20g-doa2
[10] 12:32:53 [SUCCESS] host-10-30g-doa2
[11] 12:32:58 [SUCCESS] host-7-30g-doa

[root@controller-5-96g-luk1 ~]

This is being run on a 4-processor server, and the behaviour does not seem to be CPU-related.
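One way to quantify these runs is to group the status lines by completion timestamp: with sleep 5 in the script, hosts that finish in the same second were very likely running concurrently. The sketch below is a rough heuristic under that assumption (pssh timestamps have only one-second resolution, and the regex assumes the status-line format shown above); the sample data is copied from the -p 3 and -p 30 runs.

```python
import re
from collections import Counter

# Matches status lines such as "[3] 12:19:33 [SUCCESS] host-7-30g-doa".
STATUS_RE = re.compile(r"^\[\d+\]\s+(\d\d:\d\d:\d\d)\s+\[(\w+)\]\s+(\S+)")


def completion_batches(output):
    """Return [(timestamp, hosts_finished_in_that_second), ...] in output order."""
    counts = Counter()
    order = []
    for line in output.splitlines():
        match = STATUS_RE.match(line.strip())
        if not match:
            continue  # skip prompts, blank lines and command echoes
        ts = match.group(1)
        if ts not in counts:
            order.append(ts)
        counts[ts] += 1
    return [(ts, counts[ts]) for ts in order]


P3_RUN = """\
[1] 12:19:23 [SUCCESS] host-11-30g-doa2
[2] 12:19:28 [SUCCESS] host-10-30g-doa2
[3] 12:19:33 [SUCCESS] host-7-30g-doa
[4] 12:19:33 [SUCCESS] host-5-20g-doa2
[5] 12:19:33 [SUCCESS] host-9-30g-doa
[6] 12:19:38 [SUCCESS] host-13-40g-doa
[7] 12:19:39 [SUCCESS] host-6-30g-doa
[8] 12:19:39 [SUCCESS] host-1-30g-doa
[9] 12:19:44 [SUCCESS] host-4-40g-doa
[10] 12:19:44 [SUCCESS] host-12-30g-doa2
[11] 12:19:44 [SUCCESS] host-8-20g-doa2
"""

P30_RUN = """\
[1] 12:22:31 [SUCCESS] host-8-20g-doa2
[2] 12:22:36 [SUCCESS] host-12-30g-doa2
[3] 12:22:41 [SUCCESS] host-4-40g-doa
[4] 12:22:46 [SUCCESS] host-1-30g-doa
[5] 12:22:51 [SUCCESS] host-6-30g-doa
[6] 12:22:56 [SUCCESS] host-13-40g-doa
[7] 12:23:01 [SUCCESS] host-9-30g-doa
[8] 12:23:06 [SUCCESS] host-5-20g-doa2
[9] 12:23:11 [SUCCESS] host-11-30g-doa2
[10] 12:23:16 [SUCCESS] host-10-30g-doa2
[11] 12:23:21 [SUCCESS] host-7-30g-doa
"""

print(completion_batches(P3_RUN))
# [('12:19:23', 1), ('12:19:28', 1), ('12:19:33', 3), ('12:19:38', 1), ('12:19:39', 2), ('12:19:44', 3)]
print(max(n for _, n in completion_batches(P30_RUN)))  # 1
```

With -p 3 the largest same-second batch is 3, while every -p 30 result lands in its own 5-second slot, which matches the serialized behaviour described above.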

earthgecko • Oct 01 '14 12:10

There are two issues:

  • Parallelism of the SSH execution
  • Parallelism of the input/output handler

Your ps output shows that the SSH processes are all running in parallel, so the issue is not there; it's more likely in the second piece.
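The two pieces can be separated in a small Python sketch: the thread pool caps how many child processes are alive at once (the analogue of --par), while communicate() drains each child's output. This is a minimal illustration, not pssh's implementation: local Python sleep commands stand in for ssh sessions, and ThreadPoolExecutor is swapped in for whatever I/O loop pssh actually uses.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd):
    """Run one command to completion; return (returncode, stdout, elapsed_s)."""
    start = time.monotonic()
    # communicate() drains stdout and stderr together, so a chatty child
    # cannot stall on a full pipe while we wait for it to exit.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, _err = proc.communicate()
    return proc.returncode, out, time.monotonic() - start


def run_parallel(cmds, par):
    """Run cmds with at most `par` of them alive at once (a stand-in for --par)."""
    with ThreadPoolExecutor(max_workers=par) as pool:
        return list(pool.map(run_one, cmds))


def timed(cmds, par):
    """Return (results, total wall-clock seconds) for one run_parallel call."""
    t0 = time.monotonic()
    results = run_parallel(cmds, par)
    return results, time.monotonic() - t0


# Three local "tasks" that each sleep 0.5 s stand in for remote ssh sessions.
task = [sys.executable, "-c", "import time; time.sleep(0.5); print('done')"]
cmds = [task] * 3
parallel_results, parallel_s = timed(cmds, par=3)
serial_results, serial_s = timed(cmds, par=1)
print(f"par=3: {parallel_s:.2f}s  par=1: {serial_s:.2f}s")
```

Comparing par=3 with par=1 on identical commands is the local analogue of the -p experiments earlier in this thread: if both the spawning and the output handling are truly concurrent, the par=3 run should take roughly one task's duration rather than three.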

My initial guess is that --script is the culprit here. Not only does additional data have to be passed through the STDIN of the SSH session, but the script also has to be written to a file on the remote end and launched in a brand-new shell. This could be causing arbitrary variation in the time it takes for a particular task to complete. The timestamps returned by pssh only show when a task completed, not when it started.
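For reference, the ps listing earlier in the thread shows the remote command that --script sends to each host. This sketch rebuilds a string of the same shape; it is a hypothetical reconstruction (the helper name script_wrapper is made up), and deriving the /tmp/pssh-... name from a SHA-1 of the script text is an assumption, since the listing only shows a 40-hex-digit suffix.

```python
import hashlib


def script_wrapper(script_text):
    """Build a remote command shaped like the one in the ps listing.

    Hypothetical reconstruction: the SHA-1-based file name is an assumption.
    """
    path = "/tmp/pssh-" + hashlib.sha1(script_text.encode()).hexdigest()
    return (
        f"cat > {path}; CATRET=$?; "  # 1. receive the script over stdin
        f"chmod 700 {path}; "         # 2. make it executable
        f"{path} ; "                  # 3. run it as a separate process
        f"RET=$((CATRET+$?));"        # 4. combine the two exit codes
        f"rm -f {path}; exit $RET"    # 5. clean up and propagate the status
    )


print(script_wrapper("sleep 5 && date\n"))
```

Every one of these steps runs on the remote side after the script body has been streamed over stdin, which is the extra work described above compared with passing a plain command.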

I would try running without --script, e.g. pssh -h hostlist.txt date.

jcmcken • Oct 03 '14 21:10

On second thought, I think there might be something else wrong with your environment. I just ran the --script method across 500 hosts with -p 15 and on average got 20-25 results per one-second interval (which means the SSH sessions are finishing faster than pssh polls for their output).

My client is a CentOS 6 host.

jcmcken • Oct 03 '14 21:10

Hi Jon

Yes, I also had a feeling it may be something environment-related, though I'm not sure what.
Either way, thank you for the feedback; it has helped focus further testing.

Testing on our environments does indeed suggest that it has something to do with the --script method when a multi-line script is passed. I could reproduce --script running in parallel if the script contained a single simple command (one line); however, if the script has multiple lines, it once again appears to run on no more than 3 hosts at a time, as long as -p < 10.

What is really interesting is that if I pass the same lines from the script as a single &&-separated command, it does run in parallel.

You mentioned that the timestamps returned by pssh only show when a task completed, not when it started. Preceding and following the command(s) with date does show us when the first date command was executed and when the task finished.

For example, using --script /tmp/pssh.test.sh, where /tmp/pssh.test.sh contains:

date
puppetd
date

results in the execution running in parallel on only 3 or 4 hosts; once those have finished, the next 3 or 4 start.

However, not passing the --script argument and instead passing the command directly as:

"date && puppetd && date"

executes the commands across all the hosts in parallel, as desired: the date commands are all issued at the same time. This is a workaround that works for us.
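The workaround above amounts to collapsing the script into one command line. A minimal sketch, using a hypothetical helper script_to_and_chain, that joins the non-blank, non-comment lines of a script with &&:

```python
def script_to_and_chain(script_text):
    """Join non-blank, non-comment lines of a simple script with ' && '.

    Hypothetical helper; lines starting with '#' (including a shebang) are dropped.
    """
    lines = [ln.strip() for ln in script_text.splitlines()]
    return " && ".join(ln for ln in lines if ln and not ln.startswith("#"))


print(script_to_and_chain("date\npuppetd\ndate\n"))  # date && puppetd && date
```

Note that this changes semantics: && stops at the first failing command, whereas a plain script (without set -e) runs every line regardless. It is also only safe for scripts made of independent one-line commands; multi-line constructs such as if/for blocks or heredocs would break.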

It may be interesting to see whether you can replicate this behaviour in your environment, using a script file that has multiple commands wrapped in date calls.

earthgecko • Oct 06 '14 12:10

Could I ask what Python version and OS you're using?

jcmcken • Oct 09 '14 20:10

Hi Jon

CentOS release 6.5 (Final) and Python 2.6.6

earthgecko • Oct 09 '14 21:10

Do you have anything complex happening in your profile.d scripts or personal .bashrc, etc.?

jcmcken • Oct 11 '14 15:10

It is possible; yes, we do have some complex(-ish) profile.d stuff going on. I shall endeavour to build a pure CentOS node and verify. This is probably an edge case of ours in some way. We're very happy with the && workaround, and we now have pssh, so thank you for the efforts :)

earthgecko • Oct 11 '14 17:10