
SSH Connection Hangs When Using Ansible

Abraxos opened this issue 3 years ago · 6 comments

Hello!

First off, I just wanna say that Warpgate is awesome, I love it, and I wanna use it to manage everything I use SSH for securely. Please keep up the good work.

Issue Summary

One of the things I wanna do with it, however, is provision all my servers using Ansible. If you don't know Ansible, it's not a big deal: it's a configuration management system that comes down to "kinda idempotent Python scripts over an SSH connection". The advantage is that if you have an SSH connection to a given device, you should be able to use Ansible to configure it just fine.

The problem is that when I attempt to configure a machine using Ansible through a Warpgate-based SSH connection, I get about 5-10 tasks in (a task is a single "command" that does something in Ansible), and then it either freezes or hits a timeout.

It took me a while to figure out as much as I did (including the steps to reproduce the issue below), and I still don't really know whether this is a Warpgate issue or an Ansible issue. I was hoping that if it's an Ansible issue, you could at least help me figure out what to tell the Ansible devs.

Setup

Anyway, here's what I've determined can be done to replicate the issue...

You will need 3 machines to replicate this. I ended up setting up 3 Linode machines to demo this on (1 GB RAM "Nanodes"). If you need access to them to test on, I will gladly grant it (they're not doing anything else). All the machines are Ubuntu 22.04 LTS. The machines will be referred to from here on as the host, the target, and the relay. The host runs the various Ansible commands to configure the target by connecting through the relay, which in turn runs Warpgate.

Feel free to skip ahead to the Ansible Configuration section, as all I am doing is installing Ansible on the host, installing Warpgate on the relay, configuring both, and making the target accessible from the host both directly (ssh target) and via the relay (ssh target.warpgate).

The initial configuration for each is as follows:

warpgate-demo-host

This is the machine where we are going to run Ansible to configure the target. We will be connecting to the target through the relay.

apt-get update --fix-missing -y;
apt-get upgrade -y;
apt-get dist-upgrade -y;
apt-get install emacs tmux htop python3-pip;
pip3 install --upgrade pip ansible;

Now we need to generate an SSH key for the host, e.g.:
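
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N ''   # empty passphrase assumed here, just to keep the demo non-interactive

Then set up a ~/.ssh/config file on the host that has two different ways to get to the target machine: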

Host target.warpgate
     Hostname <RELAY IP>
     User admin:target
     Port 2222
     IdentityFile ~/.ssh/id_ed25519

Host target
     Hostname <TARGET IP>
     User root
     Port 22
     IdentityFile ~/.ssh/id_ed25519

warpgate-demo-relay

apt-get update --fix-missing -y;
apt-get upgrade -y;
apt-get dist-upgrade -y;
apt-get install emacs tmux htop python3-pip;
pip3 install --upgrade pip;

Then we install warpgate on the relay:

wget https://github.com/warp-tech/warpgate/releases/download/v0.5.1/warpgate-v0.5.1-x86_64-linux -O /usr/local/bin/warpgate && \
chmod u=rwx,g=rx,o=rx /usr/local/bin/warpgate

Run warpgate setup as per the setup instructions

... and edit the resulting /etc/warpgate.yaml file to add the target host:

---
targets:
  - name: target
    allow_roles: ["warpgate:admin"]
    ssh:
      host: <TARGET IP>
      username: root
      port: 22
  - name: Web admin
    allow_roles:
      - "warpgate:admin"
    web_admin: {}
users:
  - username: admin
    credentials:
      - type: password
        hash: <ADMIN PW HASH>
      - type: publickey
        key: <HOST PUBLIC KEY>
    roles:
      - "warpgate:admin"
roles:
  - name: "warpgate:admin"
sso_providers: []
recordings:
  enable: true
  path: /var/lib/warpgate/recordings
external_host: ~
database_url: "sqlite:/var/lib/warpgate/db"
ssh:
  enable: true
  listen: "0.0.0.0:2222"
  keys: /var/lib/warpgate/ssh-keys
  host_key_verification: prompt
http:
  enable: true
  listen: "0.0.0.0:8888"
  certificate: /var/lib/warpgate/tls.certificate.pem
  key: /var/lib/warpgate/tls.key.pem
mysql:
  enable: false
  listen: "0.0.0.0:8888"
  certificate: /var/lib/warpgate/tls.certificate.pem
  key: /var/lib/warpgate/tls.key.pem
log:
  retention: 7days
  send_to: ~

Once everything is configured, we run Warpgate with warpgate run.
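
To keep it running with its logs visible, I just launch it inside one of the tmux sessions we installed earlier (a convenience, not a requirement):

tmux new -s warpgate 'warpgate run'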

warpgate-demo-target

apt-get update --fix-missing -y;
apt-get upgrade -y;
apt-get dist-upgrade -y;
apt-get install emacs tmux htop python3-pip;
pip3 install --upgrade pip;

Then all we need to do is add the host machine's SSH public key and the relay's Warpgate public key to the target machine's /root/.ssh/authorized_keys file.
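
For example (filenames are illustrative; the Warpgate client key lives under the keys directory configured above, /var/lib/warpgate/ssh-keys on the relay):

# On the target, after copying both public keys over:
cat host_id_ed25519.pub >> /root/.ssh/authorized_keys      # the host's ~/.ssh/id_ed25519.pub
cat warpgate_client_key.pub >> /root/.ssh/authorized_keys  # from the relay's /var/lib/warpgate/ssh-keys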

Confirmation

We should confirm that everything is working by executing ssh target on the host machine, then exiting and running ssh target.warpgate. Both should work: the former is direct, while the latter goes through the Warpgate relay machine.
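
A quick non-interactive check also works, something like:

ssh target hostname            # direct
ssh target.warpgate hostname   # via the relay; both should print the target's hostname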

Ansible Configuration

This should all be done on the host machine.

We will need 3 files: the configuration, the inventory, and the playbook. The playbook can be thought of as the script of things we're going to execute to configure the target, while the inventory is where we store information about our servers (in this case, only the target). All the files are assumed to be in the current working directory.

To demonstrate the issue, we will need the following inventory (inventory.yaml):

all:
  hosts:
    target:
      ansible_user: root
      ansible_host: target
      ansible_python_interpreter: "/usr/bin/env python3"
    target.warpgate:
      ansible_ssh_user: admin:target
      ansible_host: target.warpgate
      ansible_python_interpreter: "/usr/bin/env python3"

The following configuration (ansible.cfg):

[defaults]
inventory = inventory.yaml
stdout_callback = yaml
bin_ansible_callbacks = True
remote_tmp = $HOME/.ansible/tmp

[connection]
ssh_args = -C -o ControlMaster=yes -o ControlPersist=yes
control_path = %(directory)s/ansible-ssh-%%h-%%p-%%r
pipelining = True

[privilege_escalation]
become_allow_same_user = True

and finally, the following playbook (playbook.yaml), which performs 12 arbitrary tasks (in our case, creating a bunch of numbered files in the /root directory, prefixed with the ansible_host variable, which will indicate which of the connection methods succeeded in its tasks):

- hosts: all
  become: yes
  become_user: root
  tasks:
    - file:
        state: touch
        path: "/root/{{ ansible_host }}.1"
    - file:
        state: touch
        path: "/root/{{ ansible_host }}.2"
    - file:
        state: touch
        path: "/root/{{ ansible_host }}.3"
    - file:
        state: touch
        path: "/root/{{ ansible_host }}.4"
    - file:
        state: touch
        path: "/root/{{ ansible_host }}.5"
    - file:
        state: touch
        path: "/root/{{ ansible_host }}.6"
    - file:
        state: touch
        path: "/root/{{ ansible_host }}.7"
    - file:
        state: touch
        path: "/root/{{ ansible_host }}.8"
    - file:
        state: touch
        path: "/root/{{ ansible_host }}.9"
    - file:
        state: touch
        path: "/root/{{ ansible_host }}.10"
    - file:
        state: touch
        path: "/root/{{ ansible_host }}.11"
    - file:
        state: touch
        path: "/root/{{ ansible_host }}.12"

Issue Demonstration

To demonstrate the issue, we simply run the following command on the host:

ansible-playbook playbook.yaml -i inventory.yaml

The output will look something like this:

root@warpgate-demo-host:~# ansible-playbook playbook.yaml -i inventory.yaml 

PLAY [all] ***

TASK [Gathering Facts] ***
ok: [target]
ok: [target.warpgate]

TASK [file] ***
changed: [target]
changed: [target.warpgate]

TASK [file] ***
changed: [target]
changed: [target.warpgate]

TASK [file] ***
changed: [target]
changed: [target.warpgate]

TASK [file] ***
changed: [target]
changed: [target.warpgate]

TASK [file] ***
changed: [target]
changed: [target.warpgate]

TASK [file] ***
changed: [target]
fatal: [target.warpgate]: FAILED! => 
  msg: 'Timeout (12s) waiting for privilege escalation prompt: '

TASK [file] ***
changed: [target]

TASK [file] ***
changed: [target]

TASK [file] ***
changed: [target]

TASK [file] ***
changed: [target]

TASK [file] ***
changed: [target]

TASK [file] ***
changed: [target]

PLAY RECAP ***
target                     : ok=13   changed=12   unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
target.warpgate            : ok=6    changed=5    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

The result on the target will be that all 12 of the target.* files get created, but only some of the target.warpgate.* files:

-rw-r--r--  1 root root    0 Aug 20 00:12 target.1
-rw-r--r--  1 root root    0 Aug 20 00:13 target.10
-rw-r--r--  1 root root    0 Aug 20 00:13 target.11
-rw-r--r--  1 root root    0 Aug 20 00:13 target.12
-rw-r--r--  1 root root    0 Aug 20 00:12 target.2
-rw-r--r--  1 root root    0 Aug 20 00:12 target.3
-rw-r--r--  1 root root    0 Aug 20 00:12 target.4
-rw-r--r--  1 root root    0 Aug 20 00:12 target.5
-rw-r--r--  1 root root    0 Aug 20 00:12 target.6
-rw-r--r--  1 root root    0 Aug 20 00:13 target.7
-rw-r--r--  1 root root    0 Aug 20 00:13 target.8
-rw-r--r--  1 root root    0 Aug 20 00:13 target.9
-rw-r--r--  1 root root    0 Aug 20 00:12 target.warpgate.1
-rw-r--r--  1 root root    0 Aug 20 00:12 target.warpgate.2
-rw-r--r--  1 root root    0 Aug 20 00:12 target.warpgate.3
-rw-r--r--  1 root root    0 Aug 20 00:12 target.warpgate.4
-rw-r--r--  1 root root    0 Aug 20 00:12 target.warpgate.5

Meanwhile, on the relay, this is what we see in the Warpgate logs while this is happening:

00:12:18  INFO SSH: Opening session channel channel=83926d58-082c-4f0a-9989-71c99bc2bfe7 session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:18  INFO SSH: Recording session c32f889d-8de5-4a06-b339-c49fc97587a5 name=exec-channel-4 path="/var/lib/warpgate/recordings/c32f889d-8de5-4a06-b339-c49fc97587a5/exec-channel-4" session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:20  INFO Closed channel=83926d58-082c-4f0a-9989-71c99bc2bfe7 session=c32f889d-8de5-4a06-b339-c49fc97587a5
00:12:20  INFO SSH: Opening session channel channel=644a307e-06b0-4c92-8445-3d74a0eda116 session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:20  INFO SSH: Recording session c32f889d-8de5-4a06-b339-c49fc97587a5 name=exec-channel-5 path="/var/lib/warpgate/recordings/c32f889d-8de5-4a06-b339-c49fc97587a5/exec-channel-5" session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:20  INFO Closed channel=644a307e-06b0-4c92-8445-3d74a0eda116 session=c32f889d-8de5-4a06-b339-c49fc97587a5
00:12:51  INFO SSH: Opening session channel channel=199e6d19-0617-4664-bea0-d8ad882fa46a session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:51  INFO SSH: Recording session c32f889d-8de5-4a06-b339-c49fc97587a5 name=exec-channel-6 path="/var/lib/warpgate/recordings/c32f889d-8de5-4a06-b339-c49fc97587a5/exec-channel-6" session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:52  INFO Closed channel=199e6d19-0617-4664-bea0-d8ad882fa46a session=c32f889d-8de5-4a06-b339-c49fc97587a5
00:12:53  INFO SSH: Opening session channel channel=7a3c10e4-2513-4853-a938-b37834429cc1 session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:53  INFO SSH: Recording session c32f889d-8de5-4a06-b339-c49fc97587a5 name=exec-channel-7 path="/var/lib/warpgate/recordings/c32f889d-8de5-4a06-b339-c49fc97587a5/exec-channel-7" session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:53  INFO Closed channel=7a3c10e4-2513-4853-a938-b37834429cc1 session=c32f889d-8de5-4a06-b339-c49fc97587a5
00:12:53  INFO SSH: Opening session channel channel=8d831669-9437-49ae-bb2c-7cd22798b423 session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:53  INFO SSH: Recording session c32f889d-8de5-4a06-b339-c49fc97587a5 name=exec-channel-8 path="/var/lib/warpgate/recordings/c32f889d-8de5-4a06-b339-c49fc97587a5/exec-channel-8" session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:53  INFO Closed channel=8d831669-9437-49ae-bb2c-7cd22798b423 session=c32f889d-8de5-4a06-b339-c49fc97587a5
00:12:53  INFO SSH: Opening session channel channel=314aac87-f658-4b93-8330-0c7c10984854 session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:53  INFO SSH: Recording session c32f889d-8de5-4a06-b339-c49fc97587a5 name=exec-channel-9 path="/var/lib/warpgate/recordings/c32f889d-8de5-4a06-b339-c49fc97587a5/exec-channel-9" session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:54  INFO Closed channel=314aac87-f658-4b93-8330-0c7c10984854 session=c32f889d-8de5-4a06-b339-c49fc97587a5
00:12:54  INFO SSH: Opening session channel channel=6d4595ca-d317-4f1f-b443-edf1e8c98809 session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:54  INFO SSH: Recording session c32f889d-8de5-4a06-b339-c49fc97587a5 name=exec-channel-10 path="/var/lib/warpgate/recordings/c32f889d-8de5-4a06-b339-c49fc97587a5/exec-channel-10" session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:54  INFO Closed channel=6d4595ca-d317-4f1f-b443-edf1e8c98809 session=c32f889d-8de5-4a06-b339-c49fc97587a5
00:12:54  INFO SSH: Opening session channel channel=be788718-9347-4882-a7ef-4f65af991b51 session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:54  INFO SSH: Recording session c32f889d-8de5-4a06-b339-c49fc97587a5 name=exec-channel-11 path="/var/lib/warpgate/recordings/c32f889d-8de5-4a06-b339-c49fc97587a5/exec-channel-11" session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:55  INFO Closed channel=be788718-9347-4882-a7ef-4f65af991b51 session=c32f889d-8de5-4a06-b339-c49fc97587a5
00:12:55  INFO SSH: Opening session channel channel=d0447aab-2095-4295-b4b9-7050c1615381 session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:12:55 ERROR SSH: error in command loop error=failed to open shell

Caused by:
    Disconnected session=c32f889d-8de5-4a06-b339-c49fc97587a5
00:12:55  INFO SSH: Closed connection session=c32f889d-8de5-4a06-b339-c49fc97587a5
00:12:55  INFO SSH: Disconnect session=c32f889d-8de5-4a06-b339-c49fc97587a5
00:12:55  INFO SSH: Recording session c32f889d-8de5-4a06-b339-c49fc97587a5 name=exec-channel-12 path="/var/lib/warpgate/recordings/c32f889d-8de5-4a06-b339-c49fc97587a5/exec-channel-12" session=c32f889d-8de5-4a06-b339-c49fc97587a5 session_username=admin
00:18:07  INFO Closed connection

This error happens regardless of the task being performed: files, apt-get installations, whatever, it doesn't matter. One of the tasks eventually times out and I have no clue why. I tried setting a higher timeout (it should take far less than 12 seconds to create a file), but it still times out, just after two minutes instead. I tried commenting out specific tasks, but it happens on the Nth task no matter what. I will also note that Ansible is supposed to be idempotent, so if you re-run the playbook, it will skip some of the tasks because they are already complete. This is fine, but even then one of the tasks running through Warpgate will time out. For example, after running this a second time, 9 files were delivered through the Warpgate connection before the timeout occurred.
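
For reference, the timeout knob lives in the [defaults] section of the ansible.cfg above; the value below is illustrative, not the exact one I tried:

[defaults]
# Connection / privilege-escalation prompt timeout, in seconds
timeout = 120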

Please advise.

Abraxos · Aug 20 '22

Are you getting sshd: no more sessions in syslog on the target machine? If so, you can work around this by setting a very high MaxSessions value in sshd_config.

It looks like Warpgate isn't passing the "no more sessions" error back to Ansible correctly, so it's waiting forever.
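
For example, on the target (the log path and the default limit of 10 are the Ubuntu/OpenSSH defaults):

# Check syslog for the session-limit error:
grep 'no more sessions' /var/log/auth.log

# Then raise the limit in /etc/ssh/sshd_config and restart sshd:
MaxSessions 1000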

Eugeny · Aug 20 '22

I set MaxSessions in sshd_config to 1000 and it completed (I had previously set it to 20, and it got about 15 steps in before the same thing happened).

Out of curiosity, why do we not hit this limit when using direct SSH?

Abraxos · Aug 20 '22

Ansible transparently handles connection errors and will just retry the operation, but Warpgate isn't passing this error back correctly, so Ansible doesn't see it and is still waiting for the response from the server :(

Eugeny · Aug 20 '22

Oh cool, so presumably I can get this working now with some absurdly high MaxSessions value, and then the issue will be fixed in some future version?

Abraxos · Aug 20 '22

Correct! I've fixed error transmission, but Ansible still hangs every now and then when pipelining is enabled - I'm still investigating.

Eugeny · Aug 20 '22

Sweet, I appreciate it. I actually enabled pipelining partly as a way of trying to get around this bug, so I can live without it, but it'd be nice to be able to use that as well.
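
For reference, turning pipelining off is a one-line change to the ansible.cfg above:

[connection]
pipelining = False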

Abraxos · Aug 20 '22

An update comment aimed at @Abraxos, bringing #459 to attention, with a question:

Do SSH Connections through warpgate still hang when using Ansible?

stappersg · Nov 12 '22

> An update comment aimed at @Abraxos, bringing #459 to attention, with a question:
>
> Do SSH Connections through warpgate still hang when using Ansible?

Yes, but I have not had a chance to test the code that is supposed to fix this. Due to my configuration management setup, I only use the pre-built releases of Warpgate, so I'm still running 0.6.4 and waiting for a new version before I pick up the code meant to fix it.

Abraxos · Nov 14 '22

Confirmed that this issue is fixed for me as of 0.7.0.

Abraxos · Nov 22 '22