Spikes in connection time
We are seeing intermittent spikes in the time it takes for a user to establish a connection with our SSH server. The delay falls between PublicKeyHandler and Handler starting. The customer authenticates successfully, then something takes anywhere from a couple of seconds to multiple minutes (9 minutes was the longest this week), and then the Handler kicks in and they are connected as usual. The time appears to be spent inside the gliderlabs/ssh code.
We have not been able to reproduce the issue ourselves. It does not happen to any particular customers or with any particular client, but we see some customer connections being delayed like this every day.
We only have questions right now:
- Is it possible that there are some timeout values that we could configure around this?
- Is it possible there is a prompt or some other customer interaction that it is waiting on?
I don't believe there are timeouts, but adding them would require knowing what needs to be timed out. How are you "seeing" their connections delayed? Logs? Have you tried writing a test case that connects repeatedly to see if it happens?
We are seeing these in our logs and monitoring. We log the time taken between several different steps, including one specifically for the gap between our authentication completing and the handler starting. We also run a load test, but it never hits this issue.
When I was maintaining the server for Bitbucket, we had a number of users who would open a connection then use it later. We added a timeout to kill connections which took too long to open a session.
Would one of the timeouts mentioned in https://github.com/gliderlabs/ssh/issues/11 work? It seems like this issue might be a duplicate of that.