Jiaying Zhang
2007-Jul-24 22:33 UTC
ssh client does not timeout if the network fails after ssh_connect but before ssh_exchange_identification, even with Alive options set
Hello,

I have been testing ssh with occasional network disconnections between server and client. I found that ssh sometimes hangs if the disconnection happens after the connection is established but before ssh_exchange_identification completes. The ssh configuration files show that both the client and server alive options are set.

In /etc/ssh/ssh_config:
  # Send keepalive messages to the server. Disconnect after 90 seconds.
  ServerAliveInterval 30
  ServerAliveCountMax 3

In /etc/ssh/sshd_config:
  # ClientAlive is more flexible and secure than TCPKeepAlive. (ssh2)
  # Send an alive message every 30 seconds, and disconnect after 90 seconds.
  ClientAliveInterval 30
  ClientAliveCountMax 3

The ssh client kept hanging even after the network came back. It finally timed out after about 2 hours, because tcp_keepalive_time is set to 2 hours in sysctl.

I looked at the ssh code downloaded from your website and found that the Alive options are only used to set up timeouts after ssh_session starts. So my question is: why do we not start monitoring the liveness of the ssh server right after a connection is established? It is annoying when an application relies on ssh to do periodic work but an occasional network failure causes the application to miss several service cycles because ssh hangs.

Thanks a lot!

Jiaying
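The two-hour figure above is the system-wide TCP keepalive idle time. As a rough illustration only, and not something ssh is known to do at this point in the connection, a program on Linux can enable keepalive on one socket and override the timing for that socket alone. The helper name set_keepalive and the values passed to it are hypothetical; TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific socket options.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not part of OpenSSH: enable TCP keepalive on fd and
 * override the system-wide timing for this socket only (Linux-specific).
 * idle_sec:  idle time before the first probe is sent
 * intvl_sec: time between unanswered probes
 * probes:    unanswered probes before the connection is dropped
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
static int
set_keepalive(int fd, int idle_sec, int intvl_sec, int probes)
{
	int on = 1;

	assert(idle_sec > 0 && intvl_sec > 0 && probes > 0);

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
	    &idle_sec, sizeof(idle_sec)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
	    &intvl_sec, sizeof(intvl_sec)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
	    &probes, sizeof(probes)) < 0)
		return -1;
	return 0;
}
```

Calling set_keepalive(fd, 30, 30, 3) would ask the kernel to start probing after 30 idle seconds and to give up after 3 unanswered probes sent 30 seconds apart. Whether that is appropriate for ssh is a separate question from the one asked above.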
Jiaying Zhang
2007-Jul-25 22:12 UTC
ssh client does not timeout if the network fails after ssh_connect but before ssh_exchange_identification, even with Alive options set
Hello again,

Here is the patch I came up with to prevent the hang in ssh_exchange_identification. I have tested it briefly and it seems to solve the problem. Could anyone take a look at the patch? Thanks a lot!

--- sshconnect.c~old 2007-07-25 10:44:26.000000000 -0700
+++ sshconnect.c 2007-07-25 14:45:57.000000000 -0700
@@ -404,9 +404,26 @@ ssh_exchange_identification(void)
     int minor1 = PROTOCOL_MINOR_1;
     u_int i, n;
 
+    if (options.server_alive_interval) {
+        fd_set rfds;
+        struct timeval timeo = { .tv_usec=0 };
+        int read_timeouts, ret;
+
+        FD_SET(connection_in, &rfds);
+        for (read_timeouts = 0;;) {
+            timeo.tv_sec = options.server_alive_interval;
+            ret = select(connection_in+1, &rfds, NULL, NULL, &timeo);
+            if (ret < 0) {
+                fatal("ssh_exchange_identification: select read error: %.100s", strerror(errno));
+            } else if (ret == 0) {
+                if (++read_timeouts >options.server_alive_count_max)
+                    fatal("ssh_exchange_identification: Timeout, server not responding");
+            } else
+                break;
+        }
+
+    }
     /* Read other side's version identification. */
-    struct timeval timeo = { .tv_sec=10, .tv_usec=0 };
-    setsockopt(connection_in, SOL_SOCKET, SO_SNDTIMEO, &timeo, sizeof(timeo));
     for (n = 0;;) {
         for (i = 0; i < sizeof(buf) - 1; i++) {
             size_t len = atomicio(read, connection_in, &buf[i], 1);
@@ -490,6 +507,25 @@ ssh_exchange_identification(void)
         compat20 ? PROTOCOL_MAJOR_2 : PROTOCOL_MAJOR_1,
         compat20 ? PROTOCOL_MINOR_2 : minor1,
         SSH_VERSION);
+    if (options.server_alive_interval) {
+        fd_set wfds;
+        struct timeval timeo = { .tv_usec=0 };
+        int write_timeouts, ret;
+
+        FD_SET(connection_out, &wfds);
+        for (write_timeouts = 0;;) {
+            timeo.tv_sec = options.server_alive_interval;
+            ret = select(connection_out+1, NULL, &wfds, NULL, &timeo);
+            if (ret < 0) {
+                fatal("ssh_exchange_identification: select write error: %.100s", strerror(errno));
+            } else if (ret == 0) {
+                if (++write_timeouts >options.server_alive_count_max)
+                    fatal("ssh_exchange_identification: Timeout, server not responding");
+            } else
+                break;
+        }
+
+    }
     if (atomicio(vwrite, connection_out, buf, strlen(buf)) != strlen(buf))
         fatal("write: %.100s", strerror(errno));
     client_version_string = xstrdup(buf);

Jiaying
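The select()-based wait used in the patch can also be sketched as a standalone helper, outside of OpenSSH. This is a minimal sketch under stated assumptions: wait_readable and read_byte_timed are hypothetical names, and interval_sec and count_max stand in for options.server_alive_interval and options.server_alive_count_max. Because select() modifies the descriptor set, and on Linux the timeout as well, the sketch clears and re-arms both on every pass. It keeps the patch's comparison, so it gives up after count_max + 1 silent intervals.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of OpenSSH: wait until fd is readable,
 * in slices of interval_sec seconds.
 * Returns 1 when fd is readable, 0 once more than count_max slices have
 * expired with no data, and -1 on a select() error other than EINTR.
 */
static int
wait_readable(int fd, int interval_sec, int count_max)
{
	int timeouts = 0;

	assert(interval_sec > 0 && count_max >= 0);

	for (;;) {
		fd_set rfds;
		struct timeval tv;
		int ret;

		/* select() overwrites the set, so rebuild it each pass. */
		FD_ZERO(&rfds);
		FD_SET(fd, &rfds);
		tv.tv_sec = interval_sec;
		tv.tv_usec = 0;

		ret = select(fd + 1, &rfds, NULL, NULL, &tv);
		if (ret > 0)
			return 1;
		if (ret < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal; retry */
			return -1;
		}
		/* ret == 0: this slice expired without data. */
		if (++timeouts > count_max)
			return 0;
	}
}

/*
 * Read one byte from fd, giving up if the peer stays silent.
 * Returns 1 on success, 0 on timeout or end of file, -1 on error.
 */
static int
read_byte_timed(int fd, char *out, int interval_sec, int count_max)
{
	int r = wait_readable(fd, interval_sec, count_max);

	if (r <= 0)
		return r;
	return (int)read(fd, out, 1);
}
```

With interval_sec = 30 and count_max = 3, a silent peer would be abandoned after four 30-second slices. A caller that wants exactly count_max slices would compare with >= instead of >.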