Hello All,

We have a Linux cluster application that uses openssh as its inter-node
communication mechanism and we've recently run into a problem that points
to a potential scalability issue in openssh code.

Our client nodes systematically open ssh connections to the server node to
execute an administrative command. When establishing socket connections,
the server side sometimes fails to complete the TCP handshake with some of
the clients. The final ACK coming from the client node would sometimes be
dropped by server-side TCP, and the corresponding connection would never be
added to sshd's accept queue. This leaves the ssh client command in a hung
state, as it has completed its part of the TCP handshake and is ready to
exchange data over the socket.

This problem reveals itself in situations where 64 or more client nodes
issue concurrent ssh requests to the server.

Looking at sshd.c, I noticed that the daemon's listen socket is created
with a very short backlog value (5), and we are certain that this is the
cause of our problem. Is there a reason for using such a small value, as
opposed to setting the backlog to SOMAXCONN?

We need to scale our application to clusters with thousands of nodes, and
we are trying to determine whether openssh will let us meet these scaling
requirements. If increasing sshd's backlog has no negative implications,
we would like to see the value raised to SOMAXCONN; I think such a change
would make openssh a more reliable tool for clustered environments.

Any help or feedback would be appreciated.

Thanks,
- Andrey Ermolinskiy
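P.S. For anyone who wants to look at the call in isolation: below is a
minimal sketch of the pattern in question. It is not the actual sshd.c
code; the port number and variable names are just for illustration. The
point is that the backlog argument to listen() caps the queue of
completed-but-unaccepted connections, so raising it from 5 to SOMAXCONN
is a one-word change. SOMAXCONN simply asks the kernel for its configured
maximum rather than a hard-coded constant.

    /* listen_sketch.c -- illustrative only, not taken from sshd.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void)
    {
        struct sockaddr_in sin;
        int listen_sock, s;

        listen_sock = socket(AF_INET, SOCK_STREAM, 0);
        if (listen_sock < 0) {
            perror("socket");
            exit(1);
        }

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        sin.sin_port = htons(2222);          /* arbitrary example port */

        if (bind(listen_sock, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
            perror("bind");
            exit(1);
        }

        /* With a backlog of 5, only ~5 completed handshakes can wait in
         * the accept queue; a burst of 64+ concurrent connects overflows
         * it. SOMAXCONN asks for the kernel's configured maximum. */
        if (listen(listen_sock, SOMAXCONN /* instead of 5 */) < 0) {
            perror("listen");
            exit(1);
        }

        for (;;) {
            s = accept(listen_sock, NULL, NULL);
            if (s >= 0)
                close(s);
        }
    }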
On Sat, 2003-11-22 at 04:20, Andrey Ermolinskiy wrote:

> Hello All,
>
> We have a Linux cluster application that uses openssh as its inter-node
> communication mechanism and we've recently run into a problem that points
> to a potential scalability issue in openssh code.
>
> Our client nodes systematically open ssh connections to the server node
> to execute an administrative command. When establishing socket
> connections, the server side sometimes fails to complete the TCP
> handshake with some of the clients. The final ACK coming from the client
> node would sometimes be dropped by server-side TCP, and the corresponding
> connection would never be added to sshd's accept queue. This leaves the
> ssh client command in a hung state, as it has completed its part of the
> TCP handshake and is ready to exchange data over the socket.

This sounds like a TCP problem, not an ssh problem. If the ACK is dropped
by the server end, shouldn't the client just resend?

> This problem reveals itself in situations where 64 or more client nodes
> issue concurrent ssh requests to the server.
>
> Looking at sshd.c, I noticed that the daemon's listen socket is created
> with a very short backlog value (5), and we are certain that this is the
> cause of our problem. Is there a reason for using such a small value, as
> opposed to setting the backlog to SOMAXCONN?

I'm not sure why the backlog is set so low; perhaps it offers some
mitigation against connection-flooding DoS attacks. Markus?

-d
> This sounds like a TCP problem, not an ssh problem. If the ACK is dropped
> by the server end, shouldn't the client just resend?

The behavior we observed is that the client end does not automatically
resend the ACK. Instead, the server end goes into a backoff-retry mode, in
which it waits for some time and then resends the SYN-ACK to the client.
This causes the client to resubmit its ACK, after which the server makes
another attempt to place the connection on the accept queue. The waiting
timeout is initially 1 second and is doubled after each retry. If, after 5
retries (as defined by the tcp_synack_retries tunable), the queue is still
full, the server quietly gives up and leaves the connection in SYN_RCVD
state forever. This hangs the client command, because the client end has
already entered the ESTABLISHED state and is sitting in a socket call,
presumably waiting for data.

One could argue that this is a problem in the Linux TCP implementation, and
that the server end should terminate such connections with a RST. Perhaps
the condition hasn't been noticed because most socket applications nowadays
use a much larger backlog value (typically SOMAXCONN).

The bottom line is that if it is safe to increase the backlog in openssh,
doing so would probably prevent such conditions from occurring.

Regards,
Andrey
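P.S. In case anyone wants to see this without a 64-node cluster: below is
a small stand-alone program (hypothetical, not derived from our
application or from sshd; the port and client count are arbitrary) that
overflows the accept queue of a loopback listener that never calls
accept(). Under the Linux behavior described above, the clients' connect()
calls return success while the excess server-side entries sit in SYN_RECV;
exact behavior will vary with kernel version and SYN-cookie settings. The
retry count lives in /proc/sys/net/ipv4/tcp_synack_retries.

    /* overflow.c -- hypothetical accept-queue overflow demo */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    #define NCLIENTS 16
    #define PORT     2222

    int main(void)
    {
        struct sockaddr_in sin;
        int srv, clients[NCLIENTS], i;

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        sin.sin_port = htons(PORT);

        /* Server side: tiny backlog, and accept() is deliberately never
         * called, so the accept queue fills up immediately. */
        srv = socket(AF_INET, SOCK_STREAM, 0);
        if (srv < 0 ||
            bind(srv, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
            listen(srv, 1) < 0) {
            perror("server setup");
            exit(1);
        }

        for (i = 0; i < NCLIENTS; i++) {
            struct timeval tv = { 3, 0 };

            clients[i] = socket(AF_INET, SOCK_STREAM, 0);
            if (clients[i] < 0) {
                perror("client socket");
                exit(1);
            }
            /* Bound the wait, since some kernels drop the SYN outright
             * when the queue is full instead of answering it. */
            setsockopt(clients[i], SOL_SOCKET, SO_SNDTIMEO,
                       &tv, sizeof(tv));
            /* connect() returns success as soon as the client's half of
             * the handshake completes, whether or not the server was
             * able to queue the connection for accept(). */
            if (connect(clients[i], (struct sockaddr *)&sin,
                        sizeof(sin)) < 0)
                perror("connect");
            else
                printf("client %d: connect() succeeded\n", i);
        }

        /* Hold everything open so the socket states can be inspected. */
        printf("sleeping 60s; run netstat -tan now\n");
        sleep(60);
        return 0;
    }

Compile with "gcc overflow.c -o overflow", run it, and in another
terminal run "netstat -tan | grep 2222" to see which connections the
server side left half-open.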