Corey Hickey
2022-Sep-16 23:45 UTC
TCP Forwarding hangs when TCP service is unresponsive, even when TCP client exits
When a TCP client does not receive a response from a service, the client
can opt to time out and exit. If the connection is passed through an SSH
tunnel, however, certain circumstances can make the SSH tunnel hang
indefinitely. This affects both remote port forwarding (-R) and local
(-L).

This report is for current openssh-portable git running on Linux. For
released versions, at least OpenSSH 8.9p1 is affected as well, though I
did not test other versions.

--- Remote Forwarding ---
To reproduce, first start up a TCP service which will accept connections
but fail to respond thereafter. One way to do this is by stopping netcat
shortly after startup (options are for the OpenBSD version of netcat).

$ nc -k -l 127.0.0.1 9999 > /dev/null & sleep 1 ; kill -STOP %%

Next use a different terminal to start an SSH service. Debug options are
not required but are helpful for diagnosis. This example uses port 2222
in order to avoid conflict with the system SSH service. The service can
be run on the same local host or on a remote host.

$ sudo /usr/sbin/sshd -d -d -f /dev/null -o Port=2222 -o \
    HostKey=/etc/ssh/ssh_host_ed25519_key

Next use a different terminal to start an SSH client which runs a TCP
client over a forwarded connection. This example uses wget, but any TCP
client that can be configured to time out should behave the same.

$ ssh localhost -p 2222 -R 8888:127.0.0.1:9999 -v -v wget \
    --timeout=1 --tries=1 http://127.0.0.1:8888

The observed results are that the TCP client (wget) exits, but the SSH
client hangs until either manually killed or the TCP service (netcat) is
resumed.

In detail, the sequence of events is as follows:
1. The SSH client connects to the server; the client and server set up
channels as usual, including one for the port-forwarding. The SSH client
starts a TCP client on the SSH server.
2. The TCP client connects to the SSH server's listening socket, and the
SSH client connects to the TCP service's listening socket. The 3-way
handshakes complete, but when the TCP client sends data to the service,
the service never responds.
3. The TCP client times out, closes its socket for the connection to the
SSH server, and exits. The SSH server sends the client an EOF on the
forwarded channel, but does not close its own socket for the connection
to the now-exited TCP client; this socket remains in CLOSE_WAIT.
4. The SSH client receives the EOF and drains the channel output, but
continues to wait for data on the channel input. The SSH server won't
close the channel until the client does, and the client won't close the
channel until it receives data (or an error) from the channel.

--- Local Forwarding ---
The situation for local forwarding is similar, but requires different
steps to reproduce. First start a TCP service and an SSH service as
described above. Then use a new terminal to start an SSH client:

$ ssh localhost -p 2222 -L 8888:127.0.0.1:9999 -v -v

Use a new terminal to run a TCP client over the forwarded connection.

$ wget --timeout=1 --tries=1 http://127.0.0.1:8888

Lastly, exit from the SSH client's interactive shell. The observed
results are then the same hang as for remote forwarding.

I will send a patch shortly that fixes the issue for me, though I do not
know if my fix is correct.

Thanks,
Corey
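For anyone re-running the reproduction steps, the hang and the lingering CLOSE_WAIT socket can be checked mechanically rather than by eye. The following is a minimal sketch and not part of the original report: `probe_hang` and `count_close_wait` are hypothetical helper names, and the sketch assumes GNU coreutils `timeout` (which exits with status 124 when the deadline expires) and `ss -tan` style output from iproute2, where the state is spelled CLOSE-WAIT.

```shell
# probe_hang SECONDS COMMAND [ARGS...]
# Runs COMMAND under a deadline and prints "hung" if it was still running
# when the deadline expired, or "exited" if it finished on its own.
# Assumption: GNU coreutils timeout, which reports expiry as status 124.
probe_hang() {
    deadline=$1
    shift
    status=0
    timeout "$deadline" "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 124 ]; then
        echo hung
    else
        echo exited
    fi
}

# count_close_wait PORT
# Reads `ss -tan` style output on stdin and prints how many sockets in
# state CLOSE-WAIT have PORT as their local or peer port.
# Assumption: column 1 is the state, columns 4 and 5 are the local and
# peer address:port pairs.
count_close_wait() {
    awk -v p="$1" '
        BEGIN { re = ":" p "$" }
        $1 == "CLOSE-WAIT" && ($4 ~ re || $5 ~ re) { n++ }
        END { print n + 0 }
    '
}
```

As a usage example, `probe_hang 10 ssh localhost -p 2222 -R 8888:127.0.0.1:9999 wget --timeout=1 --tries=1 http://127.0.0.1:8888` would be expected to print `hung` on an affected build, since wget itself gives up after about one second. Running `ss -tan | count_close_wait 8888` on the SSH server host should then count the socket left behind in step 3.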
Corey Hickey
2022-Sep-16 23:55 UTC
[PATCH] client/server: fix TCP forwarding hang when a service is unresponsive
From: Corey Hickey <chickey at tagged.com>

When a remote-forwarded TCP service is unresponsive, the SSH client can
hang after the SSH server sends an EOF. The SSH client receives the EOF
and drains the channel output, but continues to wait for data on the
channel input. The SSH server won't close the channel until the client
does, and the client won't close the channel until it receives data (or
an error) from the channel. The client can end up waiting forever if the
forwarded TCP service never responds.

The analogous situation happens for local-forwarded TCP services, with
the roles of the SSH server and client reversed.

My attempted fix for each case is to configure the channel to force an
input drain upon receiving an EOF. I do not know if this is the right
approach, but the test suite continues to pass for me, at least.
---
 clientloop.c | 1 +
 serverloop.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/clientloop.c b/clientloop.c
index 0050f3eb..cd482e5d 100644
--- a/clientloop.c
+++ b/clientloop.c
@@ -1514,6 +1514,7 @@ client_request_forwarded_tcpip(struct ssh *ssh, const char *request_type,
 	sshbuf_free(b);
 	free(originator_address);
 	free(listen_address);
+	c->force_drain = 1;
 	return c;
 }
 
diff --git a/serverloop.c b/serverloop.c
index b4c0d82b..486e5277 100644
--- a/serverloop.c
+++ b/serverloop.c
@@ -470,6 +470,7 @@ server_request_direct_tcpip(struct ssh *ssh, int *reason, const char **errmsg)
  out:
 	free(originator);
 	free(target);
+	c->force_drain = 1;
 	return c;
 }
 
-- 
2.35.1
Damien Miller
2022-Sep-19 08:55 UTC
TCP Forwarding hangs when TCP service is unresponsive, even when TCP client exits
First, thanks for the detailed investigation and for reproducing this
with git HEAD.

On Fri, 16 Sep 2022, Corey Hickey wrote:

> When a TCP client does not receive a response from a service, the client
> can opt to time out and exit. If the connection is passed through an SSH
> tunnel, however, certain circumstances can make the SSH tunnel hang
> indefinitely. This affects both remote port forwarding (-R) and local
> (-L).
>
> This report is for current openssh-portable git running on Linux. For
> released versions, at least OpenSSH 8.9p1 is affected as well, though I
> did not test other versions.
>
> --- Remote Forwarding ---
> To reproduce, first start up a TCP service which will accept connections
> but fail to respond thereafter. One way to do this is by stopping netcat
> shortly after startup (options are for the OpenBSD version of netcat).
>
> $ nc -k -l 127.0.0.1 9999 > /dev/null & sleep 1 ; kill -STOP %%
>
> Next use a different terminal to start an SSH service. Debug options are
> not required but are helpful for diagnosis. This example uses port 2222
> in order to avoid conflict with the system SSH service. The service can
> be run on the same local host or on a remote host.
>
> $ sudo /usr/sbin/sshd -d -d -f /dev/null -o Port=2222 -o \
>     HostKey=/etc/ssh/ssh_host_ed25519_key
>
> Next use a different terminal to start an SSH client which runs a TCP
> client over a forwarded connection. This example uses wget, but any TCP
> client that can be configured to time out should behave the same.
>
> $ ssh localhost -p 2222 -R 8888:127.0.0.1:9999 -v -v wget \
>     --timeout=1 --tries=1 http://127.0.0.1:8888
>
>
> The observed results are that the TCP client (wget) exits, but the SSH
> client hangs until either manually killed or the TCP service (netcat) is
> resumed.
>
> In detail, the sequence of events is as follows:
> 1. The SSH client connects to the server; the client and server set up
> channels as usual, including one for the port-forwarding. The SSH client
> starts a TCP client on the SSH server.
> 2. The TCP client connects to the SSH server's listening socket, and the
> SSH client connects to the TCP service's listening socket. The 3-way
> handshakes complete, but when the TCP client sends data to the service,
> the service never responds.
> 3. The TCP client times out, closes its socket for the connection to the
> SSH server, and exits. The SSH server sends the client an EOF on the
> forwarded channel, but does not close its own socket for the connection
> to the now-exited TCP client; this socket remains in CLOSE_WAIT.
> 4. The SSH client receives the EOF and drains the channel output, but
> continues to wait for data on the channel input. The SSH server won't
> close the channel until the client does, and the client won't close the
> channel until it receives data (or an error) from the channel.

This is kind of a tricky case, because in some cases it's AFAIK
impossible for the client to distinguish between a TCP server that
a) will never respond and b) hasn't responded *yet*.

The solution that you proposed is unfortunately not without side
effects - I think it changes the behaviour of half-closed TCP
connections in a way that might lose data.

> [djm at lll ~]$ wget --timeout=1 --tries=1 http://127.0.0.1:8888
> --2022-09-19 18:16:17--  http://127.0.0.1:8888/
> Connecting to 127.0.0.1:8888... debug3: receive packet: type 90
> debug1: client_input_channel_open: ctype forwarded-tcpip rchan 3 win 2097152 max 32768
> debug1: client_request_forwarded_tcpip: listen localhost port 8888, originator 127.0.0.1 port 35148
> debug2: fd 7 setting O_NONBLOCK
> debug2: fd 7 setting TCP_NODELAY
> debug1: connect_next: host 127.0.0.1 ([127.0.0.1]:9999) in progress, fd=7
> debug3: fd 7 is O_NONBLOCK
> debug3: fd 7 is O_NONBLOCK
> debug1: channel 1: new [127.0.0.1]
> debug1: confirm forwarded-tcpip
> debug3: channel 1: waiting for connection
> debug3: channel 1: waiting for connection
> connected.
> HTTP request sent, awaiting response... debug3: channel 1: waiting for connection
> debug3: channel 1: waiting for connection
> Read error (Connection timed out) in headers.
> Giving up.
>
> debug3: channel 1: waiting for connection
> debug3: channel 1: waiting for connection

Is this what you see too?

IMO the root problem here is that channels in state
SSH_CHANNEL_CONNECTING have no timeout unless the system's TCP stack
implements one. Maybe OpenSSH should implement something conservative
here.

I do notice some different behaviour between Linux (above) and OpenBSD.
On OpenBSD the connection is accepted but, of course, does not pass any
data. This is harder to fix without the side effects I mentioned above,
e.g. consider a TCP client program that connects to a forwarded socket,
sends a message and exits without waiting for a reply. I think setting
c->force_drain in this case could cause the message to be lost (though
I'm not 100% sure).

-d
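The send-and-exit case raised above could be checked against a patched build with something like the following. This is a rough sketch under stated assumptions and not something from the thread: `fire_and_forget` and `check_delivery` are hypothetical helper names, the `-N` flag (shut down the socket after EOF on stdin) is specific to the OpenBSD netcat, and the service on port 9999 is assumed to be a plain recording listener rather than the stopped netcat from the report.

```shell
# fire_and_forget HOST PORT MESSAGE
# Connects, sends one line, half-closes the connection and exits without
# waiting for any reply. Assumption: OpenBSD netcat, whose -N option
# shuts down the network socket once stdin reaches EOF.
fire_and_forget() {
    printf '%s\n' "$3" | nc -N "$1" "$2"
}

# check_delivery EXPECTED FILE
# Prints "delivered" if FILE holds exactly the EXPECTED line, else "lost".
check_delivery() {
    if [ -f "$2" ] && [ "$(cat "$2")" = "$1" ]; then
        echo delivered
    else
        echo lost
    fi
}

# Intended manual use, with the forwarding from the report already set up:
#   terminal 1:  nc -l 127.0.0.1 9999 > received.txt
#   terminal 2:  fire_and_forget 127.0.0.1 8888 ping
#                check_delivery ping received.txt
```

If `check_delivery` printed `lost` with `c->force_drain` set but `delivered` without it, that would support the concern about half-closed connections; the opposite result would suggest the patch is safe for this particular pattern. The listener may need a moment to flush its output before the file is checked.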