Damien Miller
2022-Sep-19 08:55 UTC
TCP Forwarding hangs when TCP service is unresponsive, even when TCP client exits
On Fri, 16 Sep 2022, Corey Hickey wrote: First, thanks for the detailed investigation and for reproducing this with git HEAD.> When a TCP client does not receive a response from a service, the client > can opt to time out and exit. If the connection is passed through an SSH > tunnel, however, certain circumstances can make the SSH tunnel hang > indefinitely. This affects both remote port forwarding (-R) and local > (-L). > > This report is for current openssh-portable git running on Linux. For > released versions, at least OpenSSH 8.9p1 is affected as well, though I > did not test other versions. > > --- Remote Forwarding --- > To reproduce, first start up a TCP service which will accept connections > but fail to respond thereafter. One way to do this is by stopping netcat > shortly after startup (options are for the OpenBSD version of netcat). > > $ nc -k -l 127.0.0.1 9999 > /dev/null & sleep 1 ; kill -STOP %% > > Next use a different terminal to start an SSH service. Debug options are > not required but are helpful for diagnosis. This example uses port 2222 > in order to avoid conflict with the system SSH service. The service can > be run on the same local host or on a remote host. > > $ sudo /usr/sbin/sshd -d -d -f /dev/null -o Port=2222 -o \ > HostKey=/etc/ssh/ssh_host_ed25519_key > > Next use a different terminal to start an SSH client which runs a TCP > client over a forwarded connection. This example uses wget, but any TCP > client that can be configured to time out should behave the same. > > $ ssh localhost -p 2222 -R 8888:127.0.0.1:9999 -v -v wget \ > --timeout=1 --tries=1 http://127.0.0.1:8888 > > > The observed results are that the TCP client (wget) exits, but the SSH > client hangs until either manually killed or the TCP service (netcat) is > resumed. > > In detail, the sequence of events is as follows: > 1. The SSH client connects to the server; the client and server set up > channels as usual, including one for the port-forwarding. The SSH client > starts a TCP client on the SSH server. > 2. The TCP client connects to the SSH server's listening socket, and the > SSH client connects to the TCP service's listening socket. The 3-way > handshakes complete, but when the TCP client sends data to the service, > the service never responds. > 3. The TCP client times out, closes its socket for the connection to the > SSH server, and exits. The SSH server sends the client an EOF on the > forwarded channel, but does not close its own socket for the connection > to the now-exited TCP client; this socket remains in CLOSE_WAIT. > 4. The SSH client receives the EOF and drains the channel output, but > continues to wait for data on the channel input. The SSH server won't > close the channel until the client does, and the client won't close the > channel until it receives data (or an error) from the channel.This is kind of a tricky case, because for some cases it's AFAIK impossible for the client to discern between a TCP server that a) will never respond from b) hasn't responded *yet*. The solution that you proposed is unfortunately not without side effects - I think it changes the behaviour of half-closed TCP connection in a way that might lose data.> [djm at lll ~]$ wget --timeout=1 --tries=1 http://127.0.0.1:8888--2022-09-19 18:16:17-- http://127.0.0.1:8888/ > Connecting to 127.0.0.1:8888... debug3: receive packet: type 90 > debug1: client_input_channel_open: ctype forwarded-tcpip rchan 3 win 2097152 max 32768 > debug1: client_request_forwarded_tcpip: listen localhost port 8888, originator 127.0.0.1 port 35148 > debug2: fd 7 setting O_NONBLOCK > debug2: fd 7 setting TCP_NODELAY > debug1: connect_next: host 127.0.0.1 ([127.0.0.1]:9999) in progress, fd=7 > debug3: fd 7 is O_NONBLOCK > debug3: fd 7 is O_NONBLOCK > debug1: channel 1: new [127.0.0.1] > debug1: confirm forwarded-tcpip > debug3: channel 1: waiting for connection > debug3: channel 1: waiting for connection > connected. > HTTP request sent, awaiting response... debug3: channel 1: waiting for connection > debug3: channel 1: waiting for connection > Read error (Connection timed out) in headers. > Giving up. > > debug3: channel 1: waiting for connection > debug3: channel 1: waiting for connectionIs this what you see too? IMO the root problem here is that channels in state SSH_CHANNEL_CONNECTING have no timeout unless there system's TCP stack implements one. Maybe OpenSSH should implement something conservative here. I do notice some different behaviour between Linux (above) and OpenBSD. On OpenBSD the connection is accepted but obviously does not pass any data (of course). This is harder to fix without the side effects I mentioned above, e.g. consider a TCP client program that connects to a forwarded socket, sends a message and exits without waiting for a reply. I think setting c->force_drain in this case could cause the message to be lost (though I'm not 100% sure). -d
Robinson, Herbie
2022-Sep-19 11:06 UTC
[EXTERNAL] Re: TCP Forwarding hangs when TCP service is unresponsive, even when TCP client exits
How does this behave when you enable SSH keepalive?>From Phone________________________________ From: openssh-unix-dev <openssh-unix-dev-bounces+herbie.robinson=stratus.com at mindrot.org> on behalf of Damien Miller <djm at mindrot.org> Sent: Monday, September 19, 2022 4:55:21 AM To: Corey Hickey <bugfood-ml at fatooh.org> Cc: openssh-unix-dev at mindrot.org <openssh-unix-dev at mindrot.org> Subject: [EXTERNAL] Re: TCP Forwarding hangs when TCP service is unresponsive, even when TCP client exits [EXTERNAL SENDER: This email originated from outside of Stratus Technologies. Do not click links or open attachments unless you recognize the sender and know the content is safe.] On Fri, 16 Sep 2022, Corey Hickey wrote: First, thanks for the detailed investigation and for reproducing this with git HEAD.> When a TCP client does not receive a response from a service, the client > can opt to time out and exit. If the connection is passed through an SSH > tunnel, however, certain circumstances can make the SSH tunnel hang > indefinitely. This affects both remote port forwarding (-R) and local > (-L). > > This report is for current openssh-portable git running on Linux. For > released versions, at least OpenSSH 8.9p1 is affected as well, though I > did not test other versions. > > --- Remote Forwarding --- > To reproduce, first start up a TCP service which will accept connections > but fail to respond thereafter. One way to do this is by stopping netcat > shortly after startup (options are for the OpenBSD version of netcat). > > $ nc -k -l 127.0.0.1<http://127.0.0.1> 9999 > /dev/null & sleep 1 ; kill -STOP %% > > Next use a different terminal to start an SSH service. Debug options are > not required but are helpful for diagnosis. This example uses port 2222 > in order to avoid conflict with the system SSH service. The service can > be run on the same local host or on a remote host. > > $ sudo /usr/sbin/sshd -d -d -f /dev/null -o Port=2222 -o \ > HostKey=/etc/ssh/ssh_host_ed25519_key > > Next use a different terminal to start an SSH client which runs a TCP > client over a forwarded connection. This example uses wget, but any TCP > client that can be configured to time out should behave the same. > > $ ssh localhost -p 2222 -R 8888:127.0.0.1:9999<http://127.0.0.1:9999> -v -v wget \ > --timeout=1 --tries=1 http://127.0.0.1:8888<http://127.0.0.1:8888> > > > The observed results are that the TCP client (wget) exits, but the SSH > client hangs until either manually killed or the TCP service (netcat) is > resumed. > > In detail, the sequence of events is as follows: > 1. The SSH client connects to the server; the client and server set up > channels as usual, including one for the port-forwarding. The SSH client > starts a TCP client on the SSH server. > 2. The TCP client connects to the SSH server's listening socket, and the > SSH client connects to the TCP service's listening socket. The 3-way > handshakes complete, but when the TCP client sends data to the service, > the service never responds. > 3. The TCP client times out, closes its socket for the connection to the > SSH server, and exits. The SSH server sends the client an EOF on the > forwarded channel, but does not close its own socket for the connection > to the now-exited TCP client; this socket remains in CLOSE_WAIT. > 4. The SSH client receives the EOF and drains the channel output, but > continues to wait for data on the channel input. The SSH server won't > close the channel until the client does, and the client won't close the > channel until it receives data (or an error) from the channel.This is kind of a tricky case, because for some cases it's AFAIK impossible for the client to discern between a TCP server that a) will never respond from b) hasn't responded *yet*. The solution that you proposed is unfortunately not without side effects - I think it changes the behaviour of half-closed TCP connection in a way that might lose data.> [djm at lll ~]$ wget --timeout=1 --tries=1 http://127.0.0.1:8888--2022-09-19 18:16:17-- http://127.0.0.1:8888/<http://127.0.0.1:8888/> > Connecting to 127.0.0.1:8888<http://127.0.0.1:8888>... debug3: receive packet: type 90 > debug1: client_input_channel_open: ctype forwarded-tcpip rchan 3 win 2097152 max 32768 > debug1: client_request_forwarded_tcpip: listen localhost port 8888, originator 127.0.0.1<http://127.0.0.1> port 35148 > debug2: fd 7 setting O_NONBLOCK > debug2: fd 7 setting TCP_NODELAY > debug1: connect_next: host 127.0.0.1<http://127.0.0.1> ([127.0.0.1<http://127.0.0.1>]:9999) in progress, fd=7 > debug3: fd 7 is O_NONBLOCK > debug3: fd 7 is O_NONBLOCK > debug1: channel 1: new [127.0.0.1<http://127.0.0.1>] > debug1: confirm forwarded-tcpip > debug3: channel 1: waiting for connection > debug3: channel 1: waiting for connection > connected. > HTTP request sent, awaiting response... debug3: channel 1: waiting for connection > debug3: channel 1: waiting for connection > Read error (Connection timed out) in headers. > Giving up. > > debug3: channel 1: waiting for connection > debug3: channel 1: waiting for connectionIs this what you see too? IMO the root problem here is that channels in state SSH_CHANNEL_CONNECTING have no timeout unless there system's TCP stack implements one. Maybe OpenSSH should implement something conservative here. I do notice some different behaviour between Linux (above) and OpenBSD. On OpenBSD the connection is accepted but obviously does not pass any data (of course). This is harder to fix without the side effects I mentioned above, e.g. consider a TCP client program that connects to a forwarded socket, sends a message and exits without waiting for a reply. I think setting c->force_drain in this case could cause the message to be lost (though I'm not 100% sure). -d _______________________________________________ openssh-unix-dev mailing list openssh-unix-dev at mindrot.org https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev<https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev>
Corey Hickey
2022-Sep-19 18:23 UTC
TCP Forwarding hangs when TCP service is unresponsive, even when TCP client exits
On 2022-09-19 01:55, Damien Miller wrote:> On Fri, 16 Sep 2022, Corey Hickey wrote: > > First, thanks for the detailed investigation and for reproducing this > with git HEAD.You're welcome, and thank you for your attention to this.> This is kind of a tricky case, because for some cases it's AFAIK impossible > for the client to discern between a TCP server that a) will never respond > from b) hasn't responded *yet*. > > The solution that you proposed is unfortunately not without side > effects - I think it changes the behaviour of half-closed TCP connection > in a way that might lose data.I was afraid there would be something like this.>> [djm at lll ~]$ wget --timeout=1 --tries=1 http://127.0.0.1:8888--2022-09-19 18:16:17-- http://127.0.0.1:8888/ >> Connecting to 127.0.0.1:8888... debug3: receive packet: type 90 >> debug1: client_input_channel_open: ctype forwarded-tcpip rchan 3 win 2097152 max 32768 >> debug1: client_request_forwarded_tcpip: listen localhost port 8888, originator 127.0.0.1 port 35148 >> debug2: fd 7 setting O_NONBLOCK >> debug2: fd 7 setting TCP_NODELAY >> debug1: connect_next: host 127.0.0.1 ([127.0.0.1]:9999) in progress, fd=7 >> debug3: fd 7 is O_NONBLOCK >> debug3: fd 7 is O_NONBLOCK >> debug1: channel 1: new [127.0.0.1] >> debug1: confirm forwarded-tcpip >> debug3: channel 1: waiting for connection >> debug3: channel 1: waiting for connection >> connected. >> HTTP request sent, awaiting response... debug3: channel 1: waiting for connection >> debug3: channel 1: waiting for connection >> Read error (Connection timed out) in headers. >> Giving up. >> >> debug3: channel 1: waiting for connection >> debug3: channel 1: waiting for connection > > Is this what you see too?From the context, this appears to be for the remote port forwarding case to localhost with an interactive SSH session. I think your example is a bit different from mine, though--in your case, nothing is responding to the SYNs sent to port 9999. Is the above from OpenBSD? If so, then there might be a difference in OS behavior. On Linux, at least, a 3-way handshake appears to be able to complete even if the service listening on the socket is stopped. I can reproduce your output if I configure my localhost firewall to drop packets sent to 9999. In that case, I get output like yours, but the SSH session does not hang upon logout. The problem I'm seeing is a bit different: the TCP connection to the service completes, so the SSH forwarded channel becomes fully established. Any data sent over the channel, however, goes to a service that is stopped. When the TCP client times out, the SSH server sends the SSH client an EOF, but the SSH client never does it's EOF+CLOSE, since it is waiting for an unresponsive TCP service to respond. I can provide debug3 output for the remote forwarding scenario using an interactive shell. If I run: $ ssh localhost -p 2222 -R 8888:127.0.0.1:9999 -v -v -v ...and then run wget from _within_ that interactive shell, I get: ------------------------------------------------------------------ $ wget --timeout=1 --tries=1 http://127.0.0.1:8888 --2022-09-19 10:00:45-- http://127.0.0.1:8888/ Connecting to 127.0.0.1:8888... debug3: receive packet: type 90 debug1: client_input_channel_open: ctype forwarded-tcpip rchan 3 win 2097152 max 32768 debug1: client_request_forwarded_tcpip: listen localhost port 8888, originator 127.0.0.1 port 54490 debug1: connect_next: start for host 127.0.0.1 ([127.0.0.1]:9999) debug2: fd 7 setting O_NONBLOCK debug2: fd 7 setting TCP_NODELAY debug1: connect_next: connect host 127.0.0.1 ([127.0.0.1]:9999) in progress, fd=7 debug3: fd 7 is O_NONBLOCK debug3: fd 7 is O_NONBLOCK debug1: channel 1: new [127.0.0.1] debug1: confirm forwarded-tcpip debug3: channel 1: waiting for connection connected. HTTP request sent, awaiting response... debug1: channel 1: connected to 127.0.0.1 port 9999 debug3: send packet: type 91 debug3: receive packet: type 96 debug2: channel 1: rcvd eof debug2: channel 1: output open -> drain debug2: channel 1: obuf empty debug2: chan_shutdown_write: channel 1: (i0 o1 sock 7 wfd 7 efd -1 [closed]) debug2: channel 1: output drain -> closed Read error (Connection timed out) in headers. Giving up. ------------------------------------------------------------------ Meanwhile, the SSH server reports: ------------------------------------------------------------------ debug1: Connection to port 8888 forwarding to localhost port 0 requested. debug2: fd 9 setting TCP_NODELAY debug2: fd 9 setting O_NONBLOCK debug3: fd 9 is O_NONBLOCK debug1: channel 3: new [forwarded-tcpip] debug3: send packet: type 90 debug3: receive packet: type 91 debug2: channel 3: open confirm rwindow 2097152 rmax 32768 debug2: channel 3: read failed rfd 9 maxlen 32768: Broken pipe debug2: channel 3: read failed debug2: chan_shutdown_read: channel 3: (i0 o0 sock 9 wfd 9 efd -1 [closed]) debug2: channel 3: input open -> drain debug2: channel 3: ibuf empty debug2: channel 3: send eof debug3: send packet: type 96 debug2: channel 3: input drain -> closed ------------------------------------------------------------------ If I then exit from the interactive shell, I get: ------------------------------------------------------------------ $ exit logout debug3: receive packet: type 96 debug2: channel 0: rcvd eof debug2: channel 0: output open -> drain debug2: channel 0: obuf empty debug2: chan_shutdown_write: channel 0: (i0 o1 sock -1 wfd 5 efd 6 [write]) debug2: channel 0: output drain -> closed debug3: receive packet: type 98 debug1: client_input_channel_req: channel 0 rtype exit-status reply 0 debug3: receive packet: type 98 debug1: client_input_channel_req: channel 0 rtype eow at openssh.com reply 0 debug2: channel 0: rcvd eow debug2: chan_shutdown_read: channel 0: (i0 o3 sock -1 wfd 4 efd 6 [write]) debug2: channel 0: input open -> closed debug3: receive packet: type 97 debug2: channel 0: rcvd close debug3: channel 0: will not send data after close debug2: channel 0: almost dead debug2: channel 0: gc: notify user debug2: channel 0: gc: user detached debug2: channel 0: send close debug3: send packet: type 97 debug2: channel 0: is dead debug2: channel 0: garbage collecting debug1: channel 0: free: client-session, nchannels 2 debug3: channel 0: status: The following connections are open: #0 client-session (t4 r2 i3/0 o3/0 e[write]/0 fd -1/-1/6 sock -1 cc -1 io 0x00/0x00) #1 127.0.0.1 (t4 r3 i0/0 o3/0 e[closed]/0 fd 7/7/-1 sock 7 cc -1 io 0x00/0x00) ------------------------------------------------------------------ The SSH client is hung at this point. The server reports: ------------------------------------------------------------------ debug2: channel 2: read failed rfd 10 maxlen 32768: Input/output error debug2: channel 2: read failed debug2: chan_shutdown_read: channel 2: (i0 o0 sock -1 wfd 10 efd -1 [closed]) debug2: channel 2: input open -> drain debug2: channel 2: ibuf empty debug2: channel 2: send eof debug3: send packet: type 96 debug2: channel 2: input drain -> closed debug1: Received SIGCHLD. debug1: session_by_pid: pid 1840470 debug1: session_exit_message: session 0 channel 2 pid 1840470 debug2: channel 2: request exit-status confirm 0 debug3: send packet: type 98 debug1: session_exit_message: release channel 2 debug2: channel 2: write failed debug2: chan_shutdown_write: channel 2: (i3 o0 sock -1 wfd 8 efd -1 [closed]) debug2: channel 2: send eow debug3: send packet: type 98 debug2: channel 2: output open -> closed debug3: mm_request_send: entering, type 30 debug2: channel 2: send close debug3: send packet: type 97 debug3: mm_request_receive: entering debug3: monitor_read: checking request 30 debug3: mm_answer_pty_cleanup: entering debug1: session_by_tty: session 0 tty /dev/pts/6 debug3: mm_session_close: session 0 pid 1840469 debug3: mm_session_close: tty /dev/pts/6 ptyfd 4 debug3: channel 2: will not send data after close debug1: session_pty_cleanup2: session 0 release /dev/pts/6 debug3: session_unused: session id 0 unused debug3: receive packet: type 97 debug2: channel 2: rcvd close debug3: channel 2: will not send data after close debug2: channel 2: is dead debug2: channel 2: gc: notify user debug1: session_by_channel: session 0 channel 2 debug1: session_close_by_channel: channel 2 child 0 Close session: user chickey from 127.0.0.1 port 46454 id 0 debug3: session_unused: session id 0 unused debug2: channel 2: gc: user detached debug2: channel 2: is dead debug2: channel 2: garbage collecting debug1: channel 2: free: server-session, nchannels 4 debug3: channel 2: status: The following connections are open: #2 server-session (t4 r0 i3/0 o3/0 e[closed]/0 fd -1/-1/-1 sock -1 cc -1 io 0x00/0x00) #3 forwarded-tcpip: listening port 8888 for localhost port 0, connect from 127.0.0.1 port 54490 to 127.0.0.1 port 8888 (t4 r1 i3/0 o0/0 e[closed]/0 fd 9/9/-1 sock 9 cc -1 io 0x00/0x00) ------------------------------------------------------------------ Note that upon logout, there is nothing about the port-forwarding channel (#1 on the client and #3 on the server) other than the final reports. Logging out from the SSH session's interactive shell has no effect on the forwarding channel, which is already hung by that point.> IMO the root problem here is that channels in state SSH_CHANNEL_CONNECTING > have no timeout unless there system's TCP stack implements one. Maybe > OpenSSH should implement something conservative here.If there's an issue there, then I think that is separate from the trouble I'm having; as noted above, the channels in my case go into a fully open state.> I do notice some different behaviour between Linux (above) and OpenBSD. > On OpenBSD the connection is accepted but obviously does not pass any > data (of course).This actually sounds exactly like what I'm seeing on Linux. $ uname -a Linux corey 5.18.0-2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.18.5-1 (2022-06-16) x86_64 GNU/Linux> This is harder to fix without the side effects I > mentioned above, e.g. consider a TCP client program that connects to a > forwarded socket, sends a message and exits without waiting for a reply. > I think setting c->force_drain in this case could cause the message to be > lost (though I'm not 100% sure).From channel_input_ieof(): ------------------------------------------------------------------ if (c->force_drain && c->istate == CHAN_INPUT_OPEN) { debug("channel %d: FORCE input drain", c->self); c->istate = CHAN_INPUT_WAIT_DRAIN; if (sshbuf_len(c->input) == 0) chan_ibuf_empty(ssh, c); } ------------------------------------------------------------------ And chan_ibuf_empty() is: ------------------------------------------------------------------ chan_ibuf_empty(struct ssh *ssh, Channel *c) { debug2("channel %d: ibuf empty", c->self); if (sshbuf_len(c->input)) { error("channel %d: chan_ibuf_empty for non empty buffer", c->self); return; } switch (c->istate) { case CHAN_INPUT_WAIT_DRAIN: if (!(c->flags & (CHAN_CLOSE_SENT|CHAN_LOCAL))) chan_send_eof2(ssh, c); chan_set_istate(c, CHAN_INPUT_CLOSED); break; default: error("channel %d: chan_ibuf_empty for istate %d", c->self, c->istate); break; } } ------------------------------------------------------------------ I don't know the full ramifications of that. It looks like there are two immediate effects: 1. The channel input goes into a draining state. 2. If there is no buffered input, the channel input goes into a closed state. For the case where there is no buffered input, that seems like it would be ok to me, from a layman's perspective: no harm in closing something when all data has been consumed. If there is any buffered input, though, I don't know what ends up happening later in the state machine. Thanks again for your attention here. I'm figuring out what I can, but I'm not familiar with the code or architecture. -Corey
Corey Hickey
2022-Sep-20 00:41 UTC
TCP Forwarding hangs when TCP service is unresponsive, even when TCP client exits
On 2022-09-19 01:55, Damien Miller wrote:> This is kind of a tricky case, because for some cases it's AFAIK impossible > for the client to discern between a TCP server that a) will never respond > from b) hasn't responded *yet*. > > The solution that you proposed is unfortunately not without side > effects - I think it changes the behaviour of half-closed TCP connection > in a way that might lose data.[...]> I do notice some different behaviour between Linux (above) and OpenBSD. > On OpenBSD the connection is accepted but obviously does not pass any > data (of course). This is harder to fix without the side effects I > mentioned above, e.g. consider a TCP client program that connects to a > forwarded socket, sends a message and exits without waiting for a reply. > I think setting c->force_drain in this case could cause the message to be > lost (though I'm not 100% sure).I wrote a test TCP client/server, with the design that the TCP client would send a message and then exit while the server was sleeping. With my patch applied, the SSH client and SSH server both exited while the TCP server was sleeping, yet the TCP server did still receive the message once it woke up. This was presumably because the kernel already had the message stored in a socket buffer. Still, I've been thinking and reading about this more, and you're probably right that my change would break things for some edge cases, particularly for half-closed TCP connections that are deliberately persistent. https://superuser.com/questions/298919/what-is-tcp-half-open-connection-and-tcp-half-closed-connection For a while I was thinking that the SSH server (for the remote-forwarding case) ought to simply close the channel when its children have exited, but the trouble there is that clients of the TCP forwarding need not be children of SSH, so I don't think that is viable. As a workaround external to SSH itself, I am considering making my remote wrapper script kill its parent SSH server process once the TCP client (puppet, in my case) has exited. I don't know how to guarantee that the channel carrying stdout/stderr has flushed all the data back to the SSH client, though; killing the SSH server process prematurely might lose in-flight data. If we don't have a better alternative, would it be acceptable to gate the proposed use of force_drain behind a new config option (default to disabled)? Thanks, Corey