I am running OpenSSH 2.9p1 on SunOS 5.7 w/4-24-2001 patch cluster. Like many other users I am seeing the hanging session on logout with background processes. This is a huge problem for me as I centrally manage 50+ machines with rdist across ssh. Instead of just complaining about the problem I thought I would put my CS degree to use and try to track down the problem myself. For starters, though, can someone point me in the right direction? Also, is there a code roadmap for OpenSSH?

Thanks!
-Dan
On Fri, 4 May 2001, Daniel David Benson wrote:

> I am running OpenSSH 2.9p1 on SunOS 5.7 w/4-24-2001 patch cluster.
> Like many other users I am seeing the hanging session on logout
> with background processes. This is a huge problem for me as
> I centrally manage 50+ machines with rdist across ssh.
> Instead of just complaining about the problem I thought I would
> put my CS degree to use and try to track down the problem myself.
> For starters, though, can someone point me in the right direction?

This is the best description of the problem, pinched from Redhat:

About the hang-on-exit bug: this is the TODO item which shows up when you
run "ssh server 'sleep 20 & exit'".

* The shell starts up, and starts its own session. As a side-effect, it
  gets its own process group.
* The child forks off sleep, and because it's in the background, puts it
  into its own process group. The sleep command inherits a copy of the
  shell's descriptor for the tty as its stdout.
* The shell exits, but doesn't SIGHUP all of its child PIDs like it
  probably should.
* The sshd server attempts to read from the master side of the pty, and
  while there are still processes with the pty open, no EOF is produced.
* The sleep command exits, closes its descriptor, sshd detects the EOF,
  and the connection gets closed.

Attempts at fixing this in sshd, and why they don't work:

* SIGHUP the sshd's process group.
  - The shell is in its own process group.
* Track process group IDs of all children before we reap them (via an
  extra field in Session structures which holds the pgid for each child
  pid), and SIGHUP the pgid when we reap.
  - Background commands are in yet another process group.
* Close the connection when the child dies.
  - Background commands may need to write data to the connection. It also
    prematurely truncates output from some commands (scp server, the
    famous "dd if=/dev/zero bs=1000 count=100" case).

Known-good workarounds:
* bash: shopt -s huponexit
* tcsh: none
* zsh: ?
* pdksh: ?
This appears to affect rsh as well: it behaves the same with 'sleep 20 & exit'.

-- 
| Damien Miller <djm at mindrot.org> \ ``E-mail attachments are the poor man's
| http://www.mindrot.org             /  distributed filesystem'' - Dan Geer
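The mechanism in the Redhat description (the reader gets no EOF while any process still holds a copy of the descriptor, even after the shell itself is gone) can be reproduced without ssh. The following is a minimal sketch, assuming a POSIX system. It uses a pipe as a stand-in for the pty, so it shows the descriptor-lifetime principle rather than sshd's real pty handling (on Linux, for example, a pty master read reports EIO rather than a zero-length read once the slave side is closed). The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sleep for ms milliseconds. */
static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

/*
 * Returns 1 if the reader sees EOF on fd within timeout_ms, 0 otherwise.
 * EOF only appears once every process holding the write end closed it.
 */
static int eof_within(int fd, int timeout_ms)
{
	struct pollfd pfd = { fd, POLLIN, 0 };
	char c;

	if (poll(&pfd, 1, timeout_ms) != 1)
		return 0;		/* timed out: some writer is still alive */
	return read(fd, &c, 1) == 0;
}

/*
 * Mimics "sleep N & exit": a "shell" child forks a "background job" that
 * inherits the write end and holds it for job_ms; the shell exits at once
 * without sending SIGHUP.  The parent plays sshd: it reaps the shell, then
 * waits up to wait_ms for EOF.  Returns 1 if EOF arrived in time, -1 on error.
 */
int eof_after_shell_exit(long job_ms, int wait_ms)
{
	int p[2], r;
	pid_t shell;

	if (pipe(p) != 0)
		return -1;
	shell = fork();
	if (shell == -1)
		return -1;
	if (shell == 0) {
		close(p[0]);
		if (fork() == 0) {	/* background job keeps p[1] open */
			sleep_ms(job_ms);
			_exit(0);
		}
		_exit(0);		/* shell exits, job keeps running */
	}
	close(p[1]);
	waitpid(shell, NULL, 0);	/* the shell is gone ... */
	r = eof_within(p[0], wait_ms);	/* ... but is there EOF yet? */
	close(p[0]);
	return r;
}
```

Calling eof_after_shell_exit(1000, 200) should report no EOF, because the simulated background job still holds the descriptor after the "shell" has been reaped; eof_after_shell_exit(0, 2000) should report EOF as soon as the job exits.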
As is well known, current versions of openssh hang upon exit when background processes exist. If these processes do not produce output to stdout or stderr they should be allowed to continue to run silently. (If they do try to produce output, they will be killed by the shell.) This would be consistent with the behaviour of rsh, ssh, rlogin, telnet, csh, and bash. In no case should openssh wait around for them indefinitely.

Ssh is supposed to be a secure implementation of rsh and openssh is supposed to be an open source version of ssh, so despite a few suggestions to the contrary, this *really* is a bug.

The following patch to openssh-2.9p1 fixes the problem. This patch has now been thoroughly tested and is believed not to break ssh or scp, unlike previous related attempts.

I hope this patch is helpful,

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman

diff -ur openssh-2.9p1/clientloop.c openssh-2.9p1J/clientloop.c
--- openssh-2.9p1/clientloop.c	Fri Apr 20 06:50:51 2001
+++ openssh-2.9p1J/clientloop.c	Wed May 2 16:21:16 2001
@@ -440,9 +440,13 @@
 		len = read(connection_in, buf, sizeof(buf));
 		if (len == 0) {
 			/* Received EOF. The remote host has closed the connection. */
-			snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
-				 host);
-			buffer_append(&stderr_buffer, buf, strlen(buf));
+/*
+ * This message duplicates the one already in client_loop().
+ *
+ *			snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
+ *				 host);
+ *			buffer_append(&stderr_buffer, buf, strlen(buf));
+ */
 			quit_pending = 1;
 			return;
 		}
diff -ur openssh-2.9p1/nchan.c openssh-2.9p1J/nchan.c
--- openssh-2.9p1/nchan.c	Tue Apr 3 07:02:48 2001
+++ openssh-2.9p1J/nchan.c	Wed May 2 16:19:11 2001
@@ -56,7 +56,7 @@
 
 /* helper */
 static void	chan_shutdown_write(Channel *c);
-static void	chan_shutdown_read(Channel *c);
+void	chan_shutdown_read(Channel *c);
 
 /*
  * SSH1 specific implementation of event functions
@@ -479,7 +479,7 @@
 		c->wfd = -1;
 	}
 }
-static void
+void
 chan_shutdown_read(Channel *c)
 {
 	if (compat20 && c->type == SSH_CHANNEL_LARVAL)
diff -ur openssh-2.9p1/nchan.h openssh-2.9p1J/nchan.h
--- openssh-2.9p1/nchan.h	Sun Mar 4 23:16:12 2001
+++ openssh-2.9p1J/nchan.h	Wed May 2 16:19:11 2001
@@ -88,4 +88,5 @@
 
 void	chan_init_iostates(Channel * c);
 void	chan_init(void);
+void	chan_shutdown_read(Channel *c);
 #endif
diff -ur openssh-2.9p1/session.c openssh-2.9p1J/session.c
--- openssh-2.9p1/session.c	Wed Apr 18 09:29:34 2001
+++ openssh-2.9p1J/session.c	Wed May 2 16:20:04 2001
@@ -1960,6 +1960,8 @@
 	 */
 	if (c->ostate != CHAN_OUTPUT_CLOSED)
 		chan_write_failed(c);
+	if (c->istate != CHAN_INPUT_CLOSED)
+		chan_shutdown_read(c);
 	s->chanid = -1;
 }
 
hi, i think this patch can lead to data loss. please tell me if you experience this. -m
If this is a feature, not a bug, then my (stupid?) questions are these:

1. Telnet doesn't have the same problem. (Yes, telnet isn't exactly the same thing, but... this is related to what John Bowman's patch does.)
2. F-Secure SSH doesn't have the same problem.

Also, I believe there was an attempt to work around this problem sometime in the 2.3.0p1 timeframe: if the connection was closed, ssh would close and exit immediately (don't quote me on this; this info was gleaned through observation, not from reading the actual code). However, this triggered an unfortunate bug:

ssh myserver echo 0

will not actually print anything, because ssh closed and exited too soon.

I am not a pty expert, but I wonder how F-Secure SSH managed to get around this issue (it has neither problem).

-rchit

-----Original Message-----
From: Jason Stone [mailto:jason at shalott.net]
Sent: Saturday, May 05, 2001 4:54 AM
To: openssh-unix-dev at mindrot.org
Subject: Re: SSH connection hanging on logout

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> About the hang-on-exit bug: this is the TODO item which shows up when you
> run "ssh server 'sleep 20 & exit'".
>
> * The shell starts up, and starts its own session. As a side-effect, it
>   gets its own process group.
> * The sshd server attempts to read from the master side of the pty, and
>   while there are still processes with the pty open, no EOF is produced.
> * The sleep command exits, closes its descriptor, sshd detects the EOF, and
>   the connection gets closed.

Or, put another way, this is a feature, not a bug - sshd has no way of knowing that "sleep 20" isn't going to eventually produce some output that you'll want to see, so it stays alive until the background command exits.

The real "bug" is users trying to use the shell's '&' builtin to run daemon processes.
If you want a command to really be backgrounded (ie, to daemonize), use something other than '&', something that will make the command close the pty and either start its own process group or else become a child of init. Eg:

perl -e 'fork && exit; close STDIN; close STDOUT; close STDERR; \
    setpgrp(0,$$); exec "sleep 20";'

(Watch out for the quoting if you try this on the commandline....)

> Known-good workarounds:
> * bash: shopt -s huponexit
> * tcsh: none
* zsh: setopt HUP (this is usually the default)

If you use zsh, you might also try something like this in your .zshrc:

daemonize(){
    COMMAND="$@"
    perl -e 'fork && exit; close STDIN; close STDOUT; close STDERR; \
        setpgrp(0,$$); exec "'$COMMAND'";'
}

You would then run "daemonize sleep 20" and the sleep 20 would be run in the background and not hang the sshd when you exit. This will almost certainly work in other bourne-compatible shells as well.

- -Jason

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.4 (FreeBSD)
Comment: See https://private.idealab.com/public/jason/jason.gpg

iD8DBQE68+nhswXMWWtptckRAjaVAJ0bbN7PPe0jLC80SPZjDNAvBFuC2wCaA4ep
1IteXaTPMxe2TsKrsLmg20A=mEVt
-----END PGP SIGNATURE-----
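Jason's perl one-liner does three things: fork, leave the caller's process group, and close the standard descriptors so the pty is released. A rough C equivalent is sketched below, assuming a POSIX system. It swaps in setsid() for setpgrp(0,$$) (setsid also drops the controlling terminal), redirects stdin/stdout/stderr to /dev/null instead of merely closing them, and closes other inherited descriptors up to an arbitrary cap of 65536. spawn_detached() and pipe_eof_within() are hypothetical helpers, not part of OpenSSH.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Arbitrary upper bound on descriptors to close (assumption). */
#define MAX_FD_TO_CLOSE 65536L

/*
 * Hypothetical helper: run argv[0] (searched in PATH) in a new session
 * with stdin/stdout/stderr on /dev/null and every other inherited
 * descriptor closed, so it no longer holds the caller's pty or pipes.
 * Returns the child's pid, or -1 if fork() fails.
 */
pid_t spawn_detached(char *const argv[])
{
	pid_t pid = fork();
	long fd, maxfd;
	int devnull;

	if (pid != 0)
		return pid;		/* parent, or -1 on error */

	setsid();			/* new session and process group */
	devnull = open("/dev/null", O_RDWR);
	if (devnull == -1)
		_exit(127);
	dup2(devnull, STDIN_FILENO);
	dup2(devnull, STDOUT_FILENO);
	dup2(devnull, STDERR_FILENO);

	maxfd = sysconf(_SC_OPEN_MAX);
	if (maxfd < 0 || maxfd > MAX_FD_TO_CLOSE)
		maxfd = MAX_FD_TO_CLOSE;
	for (fd = STDERR_FILENO + 1; fd < maxfd; fd++)
		close((int)fd);		/* includes devnull if it landed above 2 */

	execvp(argv[0], argv);
	_exit(127);			/* exec failed */
}

/* Returns 1 if fd reaches EOF (all writers closed) within timeout_ms. */
int pipe_eof_within(int fd, int timeout_ms)
{
	struct pollfd pfd = { fd, POLLIN, 0 };
	char c;

	if (poll(&pfd, 1, timeout_ms) != 1)
		return 0;
	return read(fd, &c, 1) == 0;
}
```

If a pipe is created before spawn_detached("sleep 2"), the parent should see EOF on its read end well before the sleep finishes, because the detached child has let go of its inherited copy of the write end. That is the property Jason's daemonize function relies on to keep sshd from waiting.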
Although still no instances of data loss have been reported with the patch I posted to this list on 2001-05-08, I have now noticed one inconsistency with the handling of X connections when the patch is applied to openssh-2.9p1 that I thought I should report:

Without the patch the following will hang (just as any other process will):

ssh host
xclock &
exit

With the patch the ssh connection closes immediately, without waiting for the X application to terminate. This does not seem to be desirable; suppose the process had been an emacs or netscape session.

Is it possible to modify the patch so that it will wait for unclosed X sessions to terminate (but not hang on other processes), just as the commercial version of SSH does?

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman
Here is a new version of the hang-on-exit patch, which:

1. fixes the hang-on-exit bug (without data loss);
2. does not exit if there are unterminated X applications;
3. exits the session when all X applications have closed.

Of these three tests, Openssh-2.9p1 only passes the second one. The third one is another type of hanging bug in Openssh, as is demonstrated by the following test:

ssh host
xterm -e sleep 20 &
exit

Even after the xsession terminates, the ssh session is left hanging forever. The correct behaviour is to wait 20 seconds for the X application to close and then exit.

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman

diff -ur openssh-2.9p1/channels.c openssh-2.9p1J/channels.c
--- openssh-2.9p1/channels.c	Tue Apr 17 12:14:35 2001
+++ openssh-2.9p1J/channels.c	Mon May 14 20:51:14 2001
@@ -1137,6 +1137,10 @@
 			continue;
 		if (ftab[c->type] == NULL)
 			continue;
+		if(c->type == SSH_CHANNEL_OPEN && c->rfd == -1) {
+			c->type = SSH_CHANNEL_FREE;
+			continue;
+		}
 		(*ftab[c->type])(c, readset, writeset);
 		if (chan_is_dead(c)) {
 			/*
@@ -1639,6 +1643,47 @@
 	for (i = 0; i < channels_alloc; i++)
 		if (channels[i].type != SSH_CHANNEL_FREE)
 			channel_close_fds(&channels[i]);
+}
+
+/* Returns true if session is inactive. */
+
+int
+channel_inactive_session()
+{
+	u_int i;
+	if(channels_alloc == 0) return 0;
+
+	for (i = 0; i < channels_alloc; i++) {
+		switch (channels[i].type) {
+		case SSH_CHANNEL_FREE:
+		case SSH_CHANNEL_X11_LISTENER:
+		case SSH_CHANNEL_CLOSED:
+			break;
+		case SSH_CHANNEL_PORT_LISTENER:
+		case SSH_CHANNEL_RPORT_LISTENER:
+		case SSH_CHANNEL_AUTH_SOCKET:
+		case SSH_CHANNEL_DYNAMIC:
+		case SSH_CHANNEL_CONNECTING:	/* XXX ??? */
+			return 0;
+		case SSH_CHANNEL_LARVAL:
+			if (!compat20)
+				fatal("cannot happen: SSH_CHANNEL_LARVAL");
+			return 0;
+		case SSH_CHANNEL_OPENING:
+		case SSH_CHANNEL_OPEN:
+		case SSH_CHANNEL_X11_OPEN:
+			return 0;
+		case SSH_CHANNEL_INPUT_DRAINING:
+		case SSH_CHANNEL_OUTPUT_DRAINING:
+			if (!compat13)
+				fatal("cannot happen: OUT_DRAIN");
+			return 0;
+		default:
+			fatal("channel_inactive_session: bad channel type %d", channels[i].type);
+			/* NOTREACHED */
+		}
+	}
+	return 1;
 }
 
 /* Returns true if any channel is still open. */
diff -ur openssh-2.9p1/channels.h openssh-2.9p1J/channels.h
--- openssh-2.9p1/channels.h	Fri Apr 13 17:28:02 2001
+++ openssh-2.9p1J/channels.h	Mon May 14 20:51:14 2001
@@ -197,6 +197,9 @@
  */
 void	channel_close_all(void);
 
+/* Returns true if session is inactive. */
+int channel_inactive_session();
+
 /* Returns true if there is still an open channel over the connection. */
 int	channel_still_open(void);
 
diff -ur openssh-2.9p1/clientloop.c openssh-2.9p1J/clientloop.c
--- openssh-2.9p1/clientloop.c	Fri Apr 20 06:50:51 2001
+++ openssh-2.9p1J/clientloop.c	Mon May 14 20:51:14 2001
@@ -440,9 +440,13 @@
 		len = read(connection_in, buf, sizeof(buf));
 		if (len == 0) {
 			/* Received EOF. The remote host has closed the connection. */
-			snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
-				 host);
-			buffer_append(&stderr_buffer, buf, strlen(buf));
+/*
+ * This message duplicates the one already in client_loop().
+ *
+ *			snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
+ *				 host);
+ *			buffer_append(&stderr_buffer, buf, strlen(buf));
+ */
 			quit_pending = 1;
 			return;
 		}
diff -ur openssh-2.9p1/nchan.c openssh-2.9p1J/nchan.c
--- openssh-2.9p1/nchan.c	Tue Apr 3 07:02:48 2001
+++ openssh-2.9p1J/nchan.c	Mon May 14 20:51:14 2001
@@ -56,7 +56,7 @@
 
 /* helper */
 static void	chan_shutdown_write(Channel *c);
-static void	chan_shutdown_read(Channel *c);
+void	chan_shutdown_read(Channel *c);
 
 /*
  * SSH1 specific implementation of event functions
@@ -479,7 +479,7 @@
 		c->wfd = -1;
 	}
 }
-static void
+void
 chan_shutdown_read(Channel *c)
 {
 	if (compat20 && c->type == SSH_CHANNEL_LARVAL)
diff -ur openssh-2.9p1/nchan.h openssh-2.9p1J/nchan.h
--- openssh-2.9p1/nchan.h	Sun Mar 4 23:16:12 2001
+++ openssh-2.9p1J/nchan.h	Mon May 14 20:51:14 2001
@@ -88,4 +88,5 @@
 
 void	chan_init_iostates(Channel * c);
 void	chan_init(void);
+void	chan_shutdown_read(Channel *c);
 #endif
diff -ur openssh-2.9p1/serverloop.c openssh-2.9p1J/serverloop.c
--- openssh-2.9p1/serverloop.c	Fri Apr 13 17:28:03 2001
+++ openssh-2.9p1J/serverloop.c	Mon May 14 20:51:14 2001
@@ -726,7 +726,7 @@
 		if (!rekeying)
 			channel_after_select(readset, writeset);
 		process_input(readset);
-		if (connection_closed)
+		if (connection_closed || channel_inactive_session())
 			break;
 		process_output(writeset);
 	}
diff -ur openssh-2.9p1/session.c openssh-2.9p1J/session.c
--- openssh-2.9p1/session.c	Wed Apr 18 09:29:34 2001
+++ openssh-2.9p1J/session.c	Mon May 14 20:51:14 2001
@@ -1960,6 +1960,9 @@
 	 */
 	if (c->ostate != CHAN_OUTPUT_CLOSED)
 		chan_write_failed(c);
+	if (c->istate == CHAN_INPUT_OPEN && compat20) {
+		chan_shutdown_read(c);
+	}
 	s->chanid = -1;
 }
 
Disregard my previous message...that isn't the right patch....I'm still testing a new one... -- John Bowman University of Alberta http://www.math.ualberta.ca/~bowman
The following is a CORRECTION, with a REVISED PATCH, to my message posted to this list on 2001-05-15 2:55:37.

Here is a new version of the hang-on-exit patch (2001-05-08 23:52:24), which:

1. fixes the hang-on-exit bug under Protocol 2 (without data loss);
2. does not exit if there are unterminated X applications;
3. exits the session when all X applications have closed.

Of these three tests, Openssh-2.9p1 under Protocol 2 passes only the second one. The third item is another type of hanging bug in Openssh, as is demonstrated by the following test:

ssh -2 host
xterm -e sleep 20 &
exit

Even after the xsession terminates, the ssh session is left hanging forever. The correct behaviour is to wait 20 seconds for the X application to close and then exit.

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman

P.S. Since the hang-on-exit patch is only effective under Protocol 2, a conditional to the call to chan_shutdown_read() has been added.

diff -ur openssh-2.9p1/channels.c openssh-2.9p1J/channels.c
--- openssh-2.9p1/channels.c	Tue Apr 17 12:14:35 2001
+++ openssh-2.9p1J/channels.c	Wed May 16 01:22:16 2001
@@ -333,6 +333,9 @@
 		xfree(c->remote_name);
 		c->remote_name = NULL;
 	}
+
+	if(channel_find_open() == -1)
+		shutdown(packet_get_connection_out(), SHUT_RDWR);
 }
 
 /*
@@ -1137,6 +1140,15 @@
 			continue;
 		if (ftab[c->type] == NULL)
 			continue;
+		if(c->istate == CHAN_INPUT_OPEN && c->rfd == -1) {
+			int type=c->type;
+			c->type=SSH_CHANNEL_CLOSED;
+			if(channel_find_open() == -1)
+				shutdown(packet_get_connection_out(),
+				    SHUT_RDWR);
+			c->type=type;
+			continue;
+		}
 		(*ftab[c->type])(c, readset, writeset);
 		if (chan_is_dead(c)) {
 			/*
diff -ur openssh-2.9p1/clientloop.c openssh-2.9p1J/clientloop.c
--- openssh-2.9p1/clientloop.c	Fri Apr 20 06:50:51 2001
+++ openssh-2.9p1J/clientloop.c	Wed May 16 01:22:16 2001
@@ -440,9 +440,13 @@
 		len = read(connection_in, buf, sizeof(buf));
 		if (len == 0) {
 			/* Received EOF. The remote host has closed the connection. */
-			snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
-				 host);
-			buffer_append(&stderr_buffer, buf, strlen(buf));
+/*
+ * This message duplicates the one already in client_loop().
+ *
+ *			snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
+ *				 host);
+ *			buffer_append(&stderr_buffer, buf, strlen(buf));
+ */
 			quit_pending = 1;
 			return;
 		}
diff -ur openssh-2.9p1/nchan.c openssh-2.9p1J/nchan.c
--- openssh-2.9p1/nchan.c	Tue Apr 3 07:02:48 2001
+++ openssh-2.9p1J/nchan.c	Wed May 16 01:22:16 2001
@@ -56,7 +56,7 @@
 
 /* helper */
 static void	chan_shutdown_write(Channel *c);
-static void	chan_shutdown_read(Channel *c);
+void	chan_shutdown_read(Channel *c);
 
 /*
  * SSH1 specific implementation of event functions
@@ -479,7 +479,7 @@
 		c->wfd = -1;
 	}
 }
-static void
+void
 chan_shutdown_read(Channel *c)
 {
 	if (compat20 && c->type == SSH_CHANNEL_LARVAL)
diff -ur openssh-2.9p1/nchan.h openssh-2.9p1J/nchan.h
--- openssh-2.9p1/nchan.h	Sun Mar 4 23:16:12 2001
+++ openssh-2.9p1J/nchan.h	Wed May 16 01:22:16 2001
@@ -88,4 +88,5 @@
 
 void	chan_init_iostates(Channel * c);
 void	chan_init(void);
+void	chan_shutdown_read(Channel *c);
 #endif
diff -ur openssh-2.9p1/session.c openssh-2.9p1J/session.c
--- openssh-2.9p1/session.c	Wed Apr 18 09:29:34 2001
+++ openssh-2.9p1J/session.c	Wed May 16 02:05:12 2001
@@ -1960,6 +1960,9 @@
 	 */
 	if (c->ostate != CHAN_OUTPUT_CLOSED)
 		chan_write_failed(c);
+	if (c->istate != CHAN_INPUT_CLOSED && compat20) {
+		chan_shutdown_read(c);
+	}
 	s->chanid = -1;
 }
 
Here is perhaps a slightly more robust version (in case of internal errors; see chan_read_failed_12) of the hang-on-exit patch. In session.c, I've changed the line

	if (c->istate != CHAN_INPUT_CLOSED && compat20) {

to

	if (c->istate == CHAN_INPUT_OPEN && compat20) {

In practice this shouldn't make any difference, since c->istate should always equal either CHAN_INPUT_CLOSED or CHAN_INPUT_OPEN within session_exit_message.

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman

diff -ur openssh-2.9p1/channels.c openssh-2.9p1J/channels.c
--- openssh-2.9p1/channels.c	Tue Apr 17 12:14:35 2001
+++ openssh-2.9p1J/channels.c	Wed May 16 01:22:16 2001
@@ -333,6 +333,9 @@
 		xfree(c->remote_name);
 		c->remote_name = NULL;
 	}
+
+	if(channel_find_open() == -1)
+		shutdown(packet_get_connection_out(), SHUT_RDWR);
 }
 
 /*
@@ -1137,6 +1140,15 @@
 			continue;
 		if (ftab[c->type] == NULL)
 			continue;
+		if(c->istate == CHAN_INPUT_OPEN && c->rfd == -1) {
+			int type=c->type;
+			c->type=SSH_CHANNEL_CLOSED;
+			if(channel_find_open() == -1)
+				shutdown(packet_get_connection_out(),
+				    SHUT_RDWR);
+			c->type=type;
+			continue;
+		}
 		(*ftab[c->type])(c, readset, writeset);
 		if (chan_is_dead(c)) {
 			/*
diff -ur openssh-2.9p1/clientloop.c openssh-2.9p1J/clientloop.c
--- openssh-2.9p1/clientloop.c	Fri Apr 20 06:50:51 2001
+++ openssh-2.9p1J/clientloop.c	Wed May 16 01:22:16 2001
@@ -440,9 +440,13 @@
 		len = read(connection_in, buf, sizeof(buf));
 		if (len == 0) {
 			/* Received EOF. The remote host has closed the connection. */
-			snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
-				 host);
-			buffer_append(&stderr_buffer, buf, strlen(buf));
+/*
+ * This message duplicates the one already in client_loop().
+ *
+ *			snprintf(buf, sizeof buf, "Connection to %.300s closed by remote host.\r\n",
+ *				 host);
+ *			buffer_append(&stderr_buffer, buf, strlen(buf));
+ */
 			quit_pending = 1;
 			return;
 		}
diff -ur openssh-2.9p1/nchan.c openssh-2.9p1J/nchan.c
--- openssh-2.9p1/nchan.c	Tue Apr 3 07:02:48 2001
+++ openssh-2.9p1J/nchan.c	Wed May 16 01:22:16 2001
@@ -56,7 +56,7 @@
 
 /* helper */
 static void	chan_shutdown_write(Channel *c);
-static void	chan_shutdown_read(Channel *c);
+void	chan_shutdown_read(Channel *c);
 
 /*
  * SSH1 specific implementation of event functions
@@ -479,7 +479,7 @@
 		c->wfd = -1;
 	}
 }
-static void
+void
 chan_shutdown_read(Channel *c)
 {
 	if (compat20 && c->type == SSH_CHANNEL_LARVAL)
diff -ur openssh-2.9p1/nchan.h openssh-2.9p1J/nchan.h
--- openssh-2.9p1/nchan.h	Sun Mar 4 23:16:12 2001
+++ openssh-2.9p1J/nchan.h	Wed May 16 01:22:16 2001
@@ -88,4 +88,5 @@
 
 void	chan_init_iostates(Channel * c);
 void	chan_init(void);
+void	chan_shutdown_read(Channel *c);
 #endif
diff -ur openssh-2.9p1/session.c openssh-2.9p1J/session.c
--- openssh-2.9p1/session.c	Wed Apr 18 09:29:34 2001
+++ openssh-2.9p1J/session.c	Wed May 16 11:25:17 2001
@@ -1960,6 +1960,9 @@
 	 */
 	if (c->ostate != CHAN_OUTPUT_CLOSED)
 		chan_write_failed(c);
+	if (c->istate == CHAN_INPUT_OPEN && compat20) {
+		chan_shutdown_read(c);
+	}
 	s->chanid = -1;
 }
 
On Wed, May 16, 2001 at 08:18:03AM -0000, John Bowman wrote:
> The third item is another type of hanging bug in Openssh, as is
> demonstrated by the following test:
>
> ssh -2 host
> xterm -e sleep 20 &
> exit
>
> Even after the xsession terminates, the ssh session is left hanging forever.
> The correct behaviour is to wait 20 seconds for the X application to close
> and then exit.

this is a client bug. try this:

Index: clientloop.c
===================================================================
RCS file: /home/markus/cvs/ssh/clientloop.c,v
retrieving revision 1.70
diff -u -r1.70 clientloop.c
--- clientloop.c	2001/05/11 14:59:55	1.70
+++ clientloop.c	2001/05/16 20:31:44
@@ -346,7 +346,13 @@
 		if (buffer_len(&stderr_buffer) > 0)
 			FD_SET(fileno(stderr), *writesetp);
 	} else {
-		FD_SET(connection_in, *readsetp);
+		/* channel_prepare_select could have closed the last channel */
+		if (session_closed && !channel_still_open()) {
+			if (!packet_have_data_to_write())
+				return;
+		} else {
+			FD_SET(connection_in, *readsetp);
+		}
 	}
 
 	/* Select server connection if have data to write to the server. */
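The new branch is easier to check as a truth table. The function below is a hypothetical, simplified model of the patched branch, not OpenSSH code, and its names are invented for illustration. Its three flags stand for the session_closed variable, the result of channel_still_open(), and the result of packet_have_data_to_write().

```c
#include <assert.h>
#include <stdbool.h>

/* What the client should wait for on its next select() (hypothetical model). */
enum net_wait {
	WAIT_READ_SERVER,	/* keep connection_in in the read set */
	WAIT_WRITE_ONLY,	/* only flush pending packets to the server */
	WAIT_NOTHING		/* return without selecting on the server */
};

/*
 * Once the session is closed and the last channel is gone, stop waiting
 * for input from the server; keep going only while queued packet data
 * still has to be written.  Otherwise behave as before the patch.
 */
enum net_wait client_net_wait(bool session_closed, bool channel_still_open,
    bool have_data_to_write)
{
	if (session_closed && !channel_still_open)
		return have_data_to_write ? WAIT_WRITE_ONLY : WAIT_NOTHING;
	return WAIT_READ_SERVER;
}
```

Before the patch every case fell into WAIT_READ_SERVER, which is why the client sat in select() indefinitely after the last X channel had closed.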
On Wed, May 16, 2001 at 08:18:03AM -0000, John Bowman wrote:
> The following is a CORRECTION, with a REVISED PATCH, to my message posted
> to this list on 2001-05-15 2:55:37.
>
> Here is a new version of the hang-on-exit patch (2001-05-08 23:52:24), which:
>
> 1. fixes the hang-on-exit bug under Protocol 2 (without data loss);
> 2. does not exit if there are unterminated X applications;
> 3. exits the session when all X applications have closed.
>
> Of these three tests, Openssh-2.9p1 under Protocol 2 passes only the second
> one. The third item is another type of hanging bug in Openssh, as is
> demonstrated by the following test:
>
> ssh -2 host
> xterm -e sleep 20 &
> exit
>
> Even after the xsession terminates, the ssh session is left hanging forever.
> The correct behaviour is to wait 20 seconds for the X application to close
> and then exit.
>
> -- John Bowman
>
> University of Alberta
> http://www.math.ualberta.ca/~bowman
>
> P.S. Since the hang-on-exit patch is only effective under Protocol 2,
> a conditional to the call to chan_shutdown_read() has been added.
>
> diff -ur openssh-2.9p1/channels.c openssh-2.9p1J/channels.c
> --- openssh-2.9p1/channels.c	Tue Apr 17 12:14:35 2001
> +++ openssh-2.9p1J/channels.c	Wed May 16 01:22:16 2001
> @@ -333,6 +333,9 @@
>  		xfree(c->remote_name);
>  		c->remote_name = NULL;
>  	}
> +
> +	if(channel_find_open() == -1)
> +		shutdown(packet_get_connection_out(), SHUT_RDWR);

> 			continue;
> +		if(channel_find_open() == -1)
> +			shutdown(packet_get_connection_out(),
> +			    SHUT_RDWR);

imho, this is wrong. you are not allowed to shutdown the TCP connection
to the peer. the peer can still request a second shell session.
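Markus's objection can be seen in miniature: once one side calls shutdown(fd, SHUT_RDWR), the peer reads EOF and can no longer send anything on that connection, including a request for another session. The sketch below is a hypothetical demo, assuming a POSIX system; it uses a local AF_UNIX socketpair as a stand-in for sshd's TCP connection. The EPIPE result is what Linux reports for this case; other systems are assumed to behave similarly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demo: sv[0] plays sshd, sv[1] plays the ssh client.
 * After sshd's shutdown(SHUT_RDWR), stores the client's read() result
 * in *peer_read and the errno of its attempted write() (0 if the write
 * succeeded) in *peer_write_errno.  Returns 0, or -1 on setup failure.
 */
int shutdown_demo(ssize_t *peer_read, int *peer_write_errno)
{
	int sv[2];
	char c = 'x';

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;
	signal(SIGPIPE, SIG_IGN);	/* get EPIPE instead of being killed */

	shutdown(sv[0], SHUT_RDWR);	/* "sshd" tears the connection down */

	*peer_read = read(sv[1], &c, 1);	/* 0 means EOF */
	*peer_write_errno = write(sv[1], &c, 1) == -1 ? errno : 0;

	close(sv[0]);
	close(sv[1]);
	return 0;
}
```

With real TCP the timing differs (the peer's first write may appear to succeed and fail only later), but either way no further channel requests can reach sshd over that connection.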
On Sun, May 13, 2001 at 06:44:53PM -0000, John Bowman wrote:
> Although still no instances of data loss have been reported with the patch

you should check this:

	ssh localhost -2 -v -v -v -p 1234 dd if=/bsd bs=65536 count=2 | \
		(sleep 10; md5sum)

on my machine the remote command dies, but sshd still calls read 3 more
times on rfd. this should not lead to data corruption, i.e. the checksums
must match

	dd if=/bsd bs=65536 count=2 | md5sum

-m

use this patch if you want to trace the reads from rfd.

Index: channels.c
===================================================================
RCS file: /home/markus/cvs/ssh/channels.c,v
retrieving revision 1.115
diff -u -r1.115 channels.c
--- channels.c	2001/05/09 22:51:57	1.115
+++ channels.c	2001/05/16 21:52:30
@@ -920,6 +920,7 @@
 				chan_read_failed(c);
 			}
 		} else {
+			debug3("channel %d: read rfd %d len %d", c->self, c->rfd, len);
 			buffer_append(&c->input, buf, len);
 		}
 	}
@@ -1029,9 +1031,10 @@
 			packet_put_int(c->remote_id);
 			packet_put_int(c->local_consumed);
 			packet_send();
-			debug2("channel %d: window %d sent adjust %d",
+			debug2("channel %d: window %d sent adjust %d (obuf %d)",
 			    c->self, c->local_window,
-			    c->local_consumed);
+			    c->local_consumed,
+			    buffer_len(&c->output));
 			c->local_window += c->local_consumed;
 			c->local_consumed = 0;
 		}
@@ -1270,6 +1273,7 @@
 			}
 		}
 		if (len > 0) {
+			debug3("channel %d: channel data: %d", c->self, len);
 			packet_start(compat20 ?
 			    SSH2_MSG_CHANNEL_DATA : SSH_MSG_CHANNEL_DATA);
 			packet_put_int(c->remote_id);
Under linux there is no data corruption and the checksums match:

[wizard: ~] ssh localhost dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
2+0 records in
2+0 records out
86d34e869a31df51922ad2bb9bd202bc  -
[wizard: ~] dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
2+0 records in
2+0 records out
86d34e869a31df51922ad2bb9bd202bc  -

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman
On Wed, May 16, 2001 at 10:14:57PM -0000, John Bowman wrote:
> Under linux there is no data corruption and the checksums match:
>
> [wizard: ~] ssh localhost dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
> 2+0 records in
> 2+0 records out
> 86d34e869a31df51922ad2bb9bd202bc  -
> [wizard: ~] dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
> 2+0 records in
> 2+0 records out
> 86d34e869a31df51922ad2bb9bd202bc  -

with my debugging patch,
you should see something like this on the sshd side:

debug3: channel 0: channel data: 16384
debug3: channel 0: read rfd 10 len 16384
debug3: channel 0: channel data: 15907
debug2: channel 0: rcvd adjust 16861
debug3: channel 0: channel data: 477
debug3: channel 0: read rfd 10 len 16384
debug3: channel 0: channel data: 16384
debug2: channel 0: rcvd adjust 65536
debug3: channel 0: read rfd 10 len 16384
debug3: channel 0: channel data: 16384
debug3: channel 0: read rfd 10 len 16384
debug3: channel 0: channel data: 16384
debug1: Received SIGCHLD.
^^ shell dies
debug1: session_by_pid: pid 29873
debug1: session_exit_message: session 0 channel 0 pid 29873
debug1: session_exit_message: release channel 0
debug1: channel 0: write failed
debug1: channel 0: output open -> closed
debug1: channel 0: close_write
debug1: session_free: session 0 pid 29873
debug3: channel 0: read rfd 10 len 16384
^^ more reads from the shell.

if you shutdown at the SIGCHLD, you can no longer read
at this point!

debug2: channel 0: read 84 from efd 12
debug3: channel 0: channel data: 16384
debug2: channel 0: rwin 16384 elen 84 euse 1
debug2: channel 0: sent ext data 84
debug1: channel 0: read<=0 rfd 10 len 0
debug1: channel 0: read failed
debug1: channel 0: input open -> drain
debug1: channel 0: close_read
debug1: channel 0: input: no drain shortcut
debug1: channel 0: ibuf empty
debug1: channel 0: input drain -> closed
debug1: channel 0: send eof
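The point of the trace above is that output written before a process exits can still be sitting in the kernel buffer when its parent reaps it. That can be checked in isolation. The sketch below assumes a POSIX system and uses a pipe instead of a pty, so pty-specific behaviour is not modelled. bytes_after_child_exit() is a hypothetical name. It also assumes the pipe can buffer at least 4096 bytes, which holds on common systems but is not something POSIX promises.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: a child writes n bytes into a pipe and exits.  The
 * parent reaps it first (the SIGCHLD moment) and only then looks at the
 * pipe.  If drain is nonzero it reads to EOF; otherwise it closes the
 * read end at once, as an early shutdown would.  Returns the number of
 * bytes recovered, or -1 on setup failure.
 */
long bytes_after_child_exit(size_t n, int drain)
{
	char buf[4096];
	long total = 0;
	ssize_t len;
	int p[2];
	pid_t pid;

	assert(n <= sizeof(buf));	/* must fit in the pipe buffer */
	if (pipe(p) != 0)
		return -1;
	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		close(p[0]);
		memset(buf, 'x', n);
		if (write(p[1], buf, n) != (ssize_t)n)
			_exit(1);
		_exit(0);
	}
	close(p[1]);
	waitpid(pid, NULL, 0);		/* the writer is dead, its data is not */
	if (drain) {
		while ((len = read(p[0], buf, sizeof(buf))) > 0)
			total += len;
	}
	close(p[0]);
	return total;
}
```

bytes_after_child_exit(4096, 1) should recover all 4096 bytes even though the writer had already been reaped; with drain set to 0 nothing is recovered, which is the data-loss case that shutting down at SIGCHLD would produce.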
> > Under linux there is no data corruption and the checksums match:
> >
> > [wizard: ~] ssh localhost dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
> > 2+0 records in
> > 2+0 records out
> > 86d34e869a31df51922ad2bb9bd202bc  -
> > [wizard: ~] dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
> > 2+0 records in
> > 2+0 records out
> > 86d34e869a31df51922ad2bb9bd202bc  -
>
> with my debugging patch,
> you should see something like this on the sshd side:
>
> debug3: channel 0: channel data: 16384
> debug3: channel 0: read rfd 10 len 16384
> debug3: channel 0: channel data: 15907
> debug2: channel 0: rcvd adjust 16861
> debug3: channel 0: channel data: 477
> debug3: channel 0: read rfd 10 len 16384
> debug3: channel 0: channel data: 16384
> debug2: channel 0: rcvd adjust 65536
> debug3: channel 0: read rfd 10 len 16384
> debug3: channel 0: channel data: 16384
> debug3: channel 0: read rfd 10 len 16384
> debug3: channel 0: channel data: 16384
> debug1: Received SIGCHLD.
> ^^ shell dies
> debug1: session_by_pid: pid 29873
> debug1: session_exit_message: session 0 channel 0 pid 29873
> debug1: session_exit_message: release channel 0
> debug1: channel 0: write failed
> debug1: channel 0: output open -> closed
> debug1: channel 0: close_write
> debug1: session_free: session 0 pid 29873
> debug3: channel 0: read rfd 10 len 16384
> ^^ more reads from the shell.
>
> if you shutdown at the SIGCHLD, you can no longer read
> at this point!
>
> debug2: channel 0: read 84 from efd 12
> debug3: channel 0: channel data: 16384
> debug2: channel 0: rwin 16384 elen 84 euse 1
> debug2: channel 0: sent ext data 84
> debug1: channel 0: read<=0 rfd 10 len 0
> debug1: channel 0: read failed
> debug1: channel 0: input open -> drain
> debug1: channel 0: close_read
> debug1: channel 0: input: no drain shortcut
> debug1: channel 0: ibuf empty
> debug1: channel 0: input drain -> closed
> debug1: channel 0: send eof
>

Here is what I get with the latest patch and your debug patch
installed. There is a SIGCHLD, but only after the very beginning:

ssh -v -v -v localhost dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
...
debug2: channel 0: written 477 to efd 6
debug2: channel 0: rcvd ext data 27
debug1: Received SIGCHLD.
debug2: channel 0: written 27 to efd 6
debug2: channel 0: window 32264 sent adjust 4600 (obuf 28672)
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug2: channel 0: window 0 sent adjust 4096 (obuf 61440)
debug2: channel 0: window 4096 sent adjust 4096 (obuf 57344)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 16384 sent adjust 4096 (obuf 45056)
debug2: channel 0: window 20480 sent adjust 4096 (obuf 40960)
debug2: channel 0: window 24576 sent adjust 4096 (obuf 36864)
debug2: channel 0: window 28672 sent adjust 4096 (obuf 32768)
debug2: channel 0: window 20480 sent adjust 36864 (obuf 8192)
debug2: channel 0: rcvd ext data 31
debug2: channel 0: window 24545 sent adjust 28672 (obuf 12288)
debug1: channel 0: rcvd eof
debug1: channel 0: output open -> drain
debug1: channel 0: rcvd close
debug1: channel 0: input open -> closed
debug1: channel 0: close_read
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug2: channel 0: no data after CLOSE
debug1: channel 0: obuf empty
debug1: channel 0: output drain -> closed
debug1: channel 0: close_write
debug2: channel 0: active efd: 6 len 31 type write
2+0 records in
2+0 records out
debug2: channel 0: written 31 to efd 6
debug1: channel 0: send close
debug1: channel 0: is dead
debug1: channel_free: channel 0: status: The following connections are open:
  #0 client-session (t4 r0 i8/0 o128/0 fd -1/-1)
debug1: channel_free: channel 0: dettaching channel user
debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 6.1 seconds
debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
debug1: Exit status 0
86d34e869a31df51922ad2bb9bd202bc  -
[wizard: ~] dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )
2+0 records in
2+0 records out
86d34e869a31df51922ad2bb9bd202bc  -

With count=10 and a shorter sleep it looks like this:

ssh -v -v -v localhost dd if=/bin/bash bs=65536 count=10 | ( sleep 5 ; md5sum )
...
debug2: channel 0: written 477 to efd 6
debug2: channel 0: rcvd ext data 27
debug1: Received SIGCHLD.
debug2: channel 0: written 27 to efd 6
debug2: channel 0: window 32264 sent adjust 4600 (obuf 28672)
debug2: channel 0: window 0 sent adjust 4096 (obuf 61440)
debug2: channel 0: window 4096 sent adjust 4096 (obuf 57344)
debug2: channel 0: window 8192 sent adjust 4096 (obuf 53248)
debug2: channel 0: window 12288 sent adjust 4096 (obuf 49152)
debug2: channel 0: window 16384 sent adjust 4096 (obuf 45056)
debug2: channel 0: window 20480 sent adjust 4096 (obuf 40960)
debug2: channel 0: window 24576 sent adjust 4096 (obuf 36864)
debug2: channel 0: window 28672 sent adjust 4096 (obuf 32768)
debug2: channel 0: window 20480 sent adjust 36864 (obuf 8192)
debug2: channel 0: window 24576 sent adjust 28672 (obuf 12288)
debug2: channel 0: window 20480 sent adjust 32768 (obuf 12288)
debug2: channel 0: window 20480 sent adjust 32768 (obuf 12288)
debug2: channel 0: window 20480 sent adjust 32768 (obuf 12288)
debug2: channel 0: window 20480 sent adjust 32768 (obuf 12288)
debug2: channel 0: window 20480 sent adjust 32768 (obuf 12288)
debug2: channel 0: window 21136 sent adjust 32768 (obuf 11632)
debug2: channel 0: rcvd ext data 15
4+1 records in
debug2: channel 0: written 15 to efd 6
debug2: channel 0: rcvd ext data 16
4+1 records out
debug2: channel 0: written 16 to efd 6
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: channel 0: rcvd eof
debug1: channel 0: output open -> drain
debug1: channel 0: rcvd close
debug1: channel 0: input open -> closed
debug1: channel 0: close_read
debug2: channel 0: no data after CLOSE
debug1: channel 0: obuf empty
debug1: channel 0: output drain -> closed
debug1: channel 0: close_write
debug1: channel 0: send close
debug1: channel 0: is dead
debug1: channel_free: channel 0: status: The following connections are open:
  #0 client-session (t4 r0 i8/0 o128/0 fd -1/-1)
debug1: channel_free: channel 0: dettaching channel user
debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 4.5 seconds
debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
debug1: Exit status 0
6c80ab2560a5f7b9b778b5498a93ece8  -
[wizard: ~] dd if=/bin/bash bs=65536 count=10 | ( sleep 5 ; md5sum )
4+1 records in
4+1 records out
6c80ab2560a5f7b9b778b5498a93ece8  -

Looks ok to me.

-- John Bowman

University of Alberta
http://www.math.ualberta.ca/~bowman
On Thu, May 17, 2001 at 03:35:18PM -0000, John Bowman wrote:
> > you should see something like this on the sshd side:
> >
> > debug3: channel 0: channel data: 16384
> > debug3: channel 0: read rfd 10 len 16384
> > debug3: channel 0: channel data: 15907
> > debug2: channel 0: rcvd adjust 16861
> > debug3: channel 0: channel data: 477
> > debug3: channel 0: read rfd 10 len 16384
> > debug3: channel 0: channel data: 16384
> > debug2: channel 0: rcvd adjust 65536
> > debug3: channel 0: read rfd 10 len 16384
> > debug3: channel 0: channel data: 16384
> > debug3: channel 0: read rfd 10 len 16384
> > debug3: channel 0: channel data: 16384
> > debug1: Received SIGCHLD.
> > ^^ shell dies
> > debug1: session_by_pid: pid 29873
> > debug1: session_exit_message: session 0 channel 0 pid 29873
> > debug1: session_exit_message: release channel 0
> > debug1: channel 0: write failed
> > debug1: channel 0: output open -> closed
> > debug1: channel 0: close_write
> > debug1: session_free: session 0 pid 29873
> > debug3: channel 0: read rfd 10 len 16384
> > ^^ more reads from the shell.
> >
> > if you shutdown at the SIGCHLD, you can no longer read
> > at this point!
> >
> > debug2: channel 0: read 84 from efd 12
> > debug3: channel 0: channel data: 16384
> > debug2: channel 0: rwin 16384 elen 84 euse 1
> > debug2: channel 0: sent ext data 84
> > debug1: channel 0: read<=0 rfd 10 len 0
> > debug1: channel 0: read failed
> > debug1: channel 0: input open -> drain
> > debug1: channel 0: close_read
> > debug1: channel 0: input: no drain shortcut
> > debug1: channel 0: ibuf empty
> > debug1: channel 0: input drain -> closed
> > debug1: channel 0: send eof
> >
>
> Here is what I get with the latest patch and your debug patch
> installed. There is a SIGCHLD, but only after the very beginning:
>
> ssh -v -v -v localhost dd if=/bin/bash bs=65536 count=2 | ( sleep 10 ; md5sum )

I need the server-side LOG message!
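The point of the annotated log above is that the shell's exit (the SIGCHLD) does not mean its output has been consumed. The following is a minimal sketch of that effect using only an ordinary shell pipe, not sshd code: the writer puts its last block into the pipe and finishes before the reader has read anything, yet the data is still delivered afterwards. It assumes a POSIX shell with `mktemp -d` available; the function name `pipe_outlives_writer` is hypothetical.

```shell
#!/bin/sh
# Sketch (not sshd code): a writer can put its last block into a pipe and
# finish before the reader has read anything; the data is still delivered,
# and EOF only arrives after it has been read.

pipe_outlives_writer() {
    dir=$(mktemp -d) || return 1     # scratch dir for a "writer done" flag
    {
        printf 'last block\n'        # tiny payload: fits in the pipe buffer
        : > "$dir/done"              # mark that the writer has finished writing
    } | {
        # Reader: do not read until the writer has written and finished.
        while [ ! -e "$dir/done" ]; do sleep 1; done
        sleep 1                      # give the writer subshell time to exit
        cat                          # the data is still waiting in the pipe
    }
    rm -rf "$dir"
}

pipe_outlives_writer    # prints: last block
```

A reader that gave up as soon as the writer had exited (the analogue of shutting down the channel at SIGCHLD) would have lost "last block" here.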
Hi All...

I ran into a hanging problem with 2.9p1 in the Cygwin environment. I found that

  ssh -f localhost sleep 30

hangs on both 2.9p1 and 2.5p2, while

  ssh -f -L 5901:localhost:5900 localhost sleep 30

works fine with 2.5.2p2 but hangs with 2.9p1.

I tried all but the most recent patches posted in this thread, with no effect. Is this an example of one of the known types of hanging? If not, can you reproduce this case?

In any event, ssh -f -L 5901:localhost:5900 localhost sleep 30 is what I am trying to use with 2.9p1.

Thanks,
...Karl
On Fri, May 18, 2001 at 12:35:43AM -0000, John Bowman wrote:
> One can now request that the connection not die after the first TCP
> connection is closed (via -N) or after a fixed number of seconds (via a sleep
> command), but rather that it stays around n seconds after the most recent TCP
> connection is closed. More details to follow...

Currently, the connection is _not_ closed after the first connection is closed.
> From: Markus Friedl <markus.friedl at informatik.uni-erlangen.de>
>
> ok, so just fyi:
> dd if=/bsd bs=65536 count=2
> gets truncated on my openbsd development system.
>
> you have to get into this situation:
>
> shell writes last block into pipe to sshd process.
> shell dies
> not all data has been read from the pipe.
>
> i can trigger this with
> dd if=/bsd bs=65536 count=2
>
> the figures should be different for other systems, but i think
> all systems will show this problem.

It depends on how pipes are implemented. The scenario you describe doesn't happen under Linux; the shell doesn't exit until all of the data has been read from the pipe. I suspect these differences in the way the shell and pipes interact are the underlying reason why you don't see the hang-on-exit bug at all on OpenBSD.

Tweaking the parameters in your test doesn't make any difference on Linux, as the output of the script below demonstrates. Changing localhost to another host (be sure to compare identical files) or testing under different load averages does not affect the results.

The patch has been subjected to exhaustive testing. Unless someone reports a case where it fails before the next release, please go ahead and include it in the next Linux version of OpenSSH. (If you don't like the -S option for some reason, you can always remove it along with the sleep config option.)

#!/bin/sh
# Usage: hang-on-exit.sh SIZE INCR COUNT DELAY
# Fetch a file through ssh with a delayed reader and compare its checksum
# against a local read; grow the block size by INCR until they differ.
size=$1
incr=$2
count=$3
delay=$4
checksum=
answer=
while [ "$checksum" = "$answer" ]
do
    checksum=$(ssh localhost dd if=/usr/local/netscape/netscape bs=$size count=$count | ( sleep $delay ; md5sum ))
    answer=$(dd if=/usr/local/netscape/netscape bs=$size count=$count | md5sum)
    echo $size $count $delay $checksum $answer
    size=$((size + incr))    # POSIX arithmetic; $[ ] is bash-only
done
echo CHECKSUM MISMATCH!

The output of the tests is available at http://www.math.ualberta.ca/imaging/snfs/hang-on-exit.test

-- 
John Bowman
University of Alberta
http://www.math.ualberta.ca/~bowman
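Whether the "shell dies with data still in the pipe" situation can arise at all depends on pipe capacity, which is one concrete way "it depends on how pipes are implemented". The sketch below is plain shell, not sshd code, and assumes `mktemp -d` is available; the function name `writer_exits_before_read` is hypothetical. A writer pushes a payload into a pipe whose reader deliberately waits before reading. If the payload fits in the pipe buffer, the writer finishes early (the precondition in Markus's scenario); if it does not, the writer stays blocked until the reader drains the pipe. The exact buffer size varies between kernels and versions.

```shell
#!/bin/sh
# Sketch: whether a writer can finish before its reader has read anything
# depends on how much data fits in the pipe buffer.

writer_exits_before_read() {
    kib=$1                           # payload size in KiB
    dir=$(mktemp -d) || return 1
    {
        dd if=/dev/zero bs=1024 count="$kib" 2>/dev/null
        : > "$dir/done"              # reached only once dd has returned
    } | {
        sleep 2                      # reader deliberately reads nothing yet
        if [ -e "$dir/done" ]; then
            echo early               # whole payload fit in the pipe buffer
        else
            echo blocked             # writer is waiting for the reader
        fi
        cat > /dev/null              # now drain the pipe so the writer can finish
    }
    rm -rf "$dir"
}

writer_exits_before_read 1       # 1 KiB payload: prints "early"
writer_exits_before_read 1024    # 1 MiB, larger than default pipe buffers: prints "blocked"
```

With a payload larger than the buffer, the writer cannot exit until the reader has consumed all but the last buffer's worth, so how much can be left behind at exit is bounded by the pipe implementation.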