http://bugzilla.mindrot.org/show_bug.cgi?id=52

imaging at math.ualberta.ca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #801 is|0                           |1
           obsolete|                            |

------- Comment #18 from imaging at math.ualberta.ca  2006-02-22 16:24 -------
Created an attachment (id=1075)
 --> (http://bugzilla.mindrot.org/attachment.cgi?id=1075&action=view)
Up-to-date hang-on-exit patch

This is an up-to-date patch (based on Markus Friedl's suggestion) to fix the
notorious hang-on-exit bug (which happens only with the portable version of
OpenSSH). No data loss occurs with this patch: it does not break ssh or scp.

The latest version of this patch will continue to be posted at the URL below
(as it has been for many years now), until the openssh developers finally get
around to applying it to their sources:

http://www.math.ualberta.ca/imaging/snfs/openssh.html

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
http://bugzilla.mindrot.org/show_bug.cgi?id=52

djm at mindrot.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #1075|                            |ok-
               Flag|                            |

------- Comment #19 from djm at mindrot.org  2006-02-22 20:34 -------
(From update of attachment 1075)
We won't be applying this patch because it is wrong, and is something we tried
quite a few years ago.

It is essentially the same as the old, broken hang-on-exit fix that I
mistakenly added around 2.2.x, except that it papers over the obvious dataloss
race by only chopping off the read fd when it is a tty. It will still lose data
on ttys.

The following is from a patched sshd, demonstrating the dataloss. The root of
this is a race condition: the SIGCHLD can arrive *before* the read pipe is
fully empty.

> [djm at baragon djm]$ (set -e ; while [ 1 ] ; do
> ssh -qttp2222 linux-qemu "dd if=/dev/zero bs=256k count=1" |
> wc -c ; done)
> 262142
> 262093
> 262096
> 262125
> 262096
> 260056
> 262116
> 262124
> 212974
> ^C[djm at baragon djm]$

I recommend that packagers of OpenSSH do *not* apply this patch.
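The open-ended loop in Comment #19 can be bounded and made to count failures
instead of printing every result. The following is a minimal sketch and not
part of the original report: the function name count_mismatched_runs and its
argument order are hypothetical. It reruns a command a fixed number of times
and reports how often the byte count on stdout differs from the expected value.

```shell
# count_mismatched_runs EXPECTED RUNS CMD [ARGS...]   (hypothetical helper)
# Runs CMD RUNS times, counts the bytes it writes to stdout each time, and
# prints how many runs produced a byte count different from EXPECTED.
count_mismatched_runs() {
    expected=$1
    runs=$2
    shift 2
    mismatches=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # $(( ... )) strips the padding some wc implementations emit.
        got=$(( $("$@" | wc -c) ))
        if [ "$got" -ne "$expected" ]; then
            mismatches=$((mismatches + 1))
        fi
        i=$((i + 1))
    done
    echo "$mismatches"
}
```

Against the setup in the comment (host linux-qemu, port 2222), it could be
invoked as:

  count_mismatched_runs 262144 50 ssh -qttp2222 linux-qemu \
      "dd if=/dev/zero bs=256k count=1 2>/dev/null"

A result of 0 means no run lost or gained bytes; any other value is the number
of runs that did.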
http://bugzilla.mindrot.org/show_bug.cgi?id=52

------- Comment #20 from imaging at math.ualberta.ca  2006-02-23 08:44 -------
> It is essentially the same as the old, broken hang-on-exit fix that I
> mistakenly added around ~2.2.x except that it papers over the obvious dataloss
> race by only chopping off the read fd when it is a tty. It will still lose data
> on ttys.

That is in fact the correct behaviour. Please check out how rsh and the
commercial ssh work: if the user types "exit" in a shell, all further I/O,
both reads and writes, are ignored. The rationale for this long-standing
convention is: if you want more I/O, don't type "exit"!

One sensible use of this feature is to start up a program in the background
(which tees its output both to a log file and stdout) and then exit the shell
after viewing the initial output. But there are many other uses as well.
The current behaviour of openssh is annoying and unacceptable as a
drop-in replacement for rsh.

> (set -e ; while [ 1 ] ; do
> ssh -qttp2222 linux-qemu "dd if=/dev/zero bs=256k count=1" |
> wc -c ; done)

Because you tried to be clever and pretend to be an interactive tty
session, even though you are not. You want to say this instead:

  (set -e ; while [ 1 ] ; do
   ssh -qp2222 linux-qemu "dd if=/dev/zero bs=256k count=1" |
   wc -c ; done)

There is no data loss with this command (or even if a single -t is given,
because ssh sensibly overrides this).

So the question is, why do you need to pretend to require a tty? If there
really is some sensible reason for forcing a tty allocation in this
context, then just pass a flag from ssh to sshd to indicate that you have
done so. It's a trivial modification, but I don't yet see the need.

> I recommend that packegers of OpenSSH do *not* apply this patch.

I highly recommend it; it's about time that this silly bug be fixed once
and for all. Let's move forward now...

Incidentally, to avoid confusion, please disregard the claim in Comment #16
that an earlier version of my hang-on-exit patch simply added an exit delay.
The exit delay business was a separate option that was added at the request of
another user. It has nothing at all to do with the hang-on-exit bug. Since no
one else seems to have been interested in that feature, I stopped maintaining
that exit delay patch some time ago (although it would be easy for someone to
port my old openssh-sleep.patch to the current version).
http://bugzilla.mindrot.org/show_bug.cgi?id=52

------- Comment #21 from djm at mindrot.org  2006-02-23 12:12 -------
(In reply to comment #20)
> That is in fact is the correct behaviour. Please check out how rsh and the
> commercial ssh work: if the user types "exit" in a shell, all further I/O,
> both reads and writes, are ignored. The rationale for this long-standing
> convention is: if you want more I/O, don't type "exit"!

The example test in Comment #19 *doesn't* do any I/O after "dd" exits, but
there is still data in the pipe/socket buffer that gets lost by your patch. The
same applies to any interactive program that is producing data around the time
it exits.

> > (set -e ; while [ 1 ] ; do
> > ssh -qttp2222 linux-qemu "dd if=/dev/zero bs=256k count=1" |
> > wc -c ; done)
>
> Because you tried to be clever and pretend to be an interactive tty
> session, even though you are not.

You are completely missing the point. This is a simple test to demonstrate that
your patch loses data on ttys.

A real-world manifestation of this problem with your patch could be an
interactive ncurses app's endwin() cleanups being truncated, resulting in a
corrupted terminal state.

> I highly recommend it; it's about time that this silly bug be fixed once
> and for all. Let's move forward now...

I'd love to see this bug fixed, but your patch introduces new problems. Just
because you haven't noticed them, or don't consider them relevant to you,
doesn't mean that they don't exist.
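The point that data can still be sitting in a buffer after the writing process
has exited can be seen with an ordinary local pipe. This is only an
illustrative sketch: the function name late_reader is made up, and no ssh, sshd
or pty is involved. The writer exits immediately after writing, the reader
starts reading a second later, and the data still arrives because the kernel
holds it in the pipe buffer.

```shell
# late_reader TEXT   (hypothetical helper)
# The left-hand subshell writes TEXT and exits at once. The right-hand side
# waits one second before reading, so the writer is normally gone by the time
# cat runs; the text is delivered anyway from the kernel's pipe buffer.
late_reader() {
    ( printf '%s\n' "$1" ) | ( sleep 1; cat )
}

late_reader "final screen cleanup"
```

A reader that stopped as soon as it learned the writer had exited, rather than
reading until end-of-file, would drop that text. That is the shape of the race
described in Comment #19.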
http://bugzilla.mindrot.org/show_bug.cgi?id=52

------- Comment #22 from sam_bravard at yahoo.com  2006-03-11 14:11 -------
Letting this bug languish for years without a solution isn't helping us (the
users). We care about real-world use, and in real-world use a hanging process
that I've asked to exit is an administration burden. Auto-deploy/maintenance
scripts can't be run automatically without writing some sort of shootdown
watchdog, etc.

This patch solves a real need. If you don't like the way this patch works,
then by all means create an alternate solution, but let's get the problem fixed
already. If you must, create an option to make this behavior selectable, or
better yet an environment variable. Something that I can set and just leave on
all the time.

It's not one author vs. openssh; there are easily thousands of ssh users that
want this fixed (hell, I've just talked with 8 of them and everyone is shocked
to hear this is even a debate). We'll accept losing the last few bytes if we
can guarantee a disconnect.

Please help us.

On behalf of the lowly users,

Sam

(In reply to comment #21)
> (In reply to comment #20)
> > That is in fact is the correct behaviour. Please check out how rsh and the
> > commercial ssh work: if the user types "exit" in a shell, all further I/O,
> > both reads and writes, are ignored. The rationale for this long-standing
> > convention is: if you want more I/O, don't type "exit"!
>
> The example test in Comment #19 *doesn't* do any IO after "dd" exits, but there
> is still data in the pipe/socket buffer that gets lost by your patch. The same
> applies to any interactive program that is producing data around the time it
> exits.
>
> > > (set -e ; while [ 1 ] ; do
> > > ssh -qttp2222 linux-qemu "dd if=/dev/zero bs=256k count=1" |
> > > wc -c ; done)
> >
> > Because you tried to be clever and pretend to be an interactive tty
> > session, even though you are not.
>
> You are completely missing the point. This is a simple test to demonstrate that
> your patch loses data on ttys.
>
> A real-world manifestation of this problem with your patch could be an
> interactive ncurses app's endwin() cleanups being truncated, resulting in a
> corrupted terminal state.
>
> > I highly recommend it; it's about time that this silly bug be fixed once
> > and for all. Let's move forward now...
>
> I'd love to see this bug fixed, but your patch introduces new problems. Just
> because you haven't noticed them, or don't consider them relevant to you,
> doesn't mean that they don't exist.
>
http://bugzilla.mindrot.org/show_bug.cgi?id=52

tsi at ualberta.ca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #667 is|0                           |1
           obsolete|                            |

------- Comment #23 from tsi at ualberta.ca  2006-03-13 14:39 -------
Created an attachment (id=1098)
 --> (http://bugzilla.mindrot.org/attachment.cgi?id=1098&action=view)
Updated patch against 4.3p2

Well, one effective way for a user to become "lowly" is to believe one is in
the position to dictate. Often, in such cases, there's money involved.

Attached is a rework of my prior fix, updated for 4.3p2 and fixing a few bugs
I've run into since then, including the race condition mentioned earlier. This
works with all protocol versions and there is no data loss.

This version of the patch turns out to be quite small, but you might wish to
recode its overloading of detach_close.

The intent here is to continue reading from the tty, after the child shell has
terminated, until there is an error. On OpenBSD (and I suspect other CSRG-based
variants), that error will be EIO, whereas on Linux, IRIX, SunOS, etc., it will
be EAGAIN. That's because SysV variants do not "close" the tty when its
controlling process exits. The reason the hang doesn't occur on OpenBSD is that
the tty's change of status is reported through select(2) there, whereas it is
not on SysV variants.

Under compat20, this only affects the channel associated with the tty. Thus,
if other channels are open (forwarding, etc.), the connection with the client
will remain open.

Damien, that `dd` command should redirect its stderr to /dev/null so you will
always get 262144. A nitpick to be sure, but your posting was just the thing I
needed to duplicate the problem reliably enough to debug it. Thanks.
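The stderr nitpick can be checked locally without ssh. When a pty is allocated
(as with "ssh -tt"), the remote command's stdout and stderr share that one pty,
so dd's transfer statistics end up in the same stream that wc counts. The
sketch below imitates that merge with "2>&1" on a local pipe; the variable
names clean and merged are illustrative only, and the exact size of the merged
count depends on the dd implementation.

```shell
# stdout only: dd's transfer statistics go to stderr and are discarded here.
clean=$(( $(dd if=/dev/zero bs=256k count=1 2>/dev/null | wc -c) ))

# stderr folded into stdout, imitating both streams sharing a single pty.
merged=$(( $(dd if=/dev/zero bs=256k count=1 2>&1 | wc -c) ))

echo "stdout only:     $clean bytes"
echo "stdout + stderr: $merged bytes"
```

Only the first form gives a count that can be compared against exactly 262144,
which is why the remote dd in the test loop should be run with its stderr
redirected to /dev/null.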