Nicolas Williams
2002-Aug-08  04:27 UTC
The complete answer (was Re: so-called-hang-on-exit)
Ok, so I think I have a complete explanation for the difference between the *BSD behaviour and the Linux/Solaris behaviour. Well, almost complete :)

Pull out your trusty copies of "The Design and Implementation of the 4.4BSD Operating System" as well as "Unix Internals: The New Frontiers". Specifically, pages 111-112 and 344 of the former and page 108 of the latter.

It comes down to this:

 - The 4.4BSD tty and pty drivers send SIGHUP followed by SIGCONT (for stopped processes) to all orphaned process groups with a given tty/pty association when the session leader exits (TDI44BSDOS states that POSIX and 4.4BSD do this)

   - and any open file descriptors referring to the tty/pty in any processes that choose to continue running are revoked.

 - Whereas SVR4 doesn't do any of this and relies on the session leader to do its part and HUP/CONT its process groups. This part is not too clear because Uresh Vahalia mentions this very much in passing on page 108 of "Unix Internals."

   It is unclear whether closing a pty master causes the pty slave driver to take any action, whether sending signals to any processes or revoking any open file descriptors, and whether that depends on the session leader being alive or dead at the time that the master is closed. This can be determined experimentally. Ideally, closing the pty master after the pty slave's session leader has exited will cause the pty slave driver to revoke open fildes referring to it *and* to send HUP/CONT to remaining process groups with that pty slave association.

   My tests indicate that closing the master pty, on Solaris, does not cause the slave pty's open fildes to be revoked, and no signal is sent, so orphaned background process groups continue to run and clutter up the process table *and*, apparently, they continue to consume a pty which cannot be reused until said processes exit or disassociate from their pty.

   What about Linux?
My own tests today indicate that the Korn Shell, on Solaris, is smart enough to send HUP/CONT to all of its process groups before exiting, whereas the Solaris C-Shell is NOT. This points to a bug, or, rather, the *lack of a feature* in the Solaris C-Shell. I don't care about the C-Shell, so I won't file a bug report / RFE with Sun. If you care about the C-Shell then you should, and check SunSolve first, as a suitable patch may already exist for all I know.

I don't know what the exact Linux behaviour is, but I rather suspect that it follows the SVR4 approach. Comments elsewhere in this thread indicate that Bash 2.x is configurable with respect to its behaviour on exit, through the 'huponexit' option. I advise you all to read the Bash man page - search for 'huponexit'.

Which behaviour is best? To leave it to the shell to HUP/CONT its process groups before exiting? Or to leave it to the tty/pty driver to do the same?

Each has its drawbacks. For example, the former can lead to undesirable orphaned process groups cluttering the process table if a shell fails to implement the SVR4 strategy, whereas the latter makes it impossible to implement the Bash 'disown' built-in command feature. And making the driver responsible for sending the signals may imply heavier structures and/or synchronization in the kernel.

******************************

So, Markus, Ben, et al., I recommend that you close all bugs related to this hang-on-exit issue and document the problem as being buggy or insufficiently featured shells on Linux/Solaris. Including a patch to close the pty master when the session leader exits, on some platforms, is probably a good idea, but it's not absolutely necessary - what is absolutely necessary is that the shells on Linux/Solaris know to send HUP/CONT to their process groups before exiting.

Jani, Frank, et al., make sure that your shell is configured correctly and/or that you use a shell that correctly implements the SVR4 behaviour and/or that you get or write patches for any broken shells. Merely forcing the sshd to close the pty master might not be enough, but it would be good if you could strace/truss an entire session with /bin/csh as the session leader and a patched sshd that closes the pty master - I would like to know what happens to backgrounded process groups in such a case (see above).

Cheers,

Nico
--
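The shell-side strategy described in the message above (HUP/CONT every process group before exiting) can be sketched in a few lines. This is only an illustration under stated assumptions, not the code of ksh, csh, or Bash: `start_job` and `hangup_jobs` are hypothetical names, and each job is assumed to run in its own process group whose ID equals the PID of the job's first process.

```python
import os
import signal
import subprocess


def start_job(argv):
    """Start argv in its own process group, roughly as a job-control shell would.

    Hypothetical helper: os.setpgrp() runs in the child before exec, so the
    child's PID doubles as its process group ID.
    """
    return subprocess.Popen(argv, preexec_fn=os.setpgrp)


def hangup_jobs(jobs):
    """Send SIGHUP, then SIGCONT, to the process group of every live job."""
    for job in jobs:
        if job.poll() is not None:
            continue  # already exited and reaped, nothing to signal
        pgid = job.pid  # assumption: pgid == pid, see start_job()
        try:
            os.killpg(pgid, signal.SIGHUP)
            # SIGCONT wakes stopped members so the pending SIGHUP is delivered.
            os.killpg(pgid, signal.SIGCONT)
        except ProcessLookupError:
            pass  # the group vanished between poll() and killpg()
```

A shell following this approach would call something like `hangup_jobs()` on its job table just before exiting. A job that ignores or handles SIGHUP survives, and leaving a job out of the list has the same effect as Bash's 'disown'.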
On Thu, Aug 08, 2002 at 12:27:37AM -0400, Nicolas Williams wrote:
> - Whereas SVR4 doesn't do any of this and relies on the session leader
>   to do its part and HUP/CONT its process groups. This part is not too
>   clear because Uresh Vahalia mentions this very much in passing on
>   page 108 of "Unix Internals."
>
>   It is unclear whether closing a pty master causes the pty slave
>   driver to take any action, whether sending signals to any processes
>   or revoking any open file descriptors, when the session leader is
>   alive or dead at the time that the master is closed. This can be
>   determined experimentally. Ideally closing the pty master after the
>   pty slave's session leader has exited will cause the pty slave driver
>   to revoke open fildes referring to it *and* to send HUP/CONT to
>   remaining process groups with that pty slave association.
>
>   My tests indicate that closing the master pty, on Solaris, does not
>   cause the slave pty open fildes to be revoked and no signal is

But it must; otherwise a change in the pty's ownership would give the bg process group access that it shouldn't have, no?

>   sent, so orphaned background process groups continue to run and
>   clutter up the process table *and*, apparently they continue to
>   consume a pty which cannot be reused until said processes exit or
>   disassociate from their pty.
>
>   What about Linux?

Interesting. If you look at linux/drivers/char/tty_io.c:disassociate_ctty() you can see that SIGHUP and SIGCONT are sent. disassociate_ctty() is called from linux/kernel/exit.c if the exiting process is the session leader. Yet Linux has the problem.

> So, Markus, Ben, et al., I recommend that you close all bugs related
> to this hang-on-exit issue and document the problem as being buggy or
> insufficiently featured shells on Linux/Solaris. Including a patch to
> close the pty master when the session leader exits, on some platforms,
> is probably a good idea, but it's not absolutely necessary - what
> is absolutely necessary is that the shells on Linux/Solaris know to
> send HUP/CONT to their process groups before exiting.

Ack! I would say it is needed. There are *plenty* of other workarounds already implemented for weird behavior on one platform or another. It's too painful to have to set your shell vars correctly, etc. What if you use a shell that doesn't support that kind of thing? The patch does no harm. There's no reason not to include it.

> Jani, Frank, et al., make sure that your shell is configured correctly
> and/or that you use a shell that correctly implements the SVR4 behaviour

I've known about huponexit forever. (RedHat documents this as a workaround.) This is not an acceptable workaround for me. Now, this isn't really a problem for me because I have to maintain a local version of openssh anyway, but I always prefer to have minimal changes, and other folks want/need this also!

Thanks for investing time in this, Nico.

/fc
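For readers who have not seen the patch under discussion, the idea of closing the pty master once the session leader exits, instead of reading until EOF, might look roughly like the following. This is a minimal sketch and not OpenSSH code: `run_and_close_master` and `poll_interval` are hypothetical names, and the behaviour of any surviving background processes after the close is exactly the platform-dependent question raised in this thread.

```python
import os
import pty
import select


def run_and_close_master(argv, poll_interval=0.05):
    """Run argv on a fresh pty and stop once the session leader has exited.

    Hypothetical helper. Returns (exit_code, output_bytes).
    """
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: pty.fork() made us a session leader whose controlling
        # terminal is the pty slave.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    chunks = []
    status = None
    while status is None:
        readable, _, _ = select.select([master_fd], [], [], poll_interval)
        if readable:
            try:
                data = os.read(master_fd, 4096)
            except OSError:
                data = b""
            if data:
                chunks.append(data)
            else:
                # EOF or an error: nothing more will arrive, so just reap.
                _, status = os.waitpid(pid, 0)
                break
        finished, raw = os.waitpid(pid, os.WNOHANG)
        if finished == pid:
            status = raw

    # Drain output that is already buffered, but never wait for EOF:
    # a background process may hold the slave open indefinitely.
    for _ in range(16):
        if not select.select([master_fd], [], [], poll_interval)[0]:
            break
        try:
            data = os.read(master_fd, 4096)
        except OSError:
            break
        if not data:
            break
        chunks.append(data)

    os.close(master_fd)
    return os.waitstatus_to_exitcode(status), b"".join(chunks)
```

For example, `run_and_close_master(["sh", "-c", "echo hello; sleep 5 &"])` should return as soon as `sh` exits rather than waiting for the backgrounded `sleep`. The trade-off is that output written by background processes after the session leader has exited is discarded.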
On Wed, Aug 07, 2002 at 10:12:51PM -0700, Frank Cusack wrote:
> Interesting. If you look at linux/drivers/char/tty_io.c:disassociate_ctty()
> you can see that SIGHUP and SIGCONT are sent. disassociate_ctty() is called
> from linux/kernel/exit.c if the exiting process is the session leader.
> Yet Linux has the problem.

uhhh from the comment for disassociate_ctty():

 * (1) Sends a SIGHUP and SIGCONT to the foreground process group

which jibes with Goodheart & Cox (although they are not 100% clear on this).

Then again, from kernel/exit.c:

/*
 * Determine if a process group is "orphaned", according to the POSIX
 * definition in 2.2.2.52. Orphaned process groups are not to be affected
 * by terminal-generated stop signals. Newly orphaned process groups are
 * to receive a SIGHUP and a SIGCONT.

/fc
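The POSIX notion of an "orphaned" process group that the kernel comment refers to can be written as a small predicate: a group is orphaned when no member has a parent that is in a different process group but in the same session. The sketch below works on a made-up process table, with `Proc` and `is_orphaned_pgrp` as hypothetical names, and assumes a parent missing from the table (init, say) belongs to another session. It is not the kernel's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    """One row of a hypothetical process table."""
    pid: int
    ppid: int
    pgid: int
    sid: int


def is_orphaned_pgrp(pgid, procs):
    """Return True if process group pgid is orphaned in the POSIX sense."""
    by_pid = {p.pid: p for p in procs}
    for member in procs:
        if member.pgid != pgid:
            continue
        parent = by_pid.get(member.ppid)
        if parent is None:
            continue  # assumption: an unknown parent is outside the session
        if parent.pgid != pgid and parent.sid == member.sid:
            return False  # this parent can still do job control for the group
    return True
```

With a shell (pid 100, session leader) and a background job (pid 101, pgid 101) whose parent is the shell, group 101 is not orphaned. Once the shell exits and the job is reparented to pid 1, the same group becomes orphaned, which is the moment at which the comment above says stopped members are due a SIGHUP and a SIGCONT.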