The hang on exit has become quite an issue in my organization (Sun and HP hosts). I see this note in the changelog, which indicates that there will not be a fix for this problem:

20001129
 - (djm) Back out all the serverloop.c hacks. sshd will now hang again if there are background children with open fds.

Also, I am aware of the workaround as noted in the FAQ. However, this workaround is not ideal for sysadmins who regularly background jobs from the command line. It's impractical to ask them to redirect stdin, stdout, and stderr every time. In addition, we have many jobs failing because various stop/start scripts leave open fds. Identifying and modifying all such scripts is possible but not something I want to undertake.

Question: This was not an issue for us with ssh 1.2.x. Why is it an issue with Openssh? Will this ever be classified as a bug?

thx,
-das
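The behaviour described above can be reproduced locally without ssh. The following is a minimal sketch, assuming a POSIX `sh` and `sleep` are available; the helper name `run_and_time` is made up for this illustration, and a local pipe stands in for the sshd session. It shows that a reader waiting for end-of-file on a pipe does not return until every process holding the write end has closed it, and that redirecting the background job's stdio removes the wait.

```python
import subprocess
import time


def run_and_time(shell_command):
    """Run a command through sh, read its stdout to EOF, return (output, seconds)."""
    start = time.monotonic()
    result = subprocess.run(
        ["sh", "-c", shell_command],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        text=True,
    )
    return result.stdout, time.monotonic() - start


# The background sleep inherits the pipe, so the reader sees EOF only
# after sleep exits, even though the shell itself finished at once.
out_hang, t_hang = run_and_time("sleep 2 & echo started")

# Same job with stdin, stdout and stderr redirected: the pipe is closed
# as soon as the shell exits, so the reader returns immediately.
out_ok, t_ok = run_and_time("sleep 2 </dev/null >/dev/null 2>&1 & echo started")

print(f"inherited fds:  {t_hang:.1f}s")
print(f"redirected fds: {t_ok:.1f}s")
```

This is only an analogy for the session teardown, not a description of the sshd code path.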
On Wed, 3 Oct 2001, Schieber, Dustin wrote:

> Question: This was not an issue for us with ssh 1.2.x. Why is it an
> issue with Openssh? Will this ever be classified as a bug?

I haven't been able to find a sane way of avoiding the hang on exit on Linux or Solaris. OpenBSD doesn't seem to have the problem - there seem to be some differing fd semantics at work. Anyone who wants to try to fix the problem would be well advised to investigate these differing semantics.

By 'sane', I mean a solution that won't lose data even under pathological conditions. An avoidable hang on exit is an annoyance; data loss is unforgivable.

-d

--
| Damien Miller <djm at mindrot.org> \ ``E-mail attachments are the poor man's
| http://www.mindrot.org / distributed filesystem'' - Dan Geer
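The trade-off between hanging and losing data can be sketched with a local pipe. This is an illustration only, assuming a POSIX `sh`; the function names `read_until_child_exit` and `read_until_eof` are hypothetical and do not correspond to anything in serverloop.c. The first strategy stops reading as soon as the direct child exits, which returns quickly but drops output written later by a background job. The second reads to end-of-file, which keeps all output but waits for the background job.

```python
import os
import subprocess

# The shell prints "early" and exits; a background job prints "late" a second later.
CMD = "(sleep 1; echo late) & echo early"


def read_until_child_exit(cmd):
    """Stop once the direct child exits: fast, but late output is lost."""
    proc = subprocess.Popen(
        ["sh", "-c", cmd], stdin=subprocess.DEVNULL, stdout=subprocess.PIPE
    )
    proc.wait()
    # Take whatever is already buffered in the pipe, then close our end.
    data = os.read(proc.stdout.fileno(), 65536)
    proc.stdout.close()
    return data.decode()


def read_until_eof(cmd):
    """Read until every writer has closed the pipe: complete, but may wait."""
    proc = subprocess.Popen(
        ["sh", "-c", cmd], stdin=subprocess.DEVNULL, stdout=subprocess.PIPE
    )
    data = proc.stdout.read()
    proc.stdout.close()
    proc.wait()
    return data.decode()


print(repr(read_until_child_exit(CMD)))
print(repr(read_until_eof(CMD)))
```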
> I haven't been able to find a sane way of avoiding the hang on exit
> on Linux or Solaris. OpenBSD doesn't seem to have the problem - there

On Linux, a patch to fix this openssh bug, without data loss, has been available for some time now:

http://www.math.ualberta.ca/imaging/snfs/

-- John Bowman
University of Alberta
Yes, I think it would be nice to have this as an option. I'm not that concerned about the inconvenience of a hang in an interactive session. I'm very concerned about batch jobs that hang.

I agree with you that it's best to just resolve the problem with the open fds on the server side. But again, this is a less than ideal solution from a sysadmin's perspective. Several of the issues I've heard of involve scripts included with the OS. For example, I've verified the hang occurs when simply restarting syslog or the snmp agents on a Sun box (/etc/init.d/syslog stop|start, /etc/rc3.d/S76snmpdx stop|start). Of course I can modify these scripts to utilize the workaround. But it would be nice to have another option...

I've also seen this problem with various 3rd party software packages. Apparently this is a widespread problem with the open fds.

Thanks!
-das

> -----Original Message-----
> From: Nicolas Williams [mailto:Nicolas.Williams at ubsw.com]
> Sent: Thursday, October 04, 2001 10:37 AM
> To: Markus Friedl
> Cc: Phil Howard; openssh-unix-dev at mindrot.org
> Subject: Re: hang on exit - bug or no bug?
>
> I think the requestor here wants this behaviour and wants it to be
> optional. This seems acceptable to me (not an OpenSSH maintainer), but:
>
> I also think it's silly -- if you know you'll have children on the
> server side hanging on to the fildes and you don't want them to, then
> re-work what you're doing on the server side so that this hack is not
> necessary.
>
> Nico
>
> On Thu, Oct 04, 2001 at 04:27:02PM +0200, Markus Friedl wrote:
> > On Thu, Oct 04, 2001 at 09:01:32AM -0500, Phil Howard wrote:
> > > So is this because the patch itself is undesired (flawed, incomplete,
> > > not the right solution, or not a problem to be solved), or is it
> > > because OS-specific conditional compilation is undesired?
> >
> > it's because calling shutdown() before reading all remaining data from
> > the pipe is wrong.
> >
> > it will cause data loss like the older so called bug fixes did.
> >
> > we will include bug fixes, but only if they don't break ssh.
> --
>
> Visit our website at http://www.ubswarburg.com
>
> This message contains confidential information and is intended only
> for the individual named. If you are not the named addressee you
> should not disseminate, distribute or copy this e-mail. Please
> notify the sender immediately by e-mail if you have received this
> e-mail by mistake and delete this e-mail from your system.
>
> E-mail transmission cannot be guaranteed to be secure or error-free
> as information could be intercepted, corrupted, lost, destroyed,
> arrive late or incomplete, or contain viruses. The sender therefore
> does not accept liability for any errors or omissions in the contents
> of this message which arise as a result of e-mail transmission. If
> verification is required please request a hard-copy version. This
> message is provided for informational purposes and should not be
> construed as a solicitation or offer to buy or sell any securities or
> related financial instruments.
I definitely don't have the same problem with rsh, and never had it with ssh 1.2.31. It was only after upgrading to Openssh that we experienced this problem. I can't dispute what has been said about the code in 1.2.31. I can only state what my experience has been... we did not have this hanging problem with 1.2.31.

Your suggested workarounds have been discussed and in some cases already implemented on our hosts. But the perception in my organization has been that Openssh is "broken" because this was not a problem before. The FAQ provides very little documentation on this problem; it's unclear if it's considered a bug or not.

This thread has been very helpful in understanding the issue, thanks for all the information. And thanks to all of the developers who have worked on this software. It is working well for us and we are generally very happy with it.

thx,
-das

> -----Original Message-----
> From: Nicolas Williams [mailto:Nicolas.Williams at ubsw.com]
> Sent: Thursday, October 04, 2001 12:21 PM
> To: Schieber, Dustin; Markus Friedl
> Cc: Phil Howard; openssh-unix-dev at mindrot.org
> Subject: Re: hang on exit - bug or no bug?
>
> No, don't modify your OS' startup scripts, just run them differently.
>
> Look, the fds are left open, so you potentially lose data, but you don't
> care, right? So why not run those scripts like so:
>
> ssh -n root@somehost /etc/init.d/syslog stop
> ssh -n root@somehost /etc/init.d/syslog start \>/dev/null 2\>\&1
>
> ?
>
> You have this issue with RSH too you know. I long ago learned to deal
> with this by manually redirecting stdio to /dev/null. And yes, sometimes
> you care about some bit of command's output, because you may want to see
> error msgs from /etc/init.d/syslog, say. There's a few solutions:
>
>  - write a separate script for checking that the batch job is doing ok
>  - use intr (yes, Solaris doesn't have one -- write one yourself) on the
>    client side to set a timeout after which to kill ssh
>  - write an intr-like command for the remote side that closes the fds
>    after a timeout and which might be used like so:
>
>    ssh -n somehost somecommand 2\>\&1 \| iointr -t 30
>
>  - fix the scripts
>
> Nico
>
> --
> -DISCLAIMER: an automatically appended disclaimer may follow. By posting-
> -to a public e-mail mailing list I hereby grant permission to distribute-
> -and copy this message.-
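The client-side timeout suggested in the quoted message (an `intr`-style wrapper that kills ssh after a fixed time) could look roughly like the sketch below. It is not the `intr` utility itself; the function name `run_with_timeout` is hypothetical, and the 30-second value and host name in the usage comment are taken from the example in the thread as placeholders.

```python
import subprocess


def run_with_timeout(argv, seconds):
    """Run argv and kill it if it is still running after `seconds`.

    Returns the exit status, or None if the command had to be killed.
    """
    proc = subprocess.Popen(argv, stdin=subprocess.DEVNULL)
    try:
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the killed process so it does not linger as a zombie
        return None


# Possible use in a batch job (placeholder host and command):
#   status = run_with_timeout(["ssh", "-n", "somehost", "somecommand"], 30)
#   if status is None:
#       ...  # ssh was still hanging after 30 seconds and was killed
```

Killing the client this way trades the hang for possible loss of any output still in flight, which is the same trade-off discussed earlier in the thread.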
Yes, you will have this problem with rsh, but *not* rlogin. This is a key difference. So you have the same problem in the case of batch processing (where rsh is invoked). In this case, the hang can be more than just annoying; it can break your code. In the case of the interactive login (where the hang is merely annoying), the behavior differs.

Yes, the applications are broken if they fork and continue without closing out all file descriptors. However, unfortunately a *lot* of applications are written that way. 90% of the time when I leave a long ssh session open, and do a bunch of random things, I cannot log out without ssh hanging.

-rchit
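For the application side of the problem, the sketch below shows what "closing out all file descriptors" amounts to for a program that forks a long-lived worker. It is an assumption-laden illustration: the embedded `WORKER` program and the `timed_run` helper are invented for this example, and a local pipe again stands in for the ssh session. In "detach" mode the worker points fds 0, 1 and 2 at /dev/null before continuing; in "keep" mode it holds on to the inherited descriptors.

```python
import subprocess
import sys
import time

# Small program that forks a worker and lets the parent exit at once.
WORKER = r"""
import os, sys, time
if os.fork() == 0:
    if sys.argv[1] == "detach":
        os.setsid()                      # leave the caller's session
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)         # drop the inherited stdio
        os.close(devnull)
    time.sleep(2)                        # stand-in for the daemon's real work
    os._exit(0)
print("parent done")
"""


def timed_run(mode):
    """Run WORKER in the given mode, read stdout to EOF, return (output, seconds)."""
    start = time.monotonic()
    result = subprocess.run(
        [sys.executable, "-c", WORKER, mode],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        text=True,
    )
    return result.stdout, time.monotonic() - start


out_keep, t_keep = timed_run("keep")
out_detach, t_detach = timed_run("detach")
print(f"worker keeps fds:   {t_keep:.1f}s")
print(f"worker detaches:    {t_detach:.1f}s")
```

A worker that detaches this way no longer keeps the caller waiting, at the cost of having nowhere to report errors unless it logs elsewhere.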