From what I understand, the problem is due to people's disagreement about
what the "correct" behavior should be. I'm pretty sure that the following
is the correct behavior, from running rsh and ssh often (both F-Secure
and OpenSSH).

Let's say you have a stupid script, called foreverSleep, on your remote
host, that does:

    while 1 do sleep 1 done

Then:

    rsh remotehost "foreverSleep &"

should and does hang (on Linux and Solaris at least). HOWEVER,

    rsh remotehost
    # foreverSleep &
    # exit

does NOT hang.

---

If you run OpenSSH, like the following:

    ssh remotehost "foreverSleep &"

it should and does hang (F-Secure hangs as well). HOWEVER,

    ssh remotehost
    # foreverSleep &
    # exit

DOES hang. (F-Secure does not hang.) This is where the bug is. If you run
ssh with a tty and in interactive mode, and the client decides to
disconnect, it should disconnect cleanly. (I'm not sure about what happens
to the remaining processes; you will have to look at the rsh code for
that -- it may be SIGHUP or something, I don't know -- other posts may be
clearer on this.)

I hope I'm not just stating the obvious, and hope this clears things up.
If I'm wrong about the behaviours, let me know. I really think we should
figure out what the correct behaviour should be before trying to come up
with a fix.

-rchit

-----Original Message-----
From: Michael [mailto:michael at bizsystems.com]
Sent: Monday, December 10, 2001 1:23 PM
To: openssh-unix-dev at mindrot.org
Subject: Re: hang on exit bug under Linux

> On Mon, Dec 10, 2001 at 10:50:06AM -0800, Dan Kaminsky wrote:
> > Look: ssh user at host "command" needs to never, ever hang.
>
> wrong.
>
> it needs to hang.
>
> it needs to hang until it can be sure that 'command' does not need
> any input.
>
> it needs to hang until it can be sure that 'command' does not
> produce any output.
>
> it needs to hang until 'command' exits because sshd needs to tell
> the exit status from 'command' to ssh.
So from a sysadmin's viewpoint: some fool writes a piece of buggy software
which hundreds of shell users decide to use, and they then proceed to
connect to the host via ssh and leave hundreds of "hung" sshd's in the
process table -- or even just one user with a cron job doing a repeated
action. That sounds just great. Why on earth should anyone use OpenSSH if
they can expect it to mess up the operation of an entire system because it
is BROKEN?

This is a problem that will not go away. You can assert that script
writers should do a better job, but they won't, and that is why they write
scripts.

Your response requesting me to write the code is something I can't do. I
only have access to Linux boxes and have no clue (and would not presume to
know) what the implications are for Sun, AIX, HP, BSD, etc... Closing off
discussion on the issue won't fix it either.

I don't mean to be a pest, but I consider OpenSSH to be an excellent tool
that does a lot to promote security in general, and security at our site
in particular. I'd like to see it work well. It seems to have this one
glaring flaw that needs to be fixed to make it generally acceptable as a
replacement for virtually all other remote shell access programs. Saying
that rsh is broken too simply doesn't justify why a program under active
development by a very bright group of people has to be broken as well.

Michael
Michael at Insulin-Pumpers.org
Hi,

On Mon, Dec 10, 2001 at 02:09:20PM -0800, Rachit Siamwalla wrote:
> Called foreverSleep on your remote host:
>
> rsh remotehost "foreverSleep &"
>
> Should and does hang (on Linux and Solaris at least).
>
> HOWEVER,
>
> rsh remotehost
> # foreverSleep &
> # exit
>
> does NOT hang.

This is what I have already suggested:

- if we have a pty, and the direct child goes away, "just close the
  session", and accept data loss. Data loss can only come from background
  processes, which are *background* processes and shouldn't send stuff
  anyway -- if they do, they deserve a SIGPIPE or worse.

- if we have no pty, do what we do now, and block if needed.

gert

--
USENET is *not* the non-clickable part of WWW!
                                         //www.muc.de/~gert/
Gert Doering - Munich, Germany             gert at greenie.muc.de
fax: +49-89-35655025          gert.doering at physik.tu-muenchen.de
On Mon, Dec 10, 2001 at 02:09:20PM -0800, Rachit Siamwalla wrote:
> From what I understand, the problem is due to people's disagreement
> about what the "correct" behavior should be. I'm pretty sure that the
> following is the correct behavior from running rsh and ssh often (both
> fsecure and openssh).

Some people don't get stdio. Hey.

> Called foreverSleep on your remote host:
>
> rsh remotehost "foreverSleep &"
>
> Should and does hang (on Linux and Solaris at least).
>
> HOWEVER,
>
> rsh remotehost
> # foreverSleep &
> # exit
>
> does NOT hang.

It should do a killpg() to send SIGHUP to the relevant processes. What if
they don't want to die? Huh?

> ---
>
> If you run openssh, like the following:
>
> ssh remotehost "foreverSleep &"
>
> Should and does hang (fsecure hangs as well).
>
> HOWEVER,
>
> ssh remotehost
> # foreverSleep &
> # exit
>
> DOES hang. (fsecure does not hang) This is where the bug is. If you run
> ssh with a tty and in interactive mode, if the client decides to
> disconnect, it disconnects cleanly (I'm not sure about what happens to
> the remaining processes, you will have to look at rsh code for that --
> it may be SIGHUP or something, i dunno -- other posts may be clearer on
> this).

What if "foreverSleep" needs a forwarded agent/port/X11? Huh? What if it
doesn't exit when sshd sends it SIGHUP as you exit?

I say: with ptys, send SIGHUP when the main process exits and/or when the
client closes the session.

Perhaps there should be an option like -n for the client, but which
applies to stdout and stderr, for the faint of heart who refuse to
understand '>' and '2>' and '2>&1' and so on.

> I hope I'm not just stating the obvious, and hope this clears things up.
> If I'm wrong about the behaviours, let me know. I really think we should
> figure out what the correct behaviour should be before trying to come up
> with a fix.
>
> -rchit

Cheers,
Nico
--
-DISCLAIMER: an automatically appended disclaimer may follow.
By posting to a public e-mail mailing list I hereby grant permission to
distribute and copy this message.

Visit our website at http://www.ubswarburg.com

This message contains confidential information and is intended only for
the individual named. If you are not the named addressee you should not
disseminate, distribute or copy this e-mail. Please notify the sender
immediately by e-mail if you have received this e-mail by mistake and
delete this e-mail from your system. E-mail transmission cannot be
guaranteed to be secure or error-free, as information could be
intercepted, corrupted, lost, destroyed, arrive late or incomplete, or
contain viruses. The sender therefore does not accept liability for any
errors or omissions in the contents of this message which arise as a
result of e-mail transmission. If verification is required please request
a hard-copy version. This message is provided for informational purposes
and should not be construed as a solicitation or offer to buy or sell any
securities or related financial instruments.
On Wed, Dec 12, 2001 at 09:18:05AM +1100, Damien Miller wrote:
> On Tue, 11 Dec 2001, Dan Astoorian wrote:
> > On Mon, 10 Dec 2001 23:20:14 EST, Nicolas Williams writes:
> > >
> > > I say: with ptys, send SIGHUP when the main process exits and/or
> > > when the client closes the session.
> >
> > Would setting the HUPCL termios cflag for the pty a) work, b) be
> > portable, and c) be more appropriate than killpg()?
>
> What about sessions without a pty?

They should always "hang", or, rather, "hang around."

In any case, I no longer think that sshd should do killpg(HUP) when the
session leader exits, nor, for that matter, should it set the HUPCL
termios cflag for the pty.

Instead I think the client should have an option to, when the sshd tells
it the session exited, close the related channels and/or pass a SIGHUP to
the session, which the sshd would then send to the process group of the
session leader.

> -d

Cheers,
Nico
--
> > If you run openssh, like the following:
> >
> > ssh remotehost "foreverSleep &"
> >
> > Should and does hang (fsecure hangs as well).
> >
> > HOWEVER,
> >
> > ssh remotehost
> > # foreverSleep &
> > # exit
> >
> > DOES hang. (fsecure does not hang) This is where the bug is. If you
> > run ssh with a tty and in interactive mode, if the client decides to
> > disconnect, it disconnects cleanly (I'm not sure about what happens to
> > the remaining processes, you will have to look at rsh code for that --
> > it may be SIGHUP or something, i dunno -- other posts may be clearer
> > on this).

A real example would be a perl program that runs as a daemon:

    #!/usr/bin/perl

    unless ($pid = fork) {
        unless (fork) {
            open(STDOUT, '>/dev/null');
            open(STDERR, '>/dev/null');
            open(X, 'some_process 2>&1 |');  # that generates stdio to X
            while (<X>) {    # real program uses select
                # do something
            }
            # dies
            exit 0;
        }
        waitpid($pid, 0);
        exit 0;
    }

This process will hang ssh; it should not. ...or please tell me why it
should.

Michael
Michael at Insulin-Pumpers.org
If you'd like a C version, you might be inspired by my "daemon.c"; I use
it to run shell scripts as "clean" daemons:

/* daemon.c
 * $Id: daemon.c,v 1.4 2001/10/10 07:14:59 jp Exp $
 * $Source: /home/u/jp/RCS/daemon.c,v $
 *
 * 1.1 19.09.2001 jp - first version checked into RCS
 * 1.2 19.09.2001 jp - added options -l logfile, -c, -q
 * 1.3 19.09.2001 jp - ID as extern string
 * 1.4 10.10.2001 jp - usage also on stderr
 */
/* cc daemon.c -o daemon; strip daemon */

#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <time.h>
#include <sys/syslog.h>
#include <sys/types.h>

extern char version[] = "$Id: daemon.c,v 1.4 2001/10/10 07:14:59 jp Exp $";

char *progname;
int chdirRoot = 0;
int quiet = 0;

void usage(char *txt)
{
    fprintf(stderr, "%s\n", txt);
    fprintf(stderr, "Usage: %s [-c] [-l /log/file] /path/to/exe arg1 arg2\n",
            progname);
    fprintf(stderr, " -l /log/file does '>/log/file 2>&1'\n");
    fprintf(stderr, " -c does cd / (use whenever possible!)\n");
    syslog(LOG_INFO | LOG_DAEMON, "%s\n", txt);
    syslog(LOG_INFO | LOG_DAEMON,
           "Usage: %s [-c] [-l /log/file] /path/to/exe arg1 arg2\n", progname);
    syslog(LOG_INFO | LOG_DAEMON, " -l /log/file does '>/log/file 2>&1'\n");
    syslog(LOG_INFO | LOG_DAEMON, " -c does cd / (use whenever possible!)\n");
    exit(-1);
}

char *timetext(void)
{
    time_t current;
    struct tm *local;
    static char str[22];

    time(&current);                     /* current time */
    local = localtime(&current);
    sprintf(str, "%02i.%02i.%04i,%02i:%02i:%02i",
            local->tm_mday, local->tm_mon + 1, local->tm_year + 1900,
            local->tm_hour, local->tm_min, local->tm_sec);
    return str;
}

void MSHdaemon(char *logfile, char *infotxt)
{
    int rc;

    rc = fork();
    if (-1 == rc) {
        syslog(LOG_INFO | LOG_DAEMON, "daemon - Unable to fork()\n");
        exit(-1);
    }
    if (rc > 0)
        exit(0);    /* parent should exit and return control, it's OK. */

    rc = setsid();
    if (-1 == rc) {
        syslog(LOG_INFO | LOG_DAEMON, "daemon - Unable to setsid()\n");
        exit(-1);
    }

    rc = fork();
    if (-1 == rc) {
        syslog(LOG_INFO | LOG_DAEMON, "daemon - Unable to fork()\n");
        exit(-1);
    }
    if (rc > 0)
        exit(0);    /* parent should exit and return control, it's OK. */

    if (chdirRoot == 1) {
        rc = chdir("/");
        if (-1 == rc) {
            syslog(LOG_INFO | LOG_DAEMON, "daemon - Unable to chdir()\n");
            exit(-1);
        }
        /* umask(0); */
    }

    if (!freopen("/dev/null", "r", stdin)) {
        syslog(LOG_INFO | LOG_DAEMON,
               "daemon - Unable to freopen(%s,...,%s): %s\n",
               "/dev/null", "stdin", strerror(errno));
        fflush(stderr);
        exit(-1);
    }
    if (!freopen(logfile, "a", stdout)) {
        syslog(LOG_INFO | LOG_DAEMON,
               "daemon - Unable to freopen(%s,...,%s): %s\n",
               logfile, "stdout", strerror(errno));
        fflush(stderr);
        exit(-1);
    }
    if (!quiet)
        fprintf(stdout, "Test stdout\n");
    fflush(stdout);
    if (!freopen(logfile, "a", stderr)) {
        syslog(LOG_INFO | LOG_DAEMON,
               "daemon - Unable to freopen(%s,...,%s): %s\n",
               logfile, "stderr", strerror(errno));
        fflush(stderr);
        exit(-1);
    }
    if (!quiet)
        fprintf(stderr, "Test stderr\n");
    if (!quiet)
        fprintf(stderr, "%s\n", infotxt);
    fprintf(stderr, "%s\n__O_u_t_p_u_t_:____\n", timetext());
    fflush(stderr);
    if (!quiet)
        syslog(LOG_INFO | LOG_DAEMON, infotxt);
    return;    /* a grandchild returns... free as a bird.. */
}

int main(int argc, char *const *argv)
{
    char prog[1024];
    char infotxt[1024];
    char errmsg[1024];
    FILE *ftest;
    extern char *optarg;
    extern int optind;
    int ch;
    /* global int chdirRoot: cd / ? */
    char logfile[256] = "/dev/null"; /* where stdout & stderr are sent */

    progname = argv[0];
    while ((ch = getopt(argc, argv, "qcl:")) != -1)
        switch (ch) {
        case 'q':
            quiet = 1;
            break;
        case 'c':
            chdirRoot = 1;
            break;
        case 'l':
            if (strlen(optarg) > 255) {
                syslog(LOG_INFO | LOG_DAEMON,
                       "Logfilename too long! (max 255 char.!)\n");
                exit(-1);
            }
            strcpy(logfile, optarg);
            break;
        case '?':
        default:
            usage("Unknown argument");
        }
    argc -= optind;
    argv += optind;

    if (argc < 1) {
        usage("Too few arguments!");
    }
    if (strlen(progname) + strlen(argv[0]) > 750) {
        syslog(LOG_INFO | LOG_DAEMON, "ArgV too long.... bye\n");
        exit(-1);
    }

    /* test whether the logfile can be written */
    ftest = fopen(logfile, "w");
    if (!ftest) {
        syslog(LOG_INFO | LOG_DAEMON, "daemon - Unable to fopen(%s): %s\n",
               logfile, strerror(errno));
        fflush(stderr);
        exit(-1);
    }
    fclose(ftest);

    sprintf(prog, "uid=%i prog='%s'", (int)geteuid(), argv[0]);
    sprintf(infotxt, "run as daemon: %s log='%s'", prog, logfile);
    if (!quiet)
        fprintf(stdout, "%s\n", infotxt);

    MSHdaemon(logfile, infotxt);
    execvp(argv[0], argv);

    /* if we get here, exec did not work... */
    sprintf(errmsg, "%s ERROR: %s", prog, strerror(errno));
    syslog(LOG_ERR | LOG_DAEMON, errmsg);    /* write to log too ... */
    syslog(LOG_INFO | LOG_DAEMON, "\nERROR\n%s\n", errmsg);
    fflush(stderr);
    fclose(stderr);
    exit(0);
    return 0;
}
> From: Peter Stuge <stuge at cdy.org>
>
> The true solution is considered to be one of two things:
>
> 1. All daemons shall behave. (ie. close std*)

Ideally, but not likely :(

> 2. The user knows what he/she wants. (ie. to exit, losing data)
>
> I actually want both. I want to be able to tell sloppy daemon
> programmers that they should clean up their code. But I also want my
> users to not have to deal with sloppy daemon programmers, unless they
> choose to do so. This is tough.

Could it be done at the command line? ssh -bad-daemon foohost ?

We have to restart Firewall-1 and IDS probes with closed source all the
time using ssh. Having the ability to do so without totally hosing things
is a big plus. It's not something I can script around either.

> A thought that occurred to me tonight while thinking about this is that
> the ssh client could background itself and tell the user about it,
> instead of closing down and causing possible data loss. And it needs to
> leave std* open, ie. not daemonize, but background. This would propagate
> over multiple connections all the way back to my actual terminal. And if
> I choose to close my terminal (xterm, console, whatever) the process
> that I started at remotehost will be sent SIGHUP. In the perl case it
> would kill it; a C program that catches the signal doesn't care and
> keeps running. Data will be lost but that is out of SSH's scope.
>
> Comments on this, anyone?

Is it too hard to have a command line switch (or config option) to say
"lossy/not lossy"? It's because of this problem that I'm still stuck with
a lot of firewalls still running ssh v1 :(

Carl
> From: Peter Stuge <stuge at cdy.org>
>
> On Thu, Dec 13, 2001 at 01:34:32PM +1100, carl at bl.echidna.id.au wrote:
> >
> > Is it too hard to have a command line switch (or config option) to say
> > "lossy/not lossy"? It's because of this problem that I'm still stuck
> > with a lot of firewalls still running ssh v1 :(
>
> You wouldn't need the option if the ssh client put itself in the
> background.

So if I did

    ssh firewall
    firewall$ fwstop; fwstart
    exit

it wouldn't hang in that instance?
> From: Peter Stuge <stuge at cdy.org>
>
> On Thu, Dec 13, 2001 at 01:52:09PM +1100, carl at bl.echidna.id.au wrote:
> >
> > > You wouldn't need the option if the ssh client put itself in the
> > > background.
> >
> > So if I did
> >
> > ssh firewall
> > firewall$ fwstop; fwstart
> > exit
> >
> > it wouldn't hang in that instance?
>
> The ssh client process would linger, but in the background. It would be
> the equivalent of doing ^Z and then typing bg with the current OpenSSH
> version.
>
> You would be back at the prompt where you typed "ssh firewall".

Does that solve the problem? Wouldn't I then end up with, over time, a
stack of these ssh client sessions? I guess I could kill -KILL them? :)

Carl
Markus et al.,

Please pardon me if I've missed anything, but coming into this a bit
late...

The hanging of ssh connections with OpenSSH where a job has been
"detached" (backgrounded) is a real concern here as well. We have lots of
existing ssh and rsh scripts that we are converting over to later versions
of OpenSSH, and have started running into this with startling frequency;
its operational impact is serious when compounded by regular invocations.
We wind up with large numbers of waiting sshd's on one side and/or hung
scripts on the client side.

My understanding from reading code and the archives is that the current
behaviour we are seeing (a hang) is by design, and is intended to ensure
that any output from that spawned task is dutifully carried back to the
originating ssh client for appropriate disposition. The goal is to avoid
losing any data in the "end game" as things die off and connections are
closed. However, the prior behaviour of rsh and ssh was different, and the
termination of the remote connection was governed by the death of all
child processes (or the only child process). It is this behaviour that
people would like to see again.

I agree that technically (in the sense of not losing data) the new
behaviour is more correct, and users should wrap programs they are trying
to detach to close all FDs (easily done with a Perl program and probably
shell as well). People would like to control this on the client side, but
it's completely server-side behaviour, and currently there is no way for
the client to influence this other than to recode their scripts (which in
our case is 1000's of scripts). What we need is a way to support the old
functionality, but in a way that lets us migrate smoothly over time to the
new behaviour.

I believe that a few modifications can be made to the client and server to
support both the new and old behaviour, and to control the default
behaviour.

First the server-side changes:

1. Add an option to terminate when the primary or all child processes die.
2. Add an option to set the default for this flag in the sshd_config file
   (the default should probably be the old behaviour, to be compatible
   with v1.3 and 1.5 clients).
3. Add code to allow the client side to set this option (client should
   override server). (I think this needs to be a SSH2_MSG_GLOBAL_REQUEST.)
   Only ssh v2.0 clients will be able to set this option.

Client-side changes:

1. Add code to send the new option described above.
2. Add code to set the default setting of the option in ssh_config.
3. Add command line processing to override the default and send the
   desired option setting to the server.

We need to be aware that there are many different versions of ssh client
code out there, much of it beyond our control, and we need to ensure that
it continues to operate as expected when we upgrade the server (backwards
compatibility). This means making the default server behaviour accommodate
the older client expectations unless it knows it's got a newer client that
wants the new behaviour. Once a site has converted their environment, they
can change the default to wait for all output FDs to close before exiting.

How does this proposal sound to folks? Markus? What have I missed...

-Doug-
--
Douglas Kingston
Director Global Unix Engineering Manager
Deutsche Bank AG London
6 Bishopsgate
London EC2N 4DA
Work: +44-20-7545-3907
Mobile: +44-7767-616-028
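To make the proposal concrete, here is a sketch of what the configuration
side might look like. The keyword names below are invented purely for
illustration; they are not actual OpenSSH configuration options:

```
# sshd_config (hypothetical option names)
# Old rsh-like behaviour: close the session when the direct child exits,
# without waiting for background processes to close stdout/stderr.
SessionCloseOnChildExit yes

# ssh_config (hypothetical option names)
# Request the old behaviour from the server; this would be sent as an
# SSH2_MSG_GLOBAL_REQUEST, so only protocol-2 clients could set it.
WaitForAllDescriptors no
```

The server default would carry the old semantics for backwards
compatibility, and a site that has converted its scripts could flip the
default to the new wait-for-all-FDs behaviour.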
For rlogin-like behaviour (i.e. a pty is allocated) it might be an option
to discard data, like telnetd/rlogind do.

For rsh-like behaviour, data loss is not acceptable. For example,

    $ rsh host broken-daemon

blocks too (on many platforms; if not, please show me the relevant rshd
code). And it should block for ssh-1.2.32.