Yes, this is a patch against an older version of OpenSSH with other stuff anyways, BUT, it's so TRIVIAL(*), that you can see how it would apply to newer versions (which I've not tried). Here's the gist: server_loop2() has a race condition with respect to reception of SIGCHLD and checking/setting child_terminated. This patch does two things: wait_until_can_do_something() adds a 1 second timeout to select() IF AND ONLY IF (!channel_still_open) AND, server_loop2() breaks out of its loop when there are no sessions left. Blocking SIGCHLD before select()ing would not fix the problem, nor would that be very portable. So, summary of changes: - session.h Added prototype for session_still_used() boolean function. - session.c Added session_still_used() boolean function. - serverloop.c Added this bit of code to wait_until_can_do_something(): if (!channel_still_open()) max_time_milliseconds = 1000; before select()ing. Added this bit of code to server_loop2(): if (child_terminated) { while ((pid = waitpid(-1, &status, WNOHANG)) > 0) session_close_by_pid(pid, status); - child_terminated = 0; + if (session_still_used()) + child_terminated = 0; + if (child_terminated && !channel_still_open()) + break; so that child_terminated is not reset after handling SIGCHLD *UNLESS* there are no sessions left open. Also, for server_loop(), perhaps it too should have a check to break out of its loop if the child is terminated and there are no channels still open. (*) I hope :) Cheers, Nico Index: 2_9_p2_w_gss_krb5_named_keys.10/session.h --- 2_9_p2_w_gss_krb5_named_keys.10/session.h Tue, 26 Jun 2001 16:27:13 -0400 willian (OpenSSH/i/13_session.h 1.2 644) +++ 2_9_p2_w_gss_krb5_named_keys.10(w)/session.h Thu, 25 Oct 2001 14:58:32 -0400 willian (OpenSSH/i/13_session.h 1.2 644) @@ -28,6 +28,7 @@ void do_authenticated(Authctxt *ac); +int session_still_used(); int session_open(int id); void session_input_channel_req(int id, void *arg); void session_close_by_pid(pid_t pid, int status); Index: 2_9_p2_w_gss_krb5_named_keys.10/session.c --- 2_9_p2_w_gss_krb5_named_keys.10/session.c Tue, 26 Jun 2001 16:27:13 -0400 willian (OpenSSH/i/14_session.c 1.3 644) +++ 2_9_p2_w_gss_krb5_named_keys.10(w)/session.c Thu, 25 Oct 2001 15:04:21 -0400 willian (OpenSSH/i/14_session.c 1.3 644) @@ -1533,18 +1533,18 @@ exit(1); } +static int did_init_sessions = 0; Session * session_new(void) { int i; - static int did_init = 0; - if (!did_init) { + if (!did_init_sessions) { debug("session_new: init"); for(i = 0; i < MAX_SESSIONS; i++) { sessions[i].used = 0; sessions[i].self = i; } - did_init = 1; + did_init_sessions = 1; } for(i = 0; i < MAX_SESSIONS; i++) { Session *s = &sessions[i]; @@ -1622,6 +1622,27 @@ error("session_by_pid: unknown pid %d", pid); session_dump(); return NULL; +} + +int +session_still_used() +{ + int i; + if (!did_init_sessions) { + debug("session_new: init"); + for(i = 0; i < MAX_SESSIONS; i++) { + sessions[i].used = 0; + sessions[i].self = i; + } + did_init_sessions = 1; + } + debug("session_still_used"); + for(i = 0; i < MAX_SESSIONS; i++) { + Session *s = &sessions[i]; + if (s->used) + return 1; + } + return 0; } int Index: 2_9_p2_w_gss_krb5_named_keys.10/serverloop.c --- 2_9_p2_w_gss_krb5_named_keys.10/serverloop.c Thu, 03 May 2001 16:12:13 -0400 jd (OpenSSH/i/16_serverloop 1.1 644) +++ 2_9_p2_w_gss_krb5_named_keys.10(w)/serverloop.c Thu, 25 Oct 2001 15:11:49 -0400 willian (OpenSSH/i/16_serverloop 1.1 644) @@ -259,18 +259,25 @@ if (max_time_milliseconds == 0 || client_alive_scheduled) max_time_milliseconds = 100; + if (!channel_still_open()) + max_time_milliseconds = 1000; + if (max_time_milliseconds == 0) tvp = NULL; else { tv.tv_sec = max_time_milliseconds / 1000; tv.tv_usec = 1000 * (max_time_milliseconds % 1000); tvp = &tv; + debug3("select timeout is: tv.tv_sec=%d, tv.tv_usec=%d", (int) max_time_milliseconds / + 1000, (int) 1000 * (max_time_milliseconds % 1000)); } if (tvp!=NULL) debug3("tvp!=NULL kid %d mili %d", child_terminated, max_time_milliseconds); /* Wait for something to happen, or the timeout to expire. */ ret = select((*maxfdp)+1, *readsetp, *writesetp, NULL, tvp); + debug("select() returned %d, child_terminated=%d, channel_still_open() returned %d", ret, + child_terminated, channel_still_open()); if (ret == -1) { if (errno != EINTR) @@ -590,6 +597,8 @@ /* Sleep in select() until we can do something. */ wait_until_can_do_something(&readset, &writeset, &max_fd, max_time_milliseconds); + debug("wait_until_can_do_something returned; child_terminated=%d, channel_still_open=%d", + child_terminated, channel_still_open()); /* Process any channel events. */ channel_after_select(readset, writeset); @@ -599,6 +608,11 @@ /* Process output to the client and to program stdin. */ process_output(writeset); + + /* Don't know if this is needed -- if it is, uncomment + if (!channel_still_open() && child_terminated) + break; + */ } if (readset) xfree(readset); @@ -721,7 +735,10 @@ if (child_terminated) { while ((pid = waitpid(-1, &status, WNOHANG)) > 0) session_close_by_pid(pid, status); - child_terminated = 0; + if (session_still_used()) + child_terminated = 0; + if (child_terminated && !channel_still_open()) + break; } if (!rekeying) channel_after_select(readset, writeset); -- -DISCLAIMER: an automatically appended disclaimer may follow. By posting- -to a public e-mail mailing list I hereby grant permission to distribute- -and copy this message.- Visit our website at http://www.ubswarburg.com This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. This message is provided for informational purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments.
I forgot to mention the behaviour that this patch fixes: hang-on-exit for SSHv2 sessions. See the other "SIGCHLD race condition?" thread for more details. Cheers, Nico On Thu, Oct 25, 2001 at 03:25:22PM -0400, Nicolas Williams wrote:> Yes, this is a patch against an older version of OpenSSH with other > stuff anyways, BUT, it's so TRIVIAL(*), that you can see how it would > apply to newer versions (which I've not tried). > > Here's the gist: server_loop2() has a race condition with respect to > reception of SIGCHLD and checking/setting child_terminated. This patch > does two things: wait_until_can_do_something() adds a 1 second timeout > to select() IF AND ONLY IF (!channel_still_open) AND, server_loop2() > breaks out of its loop when there are no sessions left. > > Blocking SIGCHLD before select()ing would not fix the problem, nor would > that be very portable. > > So, summary of changes: > > - session.h > > Added prototype for session_still_used() boolean function. > > - session.c > > Added session_still_used() boolean function. > > - serverloop.c > > Added this bit of code to wait_until_can_do_something(): > > if (!channel_still_open()) > max_time_milliseconds = 1000; > > before select()ing. > > Added this bit of code to server_loop2(): > > if (child_terminated) { > while ((pid = waitpid(-1, &status, WNOHANG)) > 0) > session_close_by_pid(pid, status); > - child_terminated = 0; > + if (session_still_used()) > + child_terminated = 0; > + if (child_terminated && !channel_still_open()) > + break; > > so that child_terminated is not reset after handling SIGCHLD *UNLESS* > there are no sessions left open. > > Also, for server_loop(), perhaps it too should have a check to break > out of its loop if the child is terminated and there are no channels > still open. > > (*) I hope :) > > Cheers, > > Nico > > > Index: 2_9_p2_w_gss_krb5_named_keys.10/session.h > --- 2_9_p2_w_gss_krb5_named_keys.10/session.h Tue, 26 Jun 2001 16:27:13 -0400 willian (OpenSSH/i/13_session.h 1.2 644) > +++ 2_9_p2_w_gss_krb5_named_keys.10(w)/session.h Thu, 25 Oct 2001 14:58:32 -0400 willian (OpenSSH/i/13_session.h 1.2 644) > @@ -28,6 +28,7 @@ > > void do_authenticated(Authctxt *ac); > > +int session_still_used(); > int session_open(int id); > void session_input_channel_req(int id, void *arg); > void session_close_by_pid(pid_t pid, int status); > Index: 2_9_p2_w_gss_krb5_named_keys.10/session.c > --- 2_9_p2_w_gss_krb5_named_keys.10/session.c Tue, 26 Jun 2001 16:27:13 -0400 willian (OpenSSH/i/14_session.c 1.3 644) > +++ 2_9_p2_w_gss_krb5_named_keys.10(w)/session.c Thu, 25 Oct 2001 15:04:21 -0400 willian (OpenSSH/i/14_session.c 1.3 644) > @@ -1533,18 +1533,18 @@ > exit(1); > } > > +static int did_init_sessions = 0; > Session * > session_new(void) > { > int i; > - static int did_init = 0; > - if (!did_init) { > + if (!did_init_sessions) { > debug("session_new: init"); > for(i = 0; i < MAX_SESSIONS; i++) { > sessions[i].used = 0; > sessions[i].self = i; > } > - did_init = 1; > + did_init_sessions = 1; > } > for(i = 0; i < MAX_SESSIONS; i++) { > Session *s = &sessions[i]; > @@ -1622,6 +1622,27 @@ > error("session_by_pid: unknown pid %d", pid); > session_dump(); > return NULL; > +} > + > +int > +session_still_used() > +{ > + int i; > + if (!did_init_sessions) { > + debug("session_new: init"); > + for(i = 0; i < MAX_SESSIONS; i++) { > + sessions[i].used = 0; > + sessions[i].self = i; > + } > + did_init_sessions = 1; > + } > + debug("session_still_used"); > + for(i = 0; i < MAX_SESSIONS; i++) { > + Session *s = &sessions[i]; > + if (s->used) > + return 1; > + } > + return 0; > } > > int > Index: 2_9_p2_w_gss_krb5_named_keys.10/serverloop.c > --- 2_9_p2_w_gss_krb5_named_keys.10/serverloop.c Thu, 03 May 2001 16:12:13 -0400 jd (OpenSSH/i/16_serverloop 1.1 644) > +++ 2_9_p2_w_gss_krb5_named_keys.10(w)/serverloop.c Thu, 25 Oct 2001 15:11:49 -0400 willian (OpenSSH/i/16_serverloop 1.1 644) > @@ -259,18 +259,25 @@ > if (max_time_milliseconds == 0 || client_alive_scheduled) > max_time_milliseconds = 100; > > + if (!channel_still_open()) > + max_time_milliseconds = 1000; > + > if (max_time_milliseconds == 0) > tvp = NULL; > else { > tv.tv_sec = max_time_milliseconds / 1000; > tv.tv_usec = 1000 * (max_time_milliseconds % 1000); > tvp = &tv; > + debug3("select timeout is: tv.tv_sec=%d, tv.tv_usec=%d", (int) max_time_milliseconds / > + 1000, (int) 1000 * (max_time_milliseconds % 1000)); > } > if (tvp!=NULL) > debug3("tvp!=NULL kid %d mili %d", child_terminated, max_time_milliseconds); > > /* Wait for something to happen, or the timeout to expire. */ > ret = select((*maxfdp)+1, *readsetp, *writesetp, NULL, tvp); > + debug("select() returned %d, child_terminated=%d, channel_still_open() returned %d", ret, > + child_terminated, channel_still_open()); > > if (ret == -1) { > if (errno != EINTR) > @@ -590,6 +597,8 @@ > /* Sleep in select() until we can do something. */ > wait_until_can_do_something(&readset, &writeset, &max_fd, > max_time_milliseconds); > + debug("wait_until_can_do_something returned; child_terminated=%d, channel_still_open=%d", > + child_terminated, channel_still_open()); > > /* Process any channel events. */ > channel_after_select(readset, writeset); > @@ -599,6 +608,11 @@ > > /* Process output to the client and to program stdin. */ > process_output(writeset); > + > + /* Don't know if this is needed -- if it is, uncomment > + if (!channel_still_open() && child_terminated) > + break; > + */ > } > if (readset) > xfree(readset); > @@ -721,7 +735,10 @@ > if (child_terminated) { > while ((pid = waitpid(-1, &status, WNOHANG)) > 0) > session_close_by_pid(pid, status); > - child_terminated = 0; > + if (session_still_used()) > + child_terminated = 0; > + if (child_terminated && !channel_still_open()) > + break; > } > if (!rekeying) > channel_after_select(readset, writeset); >-- -DISCLAIMER: an automatically appended disclaimer may follow. By posting- -to a public e-mail mailing list I hereby grant permission to distribute- -and copy this message.- Visit our website at http://www.ubswarburg.com This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. This message is provided for informational purposes and should not be construed as a solicitation or offer to buy or sell any securities or related financial instruments.
On Thu, Oct 25, 2001 at 03:25:22PM -0400, Nicolas Williams wrote:> if (!channel_still_open()) > max_time_milliseconds = 1000;there are no channels when a client authenticates. just try ssh -N> > Added this bit of code to server_loop2(): > > if (child_terminated) { > while ((pid = waitpid(-1, &status, WNOHANG)) > 0) > session_close_by_pid(pid, status); > - child_terminated = 0; > + if (session_still_used()) > + child_terminated = 0; > + if (child_terminated && !channel_still_open()) > + break;^^^^^ you cannot break. the client decides when the connection gets closed. i think it could still request another login shell. but yes, the SIGCLD races could be fixed with a select timeout. but that's ugly. perhaps using siglongjmp is less ugly and even portable. -m