With your latest patch applied, I ran through my procedure more
than a dozen times and no panics!

Any explanation why sleep(STALL_TIMEOUT) as opposed to a
bunch of sleep(1)'s tickles the panic?

Also, it is definitely not sleeping for 30 seconds. I guess some
event interrupts the sleep loop?

Thanks heaps for your time and effort,

-andyf

%%%
Please try the following patch.

diff --git a/sbin/init/init.c b/sbin/init/init.c
index bda86b5..25ac2bd 100644
--- a/sbin/init/init.c
+++ b/sbin/init/init.c
@@ -870,6 +870,7 @@ single_user(void)
 	sigset_t mask;
 	const char *shell;
 	char *argv[2];
+	struct timeval tv, tn;
 #ifdef SECURE
 	struct ttyent *typ;
 	struct passwd *pp;
@@ -884,8 +885,13 @@ single_user(void)
 	if (Reboot) {
 		/* Instead of going single user, let's reboot the machine */
 		sync();
-		reboot(howto);
-		_exit(0);
+		if (reboot(howto) == -1) {
+			emergency("reboot(%#x) failed, %s", howto,
+			    strerror(errno));
+			_exit(1); /* panic and reboot */
+		}
+		warning("reboot(%#x) returned", howto);
+		_exit(0); /* panic as well */
 	}
 
 	shell = get_shell();
@@ -1002,7 +1008,14 @@ single_user(void)
 			 * reboot(8) killed shell?
 			 */
 			warning("single user shell terminated.");
-			sleep(STALL_TIMEOUT);
+			gettimeofday(&tv, NULL);
+			tn = tv;
+			tv.tv_sec += STALL_TIMEOUT;
+			while (tv.tv_sec > tn.tv_sec || (tv.tv_sec ==
+			    tn.tv_sec && tv.tv_usec > tn.tv_usec)) {
+				sleep(1);
+				gettimeofday(&tn, NULL);
+			}
 			_exit(0);
 		} else {
 			warning("single user shell terminated, restarting");
Graham Menhennitt
2016-Oct-06 22:55 UTC
Reproducible panic - Going nowhere without my init!
Let me preface this by saying that I know nothing about this particular
bit of code, but...

As a general rule, I would question the use of gettimeofday() while
panicking. At that stage, everything could have already gone down the
plug hole.

That said, it already calls sleep(), so maybe that uses the same
gettimeofday() call internally. In which case, please ignore this
comment.

Graham

On 7/10/2016 9:32 AM, Andy Farkas wrote:
> With your latest patch applied, I ran through my procedure more
> than a dozen times and no panics!
>
> Any explanation why sleep(STALL_TIMEOUT) as opposed to a
> bunch of sleep(1)'s tickles the panic?
>
> Also, it is definitely not sleeping for 30 seconds. I guess some
> event interrupts the sleep loop?
>
> Thanks heaps for your time and effort,
>
> -andyf
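For readers who share the concern about reading the clock at this point, one alternative is sketched below. It is not what the patch does: it relies only on the documented behaviour of sleep(3), which returns the number of seconds left unslept when a signal cuts it short. The helper name sleep_fully() is hypothetical and does not exist in init.c.

```c
#include <unistd.h>

/*
 * Sleep for `seconds` in total, resuming after every signal
 * interruption.  sleep(3) returns the unslept remainder, so no
 * clock read is needed.  Returns how many times the sleep was
 * interrupted.  Hypothetical helper, not part of init.c.
 */
unsigned int
sleep_fully(unsigned int seconds)
{
	unsigned int interruptions = 0;

	/* sleep() returns 0 once the full interval has elapsed. */
	while ((seconds = sleep(seconds)) > 0)
		interruptions++;
	return (interruptions);
}
```

The trade-off is precision: the remainder is reported in whole seconds, so a burst of signals can make the total stall drift from the requested value, whereas a loop that compares against a fixed deadline does not accumulate that error.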
Konstantin Belousov
2016-Oct-07 12:18 UTC
Reproducible panic - Going nowhere without my init!
On Fri, Oct 07, 2016 at 08:32:24AM +1000, Andy Farkas wrote:
> With your latest patch applied, I ran through my procedure more
> than a dozen times and no panics!
>
> Any explanation why sleep(STALL_TIMEOUT) as opposed to a
> bunch of sleep(1)'s tickles the panic?

What happened was that sleep() got interrupted by a signal. Normally
reboot(8) stops init with SIGTSTP, then kills processes, then calls
reboot(2). reboot(8) does not and cannot get acknowledgements that the
signalled processes have received the signals, which indeed may result
in the sleep being interrupted if another signal is delivered to init
before SIGTSTP.

The patch does not add 'just a bunch of sleeps'. The code in the patch
ensures that _exit() is called no earlier than STALL_TIMEOUT after the
moment the shell exit is detected, by reissuing sleep(). I changed the
argument to 1 second to avoid the situation where we e.g. sleep for
15 secs, get interrupted, and then sleep for the whole 30 secs. The
overtime with sleep(1) is limited to 1 second.

> Also, it is definitely not sleeping for 30 seconds. I guess some
> event interrupts the sleep loop?
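The deadline loop described in the reply above can be pulled out into a standalone sketch. The function names before_deadline() and stall_at_least() are made up for illustration and are not in init.c; the comparison is the same one the patch's while-condition performs, with the seconds compared for equality before the microseconds are consulted.

```c
#include <sys/time.h>
#include <unistd.h>

/*
 * Return nonzero while `now` is still strictly earlier than
 * `deadline`.  Seconds are compared first; microseconds only
 * break a tie.
 */
int
before_deadline(const struct timeval *now, const struct timeval *deadline)
{
	return (deadline->tv_sec > now->tv_sec ||
	    (deadline->tv_sec == now->tv_sec &&
	    deadline->tv_usec > now->tv_usec));
}

/*
 * Stall for at least `seconds`, even if signals cut individual
 * sleep(1) calls short.  Hypothetical helper, not part of init.c.
 */
void
stall_at_least(unsigned int seconds)
{
	struct timeval deadline, now;

	gettimeofday(&now, NULL);
	deadline = now;
	deadline.tv_sec += seconds;
	while (before_deadline(&now, &deadline)) {
		sleep(1);	/* may return early on a signal */
		gettimeofday(&now, NULL);
	}
}
```

Because each iteration sleeps for at most one second before rechecking the clock, an interruption costs at most one extra second past the deadline, which matches the bound stated in the reply.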