thr3ads.net - freebsd stable - freebsd-update and hang during reboot [Apr 2015]


Nick Rogers

2015-Mar-09 16:19 UTC

freebsd-update and hang during reboot

On Tue, Feb 10, 2015 at 1:37 PM, Nick Rogers <ncrogers at gmail.com> wrote:
>
>
> On Mon, Feb 9, 2015 at 9:08 AM, Ian Lepore <ian at freebsd.org> wrote:
>
>> On Mon, 2015-02-09 at 11:41 -0500, Kurt Lidl wrote:
>> > Joel wrote:
>> > > Hi,
>> > >
>> > > Just about every machine I have seems to hang after running
>> > > freebsd-update and doing a reboot. The last message on the screen is
>> > > "All buffers synced" and it just freezes.
>> > >
>> > > This happens when doing a freebsd-update and going from 10.0 to 10.1,
>> > > but also when doing a fresh 10.1 install and using freebsd-update to
>> > > get the latest -pX security patches. As soon as I reboot the machine,
>> > > it hangs.
>> > >
>> > > I've tried it on several different HP ProLiant models, on Intel NUCs
>> > > and on VMware virtual machines. Same phenomenon everywhere. It's really
>> > > easy to trigger: just install 10.1, use default settings everywhere,
>> > > freebsd-update fetch/install, shutdown -r now and BOOM. It hangs. I
>> > > think I've seen it on 30 servers or so now.
>> > >
>> > > Everything works like it should after the initial hang though - no
>> > > matter how many times I reboot, it completes the reboot cycle just fine.
>> > >
>> > > I've seen several people (mostly on IRC) mention this problem, but no
>> > > solution.
>> > >
>> > > Is anyone working on fixing this?
>> >
>> > I ran into this problem in spades when upgrading a set of servers from
>> > FreeBSD 9.0 to 9.1.  It happened consistently.  Normal reboots worked,
>> > but when going from 9.0 to 9.1, it *ALWAYS* hung, and it always hung
>> > at the same place, after printing the "All buffers synced" message.
>> >
>> > I ultimately determined that if I did the following, rather than
>> > just a "reboot" or "shutdown -r now 'FreeBSD 9.1-RELEASE upgrade'",
>> > it would consistently AVOID the hang:
>> >
>> > sync ; sync ; sync ; shutdown -o -n -r now "FreeBSD 9.1 install"
>> >
>> > Your mileage may vary, but you don't have a lot to lose by trying it.
>> >
>> > -Kurt
>> >
>>
>> That is just bad advice.  sync(1) does not guarantee that all data has
>> been written, no matter how many times you type it.  shutdown -n tells
>> the system to abandon unwritten data.  All in all, this is a recipe for
>> silent filesystem corruption.  Using it after an update is just asking
>> to have a mix of old and new files on the system after the reboot.
>>
>> A more robust workaround would be to "mount -r" on all filesystems
>> before invoking the shutdown (even a shutdown -n should be safe after
>> everything has been remounted readonly).  If the mount -r hangs on one
>> of the filesystems, then you've probably got a clue as to where a normal
>> shutdown is hanging.
>>
>
> FWIW mount -r on the root filesystem hangs for me. If I disable
> softupdates-journaling on the root filesystem before the upgrade process,
> the system no longer hangs on the last reboot after userland upgrade.
> However, the root filesystem still comes up dirty with an incorrect free
> block count during fsck.
>
Is anyone working on fixing this problem? It seems like this should have
some kind of "full court press" as it is obviously affecting plenty of
people, some of whom have spoken up in the following PR:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458

I realize it's a tough problem to track down, and if I had the appropriate
skills I would help. But so far all I've been able to do, like others, is
replicate and complain about the problem.

It's still affecting upgrading to 10.1-RELEASE-p6 from the official
10.1-RELEASE distribution, and from 10.1-RELEASE-p5. I just had another
production server hang during reboot after updating to p6, and I don't see
this changing for the inevitable p7 unless this problem gets more
attention. Can someone with the right skill-set please help figure this
out? Thank you.
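
Regarding Ian's point quoted above that sync(1) does not guarantee the data has been written: for a single file, a program can get a stronger guarantee by calling fsync(2) on the descriptor, which does not return until the file's data has been handed to stable storage. The following is only a minimal userland sketch of that distinction. The helper name write_durably() is made up for illustration, it does not retry on EINTR, and it is not a workaround for the reboot hang discussed in this thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write buf to path and return only after fsync(2)
 * has reported success for that file. Returns 0 on success, -1 on failure.
 */
static int
write_durably(const char *path, const char *buf)
{
	size_t len, off;
	ssize_t n;
	int fd;

	assert(path != NULL && buf != NULL);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd == -1)
		return (-1);
	len = strlen(buf);
	/* write(2) may be partial, so loop until everything is queued. */
	for (off = 0; off < len; off += (size_t)n) {
		n = write(fd, buf + off, len - off);
		if (n <= 0) {
			close(fd);
			return (-1);
		}
	}
	/* Unlike sync(), fsync() blocks until this file's data is flushed. */
	if (fsync(fd) == -1) {
		close(fd);
		return (-1);
	}
	return (close(fd));
}
```

A caller would check the return value, for example `if (write_durably("/tmp/example.txt", "hello\n") != 0)`, and treat a failure as "data may not be on disk".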


Nick Rogers

2015-Apr-15 21:44 UTC


freebsd-update and hang during reboot

In case anyone is still dealing with this problem, the fix was MFC'd to
stable/10 a few days ago. I am assuming this will not end up getting
backported to releng/10.1. I've compiled a patch with the fix that works
against 10.1-RELEASE. Maybe it will be useful for any of you who, like me,
don't run 10-stable but are comfortable with custom kernels and are still
dealing with this issue when running freebsd-update every time a new patch
level is released. Diff is below.

# Fix bug causing a hang while unmounting the root filesystem during
# reboot after performing a freebsd-update.
#
#
# Original commit to HEAD:
# https://svnweb.freebsd.org/base?view=revision&revision=280760
# MFC to stable:
# https://svnweb.freebsd.org/base?view=revision&revision=281350
#
# The following commits were taken from stable/10/sys/ufs/ffs between
# the release of 10.1-RELEASE (r272459) and MFC of the fix (r281350)
# in order for the fix to cleanly apply to releng/10.1. The two
# unrelated commits seem like reasonable fixes to include as well.
#
# https://svnweb.freebsd.org/base?view=revision&revision=281350
# https://svnweb.freebsd.org/base?view=revision&revision=278667
# https://svnweb.freebsd.org/base?view=revision&revision=274305
#
Index: ufs/ffs/ffs_vfsops.c
===================================================================
--- ufs/ffs/ffs_vfsops.c (revision 272459)
+++ ufs/ffs/ffs_vfsops.c (revision 281350)
@@ -1502,8 +1502,11 @@
  if (fs->fs_fmod != 0 && fs->fs_ronly != 0 && ump->um_fsckpid == 0)
  panic("%s: ffs_sync: modification on read-only filesystem",
     fs->fs_fsmnt);
- if (waitfor == MNT_LAZY)
- return (ffs_sync_lazy(mp));
+ if (waitfor == MNT_LAZY) {
+ if (!rebooting)
+ return (ffs_sync_lazy(mp));
+ waitfor = MNT_NOWAIT;
+ }

  /*
  * Write back each (modified) inode.
@@ -1560,7 +1563,7 @@
  /*
  * Force stale filesystem control information to be flushed.
  */
- if (waitfor == MNT_WAIT) {
+ if (waitfor == MNT_WAIT || rebooting) {
  if ((error = softdep_flushworklist(ump->um_mountp, &count, td)))
  allerror = error;
  /* Flushed work items may create new vnodes to clean */
@@ -1577,9 +1580,12 @@
  if (bo->bo_numoutput > 0 || bo->bo_dirty.bv_cnt > 0) {
  BO_UNLOCK(bo);
  vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY);
- if ((error = VOP_FSYNC(devvp, waitfor, td)) != 0)
+ error = VOP_FSYNC(devvp, waitfor, td);
+ VOP_UNLOCK(devvp, 0);
+ if (MOUNTEDSOFTDEP(mp) && (error == 0 || error == EAGAIN))
+ error = ffs_sbupdate(ump, waitfor, 0);
+ if (error != 0)
  allerror = error;
- VOP_UNLOCK(devvp, 0);
  if (allerror == 0 && waitfor == MNT_WAIT)
  goto loop;
  } else if (suspend != 0) {
Index: ufs/ffs/ffs_softdep.c
===================================================================
--- ufs/ffs/ffs_softdep.c (revision 272459)
+++ ufs/ffs/ffs_softdep.c (revision 281350)
@@ -735,9 +735,10 @@
 static void check_clear_deps(struct mount *);
 static void softdep_error(char *, int);
 static int softdep_process_worklist(struct mount *, int);
-static int softdep_waitidle(struct mount *);
+static int softdep_waitidle(struct mount *, int);
 static void drain_output(struct vnode *);
 static struct buf *getdirtybuf(struct buf *, struct rwlock *, int);
+static int check_inodedep_free(struct inodedep *);
 static void clear_remove(struct mount *);
 static void clear_inodedeps(struct mount *);
 static void unlinked_inodedep(struct mount *, struct inodedep *);
@@ -1377,6 +1378,10 @@
  mp = (struct mount *)addr;
  ump = VFSTOUFS(mp);
  atomic_add_int(&stat_flush_threads, 1);
+ ACQUIRE_LOCK(ump);
+ ump->softdep_flags &= ~FLUSH_STARTING;
+ wakeup(&ump->softdep_flushtd);
+ FREE_LOCK(ump);
  if (print_threads) {
  if (stat_flush_threads == 1)
  printf("Running %s at pid %d\n", bufdaemonproc->p_comm,
@@ -1389,7 +1394,7 @@
     VFSTOUFS(mp)->softdep_jblocks->jb_suspended))
  kthread_suspend_check();
  ACQUIRE_LOCK(ump);
- if ((ump->softdep_flags & FLUSH_CLEANUP) == 0)
+ if ((ump->softdep_flags & (FLUSH_CLEANUP | FLUSH_EXIT)) == 0)
  msleep(&ump->softdep_flushtd, LOCK_PTR(ump), PVM,
     "sdflush", hz / 2);
  ump->softdep_flags &= ~FLUSH_CLEANUP;
@@ -1419,11 +1424,9 @@

  ump = VFSTOUFS(mp);
  LOCK_OWNED(ump);
- if ((ump->softdep_flags & (FLUSH_CLEANUP | FLUSH_EXIT)) == 0) {
+ if ((ump->softdep_flags & (FLUSH_CLEANUP | FLUSH_EXIT)) == 0)
  ump->softdep_flags |= FLUSH_CLEANUP;
- if (ump->softdep_flushtd->td_wchan == &ump->softdep_flushtd)
- wakeup(&ump->softdep_flushtd);
- }
+ wakeup(&ump->softdep_flushtd);
 }

 static int
@@ -1468,14 +1471,10 @@
  TAILQ_INSERT_TAIL(&softdepmounts, sdp, sd_next);
  FREE_GBLLOCK(&lk);
  if ((altump->softdep_flags &
-    (FLUSH_CLEANUP | FLUSH_EXIT)) == 0) {
+    (FLUSH_CLEANUP | FLUSH_EXIT)) == 0)
  altump->softdep_flags |= FLUSH_CLEANUP;
- altump->um_softdep->sd_cleanups++;
- if (altump->softdep_flushtd->td_wchan ==
-    &altump->softdep_flushtd) {
- wakeup(&altump->softdep_flushtd);
- }
- }
+ altump->um_softdep->sd_cleanups++;
+ wakeup(&altump->softdep_flushtd);
  FREE_LOCK(altump);
  }
  }
@@ -1887,8 +1886,8 @@
  struct thread *td;
 {
  struct vnode *devvp;
- int count, error = 0;
  struct ufsmount *ump;
+ int count, error;

  /*
  * Alternately flush the block device associated with the mount
@@ -1897,6 +1896,7 @@
  * are found.
  */
  *countp = 0;
+ error = 0;
  ump = VFSTOUFS(oldmnt);
  devvp = ump->um_devvp;
  while ((count = softdep_process_worklist(oldmnt, 1)) > 0) {
@@ -1904,36 +1904,47 @@
  vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY);
  error = VOP_FSYNC(devvp, MNT_WAIT, td);
  VOP_UNLOCK(devvp, 0);
- if (error)
+ if (error != 0)
  break;
  }
  return (error);
 }

+#define SU_WAITIDLE_RETRIES 20
 static int
-softdep_waitidle(struct mount *mp)
+softdep_waitidle(struct mount *mp, int flags __unused)
 {
  struct ufsmount *ump;
- int error;
- int i;
+ struct vnode *devvp;
+ struct thread *td;
+ int error, i;

  ump = VFSTOUFS(mp);
+ devvp = ump->um_devvp;
+ td = curthread;
+ error = 0;
  ACQUIRE_LOCK(ump);
- for (i = 0; i < 10 && ump->softdep_deps; i++) {
+ for (i = 0; i < SU_WAITIDLE_RETRIES && ump->softdep_deps != 0; i++) {
  ump->softdep_req = 1;
- if (ump->softdep_on_worklist)
- panic("softdep_waitidle: work added after flush.");
- msleep(&ump->softdep_deps, LOCK_PTR(ump), PVM, "softdeps", 1);
+ KASSERT((flags & FORCECLOSE) == 0 ||
+    ump->softdep_on_worklist == 0,
+    ("softdep_waitidle: work added after flush"));
+ msleep(&ump->softdep_deps, LOCK_PTR(ump), PVM | PDROP,
+    "softdeps", 10 * hz);
+ vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY);
+ error = VOP_FSYNC(devvp, MNT_WAIT, td);
+ VOP_UNLOCK(devvp, 0);
+ if (error != 0)
+ break;
+ ACQUIRE_LOCK(ump);
  }
  ump->softdep_req = 0;
- FREE_LOCK(ump);
- error = 0;
- if (i == 10) {
+ if (i == SU_WAITIDLE_RETRIES && error == 0 && ump->softdep_deps != 0) {
  error = EBUSY;
  printf("softdep_waitidle: Failed to flush worklist for %p\n",
     mp);
  }
-
+ FREE_LOCK(ump);
  return (error);
 }

@@ -1990,7 +2001,7 @@
  error = EBUSY;
  }
  if (!error)
- error = softdep_waitidle(oldmnt);
+ error = softdep_waitidle(oldmnt, flags);
  if (!error) {
  if (oldmnt->mnt_kern_flag & MNTK_UNMOUNT) {
  retry = 0;
@@ -2490,9 +2501,18 @@
  /*
  * Start our flushing thread in the bufdaemon process.
  */
+ ACQUIRE_LOCK(ump);
+ ump->softdep_flags |= FLUSH_STARTING;
+ FREE_LOCK(ump);
  kproc_kthread_add(&softdep_flush, mp, &bufdaemonproc,
     &ump->softdep_flushtd, 0, 0, "softdepflush", "%s worker",
     mp->mnt_stat.f_mntonname);
+ ACQUIRE_LOCK(ump);
+ while ((ump->softdep_flags & FLUSH_STARTING) != 0) {
+ msleep(&ump->softdep_flushtd, LOCK_PTR(ump), PVM, "sdstart",
+    hz / 2);
+ }
+ FREE_LOCK(ump);
  /*
  * When doing soft updates, the counters in the
  * superblock may have gotten out of sync. Recomputation
@@ -7629,17 +7649,13 @@
  return (1);
 }

-/*
- * Try to free an inodedep structure. Return 1 if it could be freed.
- */
 static int
-free_inodedep(inodedep)
+check_inodedep_free(inodedep)
  struct inodedep *inodedep;
 {

  LOCK_OWNED(VFSTOUFS(inodedep->id_list.wk_mp));
- if ((inodedep->id_state & (ONWORKLIST | UNLINKED)) != 0 ||
-    (inodedep->id_state & ALLCOMPLETE) != ALLCOMPLETE ||
+ if ((inodedep->id_state & ALLCOMPLETE) != ALLCOMPLETE ||
     !LIST_EMPTY(&inodedep->id_dirremhd) ||
     !LIST_EMPTY(&inodedep->id_pendinghd) ||
     !LIST_EMPTY(&inodedep->id_bufwait) ||
@@ -7654,6 +7670,21 @@
     inodedep->id_nlinkdelta != 0 ||
     inodedep->id_savedino1 != NULL)
  return (0);
+ return (1);
+}
+
+/*
+ * Try to free an inodedep structure. Return 1 if it could be freed.
+ */
+static int
+free_inodedep(inodedep)
+ struct inodedep *inodedep;
+{
+
+ LOCK_OWNED(VFSTOUFS(inodedep->id_list.wk_mp));
+ if ((inodedep->id_state & (ONWORKLIST | UNLINKED)) != 0 ||
+    !check_inodedep_free(inodedep))
+ return (0);
  if (inodedep->id_state & ONDEPLIST)
  LIST_REMOVE(inodedep, id_deps);
  LIST_REMOVE(inodedep, id_hash);
@@ -13838,7 +13869,8 @@
 {
  struct bufobj *bo;
  struct ufsmount *ump;
- int error;
+ struct inodedep *inodedep;
+ int error, unlinked;

  bo = &devvp->v_bufobj;
  ASSERT_BO_WLOCKED(bo);
@@ -13899,6 +13931,20 @@
  break;
  }

+ unlinked = 0;
+ if (MOUNTEDSUJ(mp)) {
+ for (inodedep = TAILQ_FIRST(&ump->softdep_unlinked);
+    inodedep != NULL;
+    inodedep = TAILQ_NEXT(inodedep, id_unlinked)) {
+ if ((inodedep->id_state & (UNLINKED | UNLINKLINKS |
+    UNLINKONLIST)) != (UNLINKED | UNLINKLINKS |
+    UNLINKONLIST) ||
+    !check_inodedep_free(inodedep))
+ continue;
+ unlinked++;
+ }
+ }
+
  /*
  * Reasons for needing more work before suspend:
  * - Dirty buffers on devvp.
@@ -13908,8 +13954,8 @@
  error = 0;
  if (bo->bo_numoutput > 0 ||
     bo->bo_dirty.bv_cnt > 0 ||
-    softdep_depcnt != 0 ||
-    ump->softdep_deps != 0 ||
+    softdep_depcnt != unlinked ||
+    ump->softdep_deps != unlinked ||
     softdep_accdepcnt != ump->softdep_accdeps ||
     secondary_writes != 0 ||
     mp->mnt_secondary_writes != 0 ||
Index: ufs/ffs/softdep.h
===================================================================
--- ufs/ffs/softdep.h (revision 272459)
+++ ufs/ffs/softdep.h (revision 281350)
@@ -1063,6 +1063,8 @@
  */
 #define FLUSH_EXIT 0x0001 /* time to exit */
 #define FLUSH_CLEANUP 0x0002 /* need to clear out softdep structures */
+#define FLUSH_STARTING 0x0004 /* flush thread not yet started */
+
 /*
  * Keep the old names from when these were in the ufsmount structure.
  */
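
For readers skimming the diff, the reworked softdep_waitidle() loop can be read as a bounded retry: flush, check whether dependencies remain, and give up with EBUSY once SU_WAITIDLE_RETRIES attempts are spent. The following is a userland sketch of that control flow only, not kernel code. struct fake_mount, fake_flush() and wait_idle() are hypothetical stand-ins for struct ufsmount, the VOP_FSYNC() call on the device vnode and softdep_waitidle(); the locking and the msleep() between attempts are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SU_WAITIDLE_RETRIES 20	/* same bound the patch introduces */

/* Hypothetical stand-in for the per-mount state. */
struct fake_mount {
	int deps;		/* outstanding dependencies, like softdep_deps */
	int drain_per_flush;	/* model parameter: deps retired per flush */
};

/* Hypothetical stand-in for flushing the device; always succeeds here. */
static int
fake_flush(struct fake_mount *m)
{
	m->deps -= m->drain_per_flush;
	if (m->deps < 0)
		m->deps = 0;
	return (0);
}

/*
 * Retry until the dependency count reaches zero or the retry budget is
 * spent. Returns 0 when idle, EBUSY when work is still outstanding after
 * the last attempt, or the flush error if a flush fails.
 */
static int
wait_idle(struct fake_mount *m)
{
	int error, i;

	assert(m != NULL);
	error = 0;
	for (i = 0; i < SU_WAITIDLE_RETRIES && m->deps != 0; i++) {
		error = fake_flush(m);
		if (error != 0)
			break;
	}
	if (i == SU_WAITIDLE_RETRIES && error == 0 && m->deps != 0)
		error = EBUSY;
	return (error);
}
```

In this model a mount that drains (for example 50 dependencies retired 5 per flush) returns 0, while one that never drains returns EBUSY after 20 attempts. The extra `m->deps != 0` test in the final check mirrors the patch: reaching the retry limit is only an error if work is actually still pending.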


