CC'ing Alexander Motin who committed the change.
20.07.2019 1:21, Garrett Wollman wrote:
> I recently upgraded several file servers from 11.2 to 11.3. All of
> them boot from a ZFS pool called "tank" (the data is in a different
> pool). In a couple of instances (which caused me to have to take a
> late-evening 140-mile drive to the remote data center where they are
> located), the servers crashed at the root mount phase. In one case,
> it bailed out with error 5 (I believe that's [EIO]) to the usual
> mountroot prompt. In the second case, the kernel panicked instead.
>
> The root cause (no pun intended) on both servers was a disk which was
> supplied by the vendor with a label on it that claimed to be part of
> the "tank" pool, and for some reason the 11.3 kernel was trying to
> mount that (faulted) pool rather than the real one. The disks and
> pool configuration were unchanged from 11.2 (and probably 11.1 as
> well) so I am puzzled.
>
> Other than laboriously running "zpool labelclear -f /dev/somedisk" for
> every piece of media that comes into my hands, is there anything else
> I could have done to avoid this?
Both the 11.3-RELEASE announcement and the Release Notes mention this:
> The ZFS filesystem has been updated to implement parallel mounting.
I strongly suggest reading the release documentation at the very least when
trouble shows up after an upgrade. Better yet, read it *before* updating.
I suspect this parallelism created a race in your case.
Unfortunately, the way to fall back to sequential mounting appears to be
undocumented: libzfs checks whether the ZFS_SERIAL_MOUNT environment
variable exists, with any value.
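For a userland mount of all datasets you can just set it in the
environment. A minimal sketch; any value should work, since libzfs only
checks that the variable exists:

    # force the pre-11.3 sequential mount order; the value is ignored
    env ZFS_SERIAL_MOUNT=1 zfs mount -a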
I'm not sure how to set it for mounting root; it may be read via kenv,
so try adding this to /boot/loader.conf:
ZFS_SERIAL_MOUNT=1
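After a reboot you can at least verify that the loader placed it in the
kernel environment (whether the root-mount code honors it there is the
open question):

    # show the variable as seen by the kernel environment
    kenv ZFS_SERIAL_MOUNT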
Alexander should have more knowledge on this.
And of course, attaching an unrelated device whose label conflicts with
the root pool is asking for trouble. Relabel it ASAP.
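Something along these lines for every disk before it goes into a server;
just a sketch, and da9 is a placeholder device name:

    # inspect any stale ZFS labels the vendor left on the disk
    zdb -l /dev/da9
    # wipe them if the disk is not part of any pool you care about
    zpool labelclear -f /dev/da9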