CC'ing Alexander Motin, who committed the change.

20.07.2019 1:21, Garrett Wollman wrote:

> I recently upgraded several file servers from 11.2 to 11.3. All of
> them boot from a ZFS pool called "tank" (the data is in a different
> pool). In a couple of instances (which caused me to have to take a
> late-evening 140-mile drive to the remote data center where they are
> located), the servers crashed at the root mount phase. In one case,
> it bailed out with error 5 (I believe that's [EIO]) to the usual
> mountroot prompt. In the second case, the kernel panicked instead.
>
> The root cause (no pun intended) on both servers was a disk which was
> supplied by the vendor with a label on it that claimed to be part of
> the "tank" pool, and for some reason the 11.3 kernel was trying to
> mount that (faulted) pool rather than the real one. The disks and
> pool configuration were unchanged from 11.2 (and probably 11.1 as
> well) so I am puzzled.
>
> Other than laboriously running "zpool labelclear -f /dev/somedisk" for
> every piece of media that comes into my hands, is there anything else
> I could have done to avoid this?

Both the 11.3-RELEASE announcement and the Release Notes mention this:

> The ZFS filesystem has been updated to implement parallel mounting.

I strongly suggest reading the release documentation when you run into trouble after an upgrade, at least. Better still, read it *before* updating.

I guess this parallelism created a race in your case.

Unfortunately, a way to fall back to sequential mounting seems to be undocumented. libzfs only checks whether the ZFS_SERIAL_MOUNT environment variable exists; its value does not matter. I'm not sure how to set it for mounting root; maybe it will be picked up from kenv, so try adding this to /boot/loader.conf:

ZFS_SERIAL_MOUNT=1

Alexander should know more about this.

And of course, attaching an unrelated device whose label conflicts with the root pool is asking for trouble. Re-label it ASAP.
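If libzfs really tests only for the variable's presence, as described above, then any value requests serial mounting, including ZFS_SERIAL_MOUNT=0 or an empty string. The following is a minimal Python sketch of such a presence-only check, assuming the behavior is exactly as Eugene describes. The helper name serial_mount_requested is hypothetical; this is not libzfs code (libzfs is written in C), and it says nothing about whether a loader.conf entry actually reaches the environment that libzfs sees.

```python
import os
from collections.abc import Mapping


def serial_mount_requested(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if ZFS_SERIAL_MOUNT is present in env, whatever its value.

    Hypothetical helper mirroring the presence-only check described in the
    thread; it is not taken from libzfs.
    """
    return "ZFS_SERIAL_MOUNT" in env


if __name__ == "__main__":
    # Any value, including "0" or the empty string, counts as "set",
    # so each of these three lines prints True.
    for value in ("1", "0", ""):
        print(repr(value), serial_mount_requested({"ZFS_SERIAL_MOUNT": value}))
    # Only a missing variable leaves the default (parallel) behavior: False.
    print("unset", serial_mount_requested({}))
```

The practical upshot, under that assumption, is that disabling serial mounting means removing the variable entirely, not setting it to 0.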
Hi,

I am not sure how the original description leads to the conclusion that the problem is related to parallel mounting. From my point of view, it sounds like a problem with the root pool being selected by its name rather than by its GUID, which would need to be passed from the loader. We have seen a problem like that ourselves when boot pool names collide. So I doubt it is a new problem; nobody has gotten around to fixing it yet.

On 20.07.2019 06:41, Eugene Grosbein wrote:
> I guess this parallelism created some race for your case.

-- 
Alexander Motin
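Alexander's explanation can be illustrated with a toy model. If the root pool is chosen by name alone, the real boot pool and a stale "tank" label on a vendor disk are both valid matches, whereas a GUID handed over by the loader identifies exactly one pool. The sketch below assumes a hypothetical PoolLabel record and made-up GUID values; it is not how the FreeBSD kernel actually enumerates or imports pools.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PoolLabel:
    """Hypothetical summary of a pool label found on a disk."""
    name: str
    guid: int
    healthy: bool


def find_by_name(labels: list[PoolLabel], name: str) -> list[PoolLabel]:
    # Name-based lookup: every label carrying the name matches.
    return [p for p in labels if p.name == name]


def find_by_guid(labels: list[PoolLabel], guid: int) -> PoolLabel | None:
    # GUID-based lookup: pool GUIDs are effectively unique, so at most one match.
    for p in labels:
        if p.guid == guid:
            return p
    return None


if __name__ == "__main__":
    labels = [
        PoolLabel("tank", 0x1111, healthy=True),   # real boot pool (made-up GUID)
        PoolLabel("tank", 0x2222, healthy=False),  # stale vendor label (made-up GUID)
    ]
    # Two candidates: in this model, which one "wins" depends on probe order.
    print(len(find_by_name(labels, "tank")))
    # The GUID of the pool the loader booted from picks the right one.
    print(find_by_guid(labels, 0x1111))
```

The design point is simply that a name is chosen by the administrator and can collide, while a GUID is generated per pool, so passing the GUID from the loader removes the ambiguity.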
<<On Sat, 20 Jul 2019 17:41:39 +0700, Eugene Grosbein <eugen at grosbein.net> said:

> Both 11.3-RELEASE announcement and Release Notes mention this:

>> The ZFS filesystem has been updated to implement parallel mounting.

> I strongly suggest reading Release documentation in case of troubles
> after upgrade, at least. Or better, read *before* updating.

Two servers breaking out of thirty-five upgraded is not the sort of thing that you'd expect to be implied by such a statement, especially when there's only one filesystem being mounted at the relevant time.

-GAWollman