Peter Jeremy
2020-Jul-19 11:21 UTC
svn commit: r362848 - in stable/12/sys: net netinet sys
I'm sending this to -stable, rather than the src groups because I
don't believe the problem is the commit itself, rather the commit
has uncovered a latent problem elsewhere.

On 2020-Jul-01 18:03:38 +0000, Michael Tuexen <tuexen at FreeBSD.org> wrote:
>Author: tuexen
>Date: Wed Jul 1 18:03:38 2020
>New Revision: 362848
>URL: https://svnweb.freebsd.org/changeset/base/362848
>
>Log:
>  MFC r353480: Use event handler in SCTP

I have no idea how, but this update breaks booting amd64 for me (r362847
works and this doesn't). I have a custom kernel with ZFS but no SCTP so I
have no real idea how this could break booting - presumably the
eventhandler change has uncovered a bug somewhere else.

The symptoms are that I get:
Mounting from zfs:zroot/ROOT/r363310 failed with error 6; retrying for 3 more seconds
Mounting from zfs:zroot/ROOT/r363310 failed with error 6

(r363310 is where I was trying to update to and I didn't change the BE
name as I was searching for the problem and error 6 is ENXIO).

I tried to reproduce the problem with GENERIC but it hangs after
displaying the EFI framebuffer information (I've seen that before and
suspect it is a loader problem but haven't dug into it).

Does anyone have any ideas?

-- 
Peter Jeremy
Konstantin Belousov
2020-Jul-19 11:48 UTC
svn commit: r362848 - in stable/12/sys: net netinet sys
On Sun, Jul 19, 2020 at 09:21:02PM +1000, Peter Jeremy wrote:
> I'm sending this to -stable, rather than the src groups because I
> don't believe the problem is the commit itself, rather the commit
> has uncovered a latent problem elsewhere.
>
> On 2020-Jul-01 18:03:38 +0000, Michael Tuexen <tuexen at FreeBSD.org> wrote:
> >Author: tuexen
> >Date: Wed Jul 1 18:03:38 2020
> >New Revision: 362848
> >URL: https://svnweb.freebsd.org/changeset/base/362848
> >
> >Log:
> >  MFC r353480: Use event handler in SCTP
>
> I have no idea how, but this update breaks booting amd64 for me (r362847
> works and this doesn't). I have a custom kernel with ZFS but no SCTP so I
> have no real idea how this could break booting - presumably the
> eventhandler change has uncovered a bug somewhere else.
>
> The symptoms are that I get:
> Mounting from zfs:zroot/ROOT/r363310 failed with error 6; retrying for 3 more seconds
> Mounting from zfs:zroot/ROOT/r363310 failed with error 6
>
> (r363310 is where I was trying to update to and I didn't change the BE
> name as I was searching for the problem and error 6 is ENXIO).
>
> I tried to reproduce the problem with GENERIC but it hangs after
> displaying the EFI framebuffer information (I've seen that before and
> suspect it is a loader problem but haven't dug into it).
>
> Does anyone have any ideas?

Did you check that the physical devices where your ZFS pool is located
are detected, and that the kernel messages for their drivers are as
usual? Overall, is there anything strange in the verbose dmesg?
Peter Jeremy
2020-Aug-24 09:31 UTC
svn commit: r362848 - in stable/12/sys: net netinet sys
TL;DR: Ensure you explicitly destroy all ZFS labels on disused root pools.

On 2020-Jul-19 21:21:02 +1000, Peter Jeremy <peter at server.rulingia.com> wrote:
>I'm sending this to -stable, rather than the src groups because I
>don't believe the problem is the commit itself, rather the commit
>has uncovered a latent problem elsewhere.
>
>On 2020-Jul-01 18:03:38 +0000, Michael Tuexen <tuexen at FreeBSD.org> wrote:
>>Author: tuexen
>>Date: Wed Jul 1 18:03:38 2020
>>New Revision: 362848
>>URL: https://svnweb.freebsd.org/changeset/base/362848
>>
>>Log:
>>  MFC r353480: Use event handler in SCTP
>
>I have no idea how, but this update breaks booting amd64 for me (r362847
>works and this doesn't). I have a custom kernel with ZFS but no SCTP so I
>have no real idea how this could break booting - presumably the
>eventhandler change has uncovered a bug somewhere else.

To close the loop on this, the problem was a combination of:
* changes in GEOM provider ordering;
* insufficient checks when ZFS is looking for the root pool;
* my system having remnants of a disused pool with the same name as the
  root pool.

It seems that the order of GEOM providers is relatively unstable - even
adding a device that doesn't physically exist to the kernel
configuration can change the provider order. Presumably r362848 also
resulted in a change in order.

During a root-on-ZFS boot, the kernel scans all providers, looking for
ZFS labels with a pool name matching the root pool. Only minimal checks
are performed: in particular, there's no check that it's a valid pool,
and the first such label found is assumed to describe the root pool.

In my case, some time ago, I'd moved things around on my boot disk. My
old root pool went to the end of the physical disk but I'd decided to
shrink it and left some free space at the end of the disk. This meant
that ZFS found one (out of 4) labels when it tasted the physical disk,
and if GEOM sorted the physical disk prior to its partitions then ZFS
would use the pool GUIDs from the stray label on the physical disk and
then fail to find a usable pool matching those GUIDs.

My fix was to zero the end of my disk.

-- 
Peter Jeremy
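
As an illustration of the failure described above, here is a minimal Python sketch; it is not the kernel's actual code. It assumes the usual ZFS vdev label layout (four 256 KiB labels, two at the start and two at the end of the device, with the device size rounded down to a multiple of the label size) and models the root pool search as a simple first-match scan by pool name. The function names, the provider names (ada0, ada0p3) and the GUIDs are all hypothetical.

```python
LABEL_SIZE = 256 * 1024  # assumed size of one ZFS vdev label (256 KiB)
LABEL_COUNT = 4          # two labels at the front of the device, two at the end


def label_offsets(device_size):
    """Return the byte offsets of the four vdev labels on a device."""
    # The usable size is rounded down to a multiple of the label size.
    usable = device_size - (device_size % LABEL_SIZE)
    if usable < LABEL_COUNT * LABEL_SIZE:
        # Guard for this sketch only: no room for four labels.
        raise ValueError("device too small to hold four labels")
    front = [i * LABEL_SIZE for i in range(LABEL_COUNT // 2)]
    back = [usable - (LABEL_COUNT - i) * LABEL_SIZE
            for i in range(LABEL_COUNT // 2, LABEL_COUNT)]
    return front + back


def first_matching_label(providers, pool_name):
    """Simplified model of the scan: the first label naming the pool wins.

    `providers` is an ordered list of (provider_name, labels) pairs, where
    each label is a dict with "pool" and "guid" keys. No further validation
    is done, which is the weakness described in the message above.
    """
    for provider_name, labels in providers:
        for label in labels:
            if label["pool"] == pool_name:
                return provider_name, label["guid"]
    return None


if __name__ == "__main__":
    # Hypothetical data: a stray label from the old pool is visible on the
    # whole disk, while the live pool sits in a partition.
    whole_disk = ("ada0", [{"pool": "zroot", "guid": 1111}])
    partition = ("ada0p3", [{"pool": "zroot", "guid": 2222}])

    print(label_offsets(1 << 30))
    print(first_matching_label([whole_disk, partition], "zroot"))
    print(first_matching_label([partition, whole_disk], "zroot"))
```

With the whole disk ordered first, the model returns the stale GUID; with the partition first, it returns the live one. That matches the order dependence described in the message: nothing on disk changed, only the order in which providers were tasted.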