Artem Belevich
2013-Nov-03 03:22 UTC
Can't mount root from raidz2 after r255763 in stable/9
Hi,

I have a box with root mounted from an 8-disk raidz2 ZFS volume.
After a recent buildworld I've run into an issue where the kernel fails to
mount root with error 6.
r255763 on stable/9 is the first revision that fails to mount root on
my box. The preceding r255749 boots fine.

Commit r255763 (http://svnweb.freebsd.org/base?view=revision&revision=255763)
MFCs a bunch of changes from 10 but I don't see anything that obviously
impacts ZFS.

Attempting to boot with vfs.zfs.debug=1 shows that the order in which geom
providers are probed by ZFS has apparently changed. Kernels that boot
show "guid match for provider /dev/gpt/<valid pool slice>" while
failing kernels show "guid match for provider /dev/daX" -- the raw
disks that are *not* the right geom provider for my pool slices. Beats
me why ZFS picks raw disks over the GPT partitions it should have.

Pool configuration:
# zpool status z0
  pool: z0
 state: ONLINE
  scan: scrub repaired 0 in 8h57m with 0 errors on Sat Oct 19 20:23:52 2013
config:

        NAME                   STATE     READ WRITE CKSUM
        z0                     ONLINE       0     0     0
          raidz2-0             ONLINE       0     0     0
            gpt/da0p4-z0       ONLINE       0     0     0
            gpt/da1p4-z0       ONLINE       0     0     0
            gpt/da2p4-z0       ONLINE       0     0     0
            gpt/da3p4-z0       ONLINE       0     0     0
            gpt/da4p4-z0       ONLINE       0     0     0
            gpt/da5p4-z0       ONLINE       0     0     0
            gpt/da6p4-z0       ONLINE       0     0     0
            gpt/da7p4-z0       ONLINE       0     0     0
        logs
          mirror-1             ONLINE       0     0     0
            gpt/ssd-zil-z0     ONLINE       0     0     0
            gpt/ssd1-zil-z0    ONLINE       0     0     0
        cache
          gpt/ssd1-l2arc-z0    ONLINE       0     0     0

errors: No known data errors

Here are screen captures from a failed boot:
https://plus.google.com/photos/+ArtemBelevich/albums/5941857781891332785

And here's the boot log from a successful boot on the same system:
http://pastebin.com/XCwebsh7

Removing ZIL and L2ARC makes no difference -- r255763 still fails to mount root.

I'm thoroughly baffled. Is there something wrong with the pool --
some junk metadata somewhere on the disk that now screws with the root
mounting? Changed order in geom provider enumeration? Something else?

Any suggestions on what I can do to debug this further?
--Artem
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
Andriy Gapon
2013-Nov-03 08:02 UTC
Re: Can't mount root from raidz2 after r255763 in stable/9
on 03/11/2013 05:22 Artem Belevich said the following:
> Hi,
>
> I have a box with root mounted from an 8-disk raidz2 ZFS volume.
> After a recent buildworld I've run into an issue where the kernel fails to
> mount root with error 6.
> r255763 on stable/9 is the first revision that fails to mount root on
> my box. The preceding r255749 boots fine.
>
> Commit r255763 (http://svnweb.freebsd.org/base?view=revision&revision=255763)
> MFCs a bunch of changes from 10 but I don't see anything that obviously
> impacts ZFS.

Indeed.

> Attempting to boot with vfs.zfs.debug=1 shows that the order in which geom
> providers are probed by ZFS has apparently changed. Kernels that boot
> show "guid match for provider /dev/gpt/<valid pool slice>" while
> failing kernels show "guid match for provider /dev/daX" -- the raw
> disks that are *not* the right geom provider for my pool slices. Beats
> me why ZFS picks raw disks over the GPT partitions it should have.

Perhaps the kernel gpart code fails to recognize the partitions and thus ZFS
can't see them?

> Pool configuration:
> # zpool status z0
>   pool: z0
>  state: ONLINE
>   scan: scrub repaired 0 in 8h57m with 0 errors on Sat Oct 19 20:23:52 2013
> config:
>
>         NAME                   STATE     READ WRITE CKSUM
>         z0                     ONLINE       0     0     0
>           raidz2-0             ONLINE       0     0     0
>             gpt/da0p4-z0       ONLINE       0     0     0
>             gpt/da1p4-z0       ONLINE       0     0     0
>             gpt/da2p4-z0       ONLINE       0     0     0
>             gpt/da3p4-z0       ONLINE       0     0     0
>             gpt/da4p4-z0       ONLINE       0     0     0
>             gpt/da5p4-z0       ONLINE       0     0     0
>             gpt/da6p4-z0       ONLINE       0     0     0
>             gpt/da7p4-z0       ONLINE       0     0     0
>         logs
>           mirror-1             ONLINE       0     0     0
>             gpt/ssd-zil-z0     ONLINE       0     0     0
>             gpt/ssd1-zil-z0    ONLINE       0     0     0
>         cache
>           gpt/ssd1-l2arc-z0    ONLINE       0     0     0
>
> errors: No known data errors
>
> Here are screen captures from a failed boot:
> https://plus.google.com/photos/+ArtemBelevich/albums/5941857781891332785

I don't have permission to view this album.

> And here's the boot log from a successful boot on the same system:
> http://pastebin.com/XCwebsh7
>
> Removing ZIL and L2ARC makes no difference -- r255763 still fails to mount root.
>
> I'm thoroughly baffled. Is there something wrong with the pool --
> some junk metadata somewhere on the disk that now screws with the root
> mounting? Changed order in geom provider enumeration? Something else?
>
> Any suggestions on what I can do to debug this further?

gpart.

--
Andriy Gapon
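
For anyone comparing debug output from a working and a failing kernel, the raw-disk versus GPT-label distinction described in this thread can be checked mechanically. The following is only a minimal sketch: it assumes the boot messages have been saved as text, that the relevant lines contain the phrase "guid match for provider" as quoted above, and that raw disks show up as /dev/daN with no partition suffix. The function name and the sample lines are hypothetical, made up for illustration, and are not taken from the actual boot logs.

```python
import re

# Phrase as quoted in the original report; everything after it is the provider path.
GUID_MATCH = re.compile(r"guid match for provider (/dev/\S+)")

# Assumption: a raw disk is /dev/daN with nothing after the unit number.
RAW_DISK = re.compile(r"^/dev/da\d+$")


def classify_providers(log_text):
    """Split providers named in guid-match lines into (raw_disks, others)."""
    raw, other = [], []
    for provider in GUID_MATCH.findall(log_text):
        # Drop trailing punctuation in case the message ends with a period or quote.
        provider = provider.rstrip('."')
        if RAW_DISK.match(provider):
            raw.append(provider)
        else:
            other.append(provider)
    return raw, other


if __name__ == "__main__":
    # Hypothetical sample lines, not copied from a real boot log.
    sample = (
        "guid match for provider /dev/gpt/da0p4-z0\n"
        "guid match for provider /dev/da1\n"
    )
    raw, other = classify_providers(sample)
    print("raw disks:", raw)
    print("other providers:", other)
```

If the first list is non-empty for the failing kernel and empty for the working one, that would be consistent with the observation that ZFS is matching raw disks instead of the GPT partitions, though it does not by itself say why.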