So I decided to test out failure modes of ZFS root mirrors.

Installed on a V240 with nv90. Worked great.

Pulled out disk1, then replaced it and attached again, resilvered, all good.

Now I pull out disk0 to simulate failure there. OS up and running fine, but lots of error messages about SYNC CACHE.

Next I decided to init 0, reinsert disk0, and reboot. Uh oh!

Probing system devices
Probing memory
Probing I/O buses

Sun Fire V240, No Keyboard
Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.22.33, 8192 MB memory installed, Serial #54881337.
Ethernet address 0:3:ba:45:6c:39, Host ID: 83456c39.


Rebooting with command: boot
Boot device: /pci@1c,600000/scsi@2/disk@0,0:a  File and args:
SunOS Release 5.11 Version snv_90 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
NOTICE:
***************************************************
*  This device is not bootable!                   *
*  It is either offlined or detached or faulted.  *
*  Please try to boot from a different device.    *
***************************************************
NOTICE: spa_import_rootpool: error 22

Cannot mount root on /pci@1c,600000/scsi@2/disk@0,0:a fstype zfs

panic[cpu1]/thread=180e000: vfs_mountroot: cannot mount root

000000000180b950 genunix:vfs_mountroot+348 (600, 200, 800, 200, 1874800, 12b6000)
  %l0-3: 000000000001d524 0000000000000064 000000000001d4c0 0000000000001d4c
  %l4-7: 00000000000005dc 0000000000001770 0000000000000640 00000000018c7000
000000000180ba10 genunix:main+b4 (1815000, 180c000, 1837240, 18151f8, 1, 180e000)
  %l0-3: 0000000001838258 0000000070002000 00000000010bfc00 0000000000000000
  %l4-7: 000000000183c400 0000000000000001 000000000180c000 0000000001837c00

skipping system dump - no dump device configured
rebooting...


This message posted from opensolaris.org
Sounds correct to me. The disk isn't sync'd, so boot should fail. If you pull disk0, or set disk1 as the primary boot device, what does it do? You can't expect it to resilver before booting.

On 6/11/08, Vincent Fox <vincent_b_fox at yahoo.com> wrote:
> So I decided to test out failure modes of ZFS root mirrors.
>
> Installed on a V240 with nv90. Worked great.
>
> Pulled out disk1, then replaced it and attached again, resilvered, all good.
>
> Now I pull out disk0 to simulate failure there. OS up and running fine, but
> lots of error messages about SYNC CACHE.
>
> Next I decided to init 0, and reinsert disk 0, and reboot. Uh oh!
>
> [boot panic output snipped]
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Ummm, could you back up a bit there?

What do you mean "disk isn't sync'd so boot should fail"? I'm coming from UFS, of course, where I'd expect to be able to fix a damaged boot drive after it drops into a single-user root prompt.

I believe I did try boot disk1, but that failed, I think due to a prior trial with it, where I scrambled it with dd, then resilvered. Then removed it, replaced, resilvered it. I think I ended up with an unusable boot sector on disk1 that didn't work, but I didn't copy the message down, sorry.

I suppose all that would have been left is to boot from media or a jumpstart server in single-user and attempt repairs. Unfortunately I have since re-jumpstarted the system clean. This was plain nv90 both times, by the way, no /etc/system tweaks.

I have to pull the motherboard on the V240 and replace it tomorrow; maybe on Friday I will be able to repeat my experiment. Just wanted to run through some failure modes so I know what to expect when boot drives die on me. Thanks!
Vincent Fox wrote:
> So I decided to test out failure modes of ZFS root mirrors.
>
> Installed on a V240 with nv90. Worked great.
>
> Pulled out disk1, then replaced it and attached again, resilvered, all good.
>
> Now I pull out disk0 to simulate failure there. OS up and running fine, but lots of error message about SYNC CACHE.
>
> Next I decided to init 0, and reinsert disk 0, and reboot. Uh oh!
>
> [boot panic output snipped]

This is actually very good. It means that ZFS recognizes that there are two out-of-sync mirrors and you booted from the oldest version. What happens when you change the boot order?
 -- richard
Vincent Fox wrote:
> Ummm, could you back up a bit there?
>
> What do you mean "disk isn't sync'd so boot should fail"? I'm coming from UFS of course where I'd expect to be able to fix a damaged boot drive as it drops into a single-user root prompt.
>
> [rest of message snipped]

Sequence-of-events failures are one of the most common fatal errors in complex systems. In this case, you induced a failure mode we call amnesia. It works like this:

Consider a system with two (!) mirrored disks (A & B) working normally and in sync. At time0, disconnect disk A. It will still contain a view of the system state, but is not accessible by the system. At time1, the system gives up on disk A and proceeds using disk B. Now the two disks are no longer in sync and the data on disk B is newer than the data on disk A. At time2, shut down the system and re-attach disk A. The correct behaviour is that disk A is old and its data should be ignored until repaired; disk B should be the primary, authoritative view of the system state. This failure mode is called amnesia because disk A doesn't remember the changes that should have occurred if it had been an active, functional member of the system.

AFAIK, SVM will not handle this problem well.
ZFS and Solaris Cluster can detect this because the configuration metadata knows the time difference (ZFS can detect this by the latest txg). I predict that if you had booted from disk B, then it would have worked (but I don't have the hardware setup to test this tonight).

NB, for those who don't know about SPARC boot sequences: the OpenBoot program has a default boot-device list and will try the first device, then the second, and so on. This is similar to how most BIOSes work. While you wouldn't normally expect to need to worry about this, it makes a difference in the case of amnesia.
 -- richard
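The txg-based detection described above can be illustrated with a toy model. This is an invented sketch, not ZFS code; `MirrorSide`, `write`, and `authoritative` are hypothetical names, and real ZFS stores the txg in each vdev's on-disk labels rather than in memory:

```python
# Toy model of the "amnesia" scenario: each mirror side carries the
# transaction-group number (txg) of the last write it saw, and the
# side with the highest txg holds the authoritative system state.

class MirrorSide:
    def __init__(self, name, txg=0):
        self.name = name
        self.txg = txg
        self.attached = True

def write(sides, txg):
    """A write bumps the txg on every side that is currently attached."""
    for s in sides:
        if s.attached:
            s.txg = txg

def authoritative(sides):
    """The side with the newest txg is the one that should be trusted."""
    return max(sides, key=lambda s: s.txg)

disk_a = MirrorSide("disk0")
disk_b = MirrorSide("disk1")
sides = [disk_a, disk_b]

write(sides, txg=100)        # both in sync at txg 100
disk_a.attached = False      # time0: pull disk A
write(sides, txg=150)        # time1: system keeps going on disk B alone
disk_a.attached = True       # time2: shut down, re-attach disk A, reboot

# Disk A has "amnesia": it never saw txgs 101-150, so booting from it
# presents a stale view. Comparing txgs identifies the stale side.
print(authoritative(sides).name)   # disk1
print(disk_a.txg < disk_b.txg)     # True
```

In this model, booting from disk0 after the re-attach is exactly the amnesia case: its txg is behind, so its data must be ignored until resilvered from disk1.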
On Thu, Jun 12, 2008 at 06:38:56AM +0200, Richard Elling wrote:
> Vincent Fox wrote:
> > [original report snipped]
>
> This is actually very good. It means that ZFS recognizes that there
> are two, out of sync mirrors and you booted from the oldest version.
> What happens when you change the boot order?
> -- richard

Hm, but the steps taken, as I read it, were:

pull disk1
replace
*resilver*
pull disk0
...

So the 2 disks should be in sync (due to resilvering)? Or is there another step needed to get the disks in sync?

Kurt
On Wed, Jun 11, 2008 at 10:43:26PM -0700, Richard Elling wrote:
>
> AFAIK, SVM will not handle this problem well. ZFS and Solaris
> Cluster can detect this because the configuration metadata knows
> the time difference (ZFS can detect this by the latest txg).

Having been through this myself with SVM in the past, no, it does not handle this well at all. If I remember correctly, Veritas handled this a lot better than SVM did/does (please bear in mind I haven't used either of those in quite some time).

> I predict that if you had booted from disk B, then it would have
> worked (but I don't have the hardware setup to test this tonight)

Unfortunately I thought of this after deleting his mail. He said that before pulling disk B he scrambled it with dd. He broke the boot sectors on disk B, which ZFS doesn't replicate as far as I can tell. (See the section on ZFS install where it talks about adding a mirror after the fact; you need to manually install the boot sectors.)

IMHO, ZFS boot/root should really go out of its way to make sure the boot sectors are up to date. As most other mirroring solutions (hardware or software) mirror raw volumes, they just do it automatically due to the nature of how they work. This is behavior that has come to be expected, so it's a really good idea if ZFS could do it.

I think something else that might help is if ZFS were to boot, see that the volume it booted from is older than the other one, print a message to that effect, and either halt the machine or issue a reboot pointing at the other disk (probably easier with OF than the BIOS of a PC).

-brian

--
"Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full
of pop tarts and pancake mix." -- IRC User (http://www.bash.org/?841435)
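The boot-time staleness check proposed in the last paragraph can be sketched roughly like this. It is an invented illustration, not real boot or ZFS code: the device names and txg values are made up, and a real implementation would read the txg out of each disk's ZFS labels before mounting root:

```python
# Sketch of the proposed check: on boot, compare the txg of the device
# we booted from against the other mirror side, and refuse to proceed
# (pointing the operator at the up-to-date disk) if we booted from the
# stale, "amnesiac" side.

def choose_boot_action(booted_from, labels):
    """labels maps device name -> latest txg found in its ZFS label."""
    newest = max(labels, key=labels.get)
    if labels[booted_from] < labels[newest]:
        # We booted from the stale side: halting here is safer than
        # silently mounting an old copy of root.
        return f"halt: {booted_from} is stale, reboot from {newest}"
    return f"proceed: mount root from {booted_from}"

labels = {"disk0": 100, "disk1": 150}   # disk0 missed txgs 101-150
print(choose_boot_action("disk0", labels))
print(choose_boot_action("disk1", labels))
```

On SPARC the "reboot from the other disk" half would be comparatively easy, since OpenBoot can be handed an explicit device path; a PC BIOS offers no such hook, which is presumably why halting with a message is the fallback.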
On Thu, Jun 12, 2008 at 07:28:23AM -0400, Brian Hechinger wrote:
> I think something else that might help is if ZFS were to boot, see that
> the volume it booted from is older than the other one, print a message
> to that effect and either halt the machine or issue a reboot pointing
> at the other disk (probably easier with OF than the BIOS of a PC).

That's the method taken by VxVM. When it finally imports the booting DG, it may find that the root volume isn't present on the disk that booted. It will stop the boot process at that point.

-- Darren
> On Thu, Jun 12, 2008 at 06:38:56AM +0200, Richard
>
> pull disk1
> replace
> *resilver*
> pull disk0
> ...
> So the 2 disks should be in sync (due to resilvering)? Or is there
> another step needed to get the disks in sync?

That is an accurate summary. I thought I was all good with the resilver, and in fact ran a scrub and status to be certain of it.

If boot sectors do not get installed by default onto disk1, I will have to make this a part of the post-install script for JumpStart. I will re-run this experiment with a clean nv90 install onto a mirror set, and just pull disk0 this time without messing with disk1.
Vincent,

I think you are running into some existing bugs, particularly this one:

http://bugs.opensolaris.org/view_bug.do?bug_id=6668666

Please review the list of known issues here:

http://opensolaris.org/os/community/zfs/boot/

Also check out the issues described on page 77 in this section:

Booting From an Alternate Disk in a Mirrored ZFS Root Pool
http://opensolaris.org/os/community/zfs/docs/

Cindy

Vincent Fox wrote:
> [previous message snipped]
Kurt Schreiner wrote:
> On Thu, Jun 12, 2008 at 06:38:56AM +0200, Richard Elling wrote:
> > [...]
>
> Hm, but the steps taken, as I read it, were:
>
> pull disk1
> replace
> *resilver*
> pull disk0
> ...
> So the 2 disks should be in sync (due to resilvering)? Or is there
> another step needed to get the disks in sync?

The amnesia occurred later:

   Now I pull out disk0 to simulate failure there. OS up and running
   fine, but lots of error message about SYNC CACHE.

   Next I decided to init 0, and reinsert disk 0, and reboot. Uh oh!

 -- richard
On Thu, Jun 12, 2008 at 07:31:49PM +0200, Richard Elling wrote:
> [...]
>
> The amnesia occurred later:
>    Now I pull out disk0 to simulate failure there. OS up and running
>    fine, but lots of error message about SYNC CACHE.
>
>    Next I decided to init 0, and reinsert disk 0, and reboot. Uh oh!

Ah! Ok, got it now...

Thanks,
Kurt
Followup with modified test plan:

1) Yank disk0 from V240.
   Waited for it to be marked FAULTED in zpool status -x
2) Inserted new disk0 scavenged from another system
3) Ran format to set s0 as full-disk to agree with the other system
4) Halted system
5) boot disk1
   Wanted to make sure the JumpStart mirror setup DID in fact put
   the boot blocks on; this worked fine.
6) zpool replace -f rpool c1t0d0s0
   Waited for resilvering to finish. The resilvering took only 5 minutes
   for 6 gigs of OS; seems quite a lot faster than SVM! I am used to SVM
   taking an hour plus.
7) installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t0d0s0
8) init 0
9) boot

Uh-oh, got an error here:

Boot device: disk  File and args:
ERROR: /packages/ufs-file-system: Last Trap: Trap 3d

10) boot disk1
Everything seems fine there.

Not sure what is wrong with the installboot.

Fantastic work!
You want to install the ZFS boot block, not the UFS boot block. Check the syntax in the ZFS Admin Guide that is available from this location:

http://opensolaris.org/os/community/zfs/docs

Cindy

----- Original Message -----
From: Vincent Fox <vincent_b_fox at yahoo.com>
Date: Friday, June 13, 2008 3:49 pm
Subject: Re: [zfs-discuss] ZFS root boot failure?
To: zfs-discuss at opensolaris.org

> Followup with modified test plan:
>
> [test plan snipped]
> You want to install the zfs boot block, not the ufs
> bootblock.

Oh duh. I tried to correct my mistake using this:

installboot /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t0d0s0

And now get this:

Boot device: disk  File and args:
Can't mount root
Evaluating:
The file just loaded does not appear to be executable.

I'll poke around in the docs and try again.