Platform:

- old Dell workstation with an Andataco GigaRAID enclosure plugged into
  an Adaptec 39160
- Nevada b51

Current zpool config:

- one two-disk mirror with two hot spares

In my ferocious pounding of ZFS I've managed to corrupt my data pool.
This is what I've been doing to test it:

- set zil_disable to 1 in /etc/system
- continually untar a couple of files into the filesystem
- manually spin down a drive in the mirror by holding down the button on
  the enclosure
- for any system hangs, reboot with a nasty "reboot -dnq"

I've gotten different results after the spindown:

- works properly: short or no hang, hot spare successfully added to the
  mirror
- system hangs, and after a reboot the spare is not added
- tar hangs, but after running "zpool status" the hot spare is added
  properly and tar continues
- tar continues, but hangs on "zpool status"

The last is what happened just prior to the corruption. Here's the
output of zpool status:

    nextest-01# zpool status -v
      pool: zmir
     state: DEGRADED
    status: One or more devices has experienced an error resulting in data
            corruption. Applications may be affected.
    action: Restore the file in question if possible. Otherwise restore the
            entire pool from backup.
       see: http://www.sun.com/msg/ZFS-8000-8A
     scrub: resilver completed with 1 errors on Thu Nov 30 11:37:21 2006
    config:

            NAME        STATE     READ WRITE CKSUM
            zmir        DEGRADED     8     0     4
              mirror    DEGRADED     8     0     4
                c3t3d0  ONLINE       0     0    24
                c3t4d0  UNAVAIL      0     0     0  cannot open
            spares
              c0t0d0    AVAIL
              c3t1d0    AVAIL

    errors: The following persistent errors have been detected:

              DATASET  OBJECT  RANGE
              15       0       lvl=4294967295 blkid=0

So the questions are:

- is this fixable? I don't see an inum I could run find on to remove,
  and I can't even do a zfs volinit anyway:

      nextest-01# zfs volinit
      cannot iterate filesystems: I/O error

- would not enabling zil_disable have prevented this?
- Should I have been doing a 3-way mirror?
- Is there a more optimal configuration to help prevent this kind of
  corruption?
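The test workload above can be sketched roughly as follows. This is a minimal stand-in, not the poster's actual script: the TARGET path, the sample archive, and the fixed iteration count are all hypothetical. To reproduce the real test, point TARGET at a directory on the zmir pool and loop indefinitely while spinning down a mirror member by hand.

```shell
#!/bin/sh
# Rough sketch of the stress workload described above (hypothetical,
# not the poster's exact script). TARGET would be a directory on the
# ZFS pool, e.g. /zmir/test; it defaults to /tmp here so the sketch
# runs anywhere.
TARGET=${TARGET:-/tmp/zfs-stress.$$}
mkdir -p "$TARGET" || exit 1

# Build a small sample archive to churn, standing in for the
# "couple of files" untarred in the original test.
echo "sample data" > "$TARGET/file.txt"
( cd "$TARGET" && tar cf sample.tar file.txt )

# The original test loops indefinitely while a drive is spun down;
# a fixed count keeps this sketch finite.
i=0
while [ "$i" -lt 5 ]; do
    ( cd "$TARGET" && tar xf sample.tar )
    i=$((i + 1))
done
echo "completed $i untar passes into $TARGET"
```

In the real test, /etc/system also carried the "set zil_disable = 1" line mentioned above, which takes effect after a reboot.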
Ultimately, I want to build a ZFS server with performance and
reliability comparable to, say, a NetApp, but the fact that I appear to
have been able to nuke my pool by simulating a hardware error gives me
pause. I'd love to know if I'm off base in my worries.

Jim

This message posted from opensolaris.org
> So the questions are:
>
> - is this fixable? I don't see an inum I could run find on to remove,
>   and I can't even do a zfs volinit anyway:
>
>       nextest-01# zfs volinit
>       cannot iterate filesystems: I/O error
>
> - would not enabling zil_disable have prevented this?
> - Should I have been doing a 3-way mirror?
> - Is there a more optimal configuration to help prevent this kind of
>   corruption?

Anyone have any thoughts on this? I'd really like to be able to build a
nice ZFS box for file service, but if a hardware failure can corrupt a
disk pool I'll have to try to find another solution, I'm afraid.
> Anyone have any thoughts on this? I'd really like to be able to build
> a nice ZFS box for file service, but if a hardware failure can corrupt
> a disk pool I'll have to try to find another solution, I'm afraid.

Sorry, I worded this poorly -- if the loss of a disk in a mirror can
corrupt the pool, it's going to give me pause in implementing a ZFS
solution.

Jim
Jim,

I'm not at all sure what happened to your pool. However, I can answer
some of your questions.

Jim Hranicky wrote on 12/05/06 11:32:

> So the questions are:
>
> - is this fixable? I don't see an inum I could run find on to remove,

I think the pool is busted. Even the message printed in your previous
email is bad:

    DATASET  OBJECT  RANGE
    15       0       lvl=4294967295 blkid=0

as level is way out of range.

>   and I can't even do a zfs volinit anyway:
>
>       nextest-01# zfs volinit
>       cannot iterate filesystems: I/O error

I'm not sure why you're using zfs volinit, which I believe creates the
zvol links, but this further shows problems.

> - would not enabling zil_disable have prevented this?

No, the intent log is not needed for pool integrity. It ensures that
the synchronous semantics of O_DSYNC/fsync are obeyed.
> I think the pool is busted. Even the message printed in your previous
> email is bad:
>
>     DATASET  OBJECT  RANGE
>     15       0       lvl=4294967295 blkid=0
>
> as level is way out of range.

I think this could be from dmu_objset_open_impl(). It sets object to 0
and level to -1 (= 4294967295). [Hmmm, this also seems to indicate a
truncation from 64 to 32 bits somewhere.] Would zdb show any more
detail?

(Actually, it looks like the ZIL also sets object to 0 and level to -1
when accessing its blocks, but since the ZIL was disabled, I'd guess
this isn't the issue here.)
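A quick way to see why lvl=4294967295 looks like -1 truncated to 32 bits: masking a level of -1 down to a 32-bit unsigned field yields exactly the value reported by zpool status. This is an arithmetic illustration only, not ZFS code:

```shell
# A level of -1 stored in a 32-bit unsigned field and printed as an
# unsigned integer comes out as 4294967295 -- the lvl value from the
# persistent-error report above.
lvl=$(( -1 & 0xFFFFFFFF ))
printf '%u\n' "$lvl"    # prints 4294967295
```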
Here's the output of zdb:

    zmir
        version=3
        name='zmir'
        state=0
        txg=770
        pool_guid=5904723747772934703
        vdev_tree
            type='root'
            id=0
            guid=5904723747772934703
            children[0]
                type='mirror'
                id=0
                guid=15067187713781123481
                metaslab_array=15
                metaslab_shift=28
                ashift=9
                asize=36690722816
                children[0]
                    type='disk'
                    id=0
                    guid=8544021753105415508
                    path='/dev/dsk/c3t3d0s0'
                    devid='id1,sd@x00609487b409636e/a'
                    whole_disk=1
                    is_spare=1
                    DTL=19
                children[1]
                    type='disk'
                    id=1
                    guid=3579059219373561470
                    path='/dev/dsk/c3t4d0s0'
                    devid='id1,sd@n5005076710cff8b5/a'
                    whole_disk=1
                    is_spare=1
                    DTL=20

It doesn't seem to give much information, and I don't know any of the
"secret options" :->

Can anyone at all give me a good reason why this happened, or give me
any options to zdb so I can find out?

I can try plugging the spun-down disk back in and seeing if it can
recover, although that's not going to be an option if this happens for
real...

Jim
Hi Jim,

That looks interesting, though. I'm not a ZFS expert by any means, but
look at some of the properties of the children elements of the mirror:

    version=3
    name='zmir'
    state=0
    txg=770
    pool_guid=5904723747772934703
    vdev_tree
        type='root'
        id=0
        guid=5904723747772934703
        children[0]
            type='mirror'
            id=0
            guid=15067187713781123481
            metaslab_array=15
            metaslab_shift=28
            ashift=9
            asize=36690722816
            children[0]
                type='disk'
                id=0
                guid=8544021753105415508
                [b]path='/dev/dsk/c3t3d0s0'[/b]
                devid='id1,sd@x00609487b409636e/a'
                whole_disk=1
                [b]is_spare=1[/b]
                DTL=19
            children[1]
                type='disk'
                id=1
                guid=3579059219373561470
                [b]path='/dev/dsk/c3t4d0s0'[/b]
                devid='id1,sd@n5005076710cff8b5/a'
                whole_disk=1
                [b]is_spare=1[/b]
                DTL=20

If those are the original path IDs, and you didn't move the disks on
the bus, why is the is_spare flag set?

There are a lot of options to zdb; some can produce a lot of output.
Try:

    zdb zmir

Check the drive label contents with:

    zdb -l /dev/dsk/c3t0d0s0
    zdb -l /dev/dsk/c3t1d0s0
    zdb -l /dev/dsk/c3t3d0s0
    zdb -l /dev/dsk/c3t4d0s0

Uberblock info with:

    zdb -uuu zmir

And dataset info with:

    zdb -dd zmir

There are more options, and they give even more info if you repeat the
option letter more times (especially the -d flag...). These might be
worth posting to help one of the developers spot something.

Cheers,
Alan
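The per-disk label checks above can be wrapped in a small loop. The device names are the ones from this thread; the DRYRUN guard is an addition so the sketch can be exercised without the actual devices present (zdb -l needs a live disk to read a label from):

```shell
#!/bin/sh
# Dump the vdev label of each disk mentioned in the thread. With
# DRYRUN=1 (the default here) the commands are only printed, since
# zdb needs the real devices to exist.
DRYRUN=${DRYRUN:-1}
for d in c3t0d0 c3t1d0 c3t3d0 c3t4d0; do
    cmd="zdb -l /dev/dsk/${d}s0"
    if [ "$DRYRUN" -eq 1 ]; then
        echo "$cmd"
    else
        $cmd
    fi
done
```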
> If those are the original path IDs, and you didn't
> move the disks on the bus, why is the is_spare flag

Well, I'm not sure, but these drives were set as spares in another pool
I deleted -- should I have done something to the drives (fdisk?) before
rearranging them?

The rest of the options are spitting out a bunch of stuff I'll be glad
to post links to, but if the problem is that the drives are erroneously
marked as spares, I'll re-init them and start over.

Jim
Hold fire on the re-init until one of the devs chips in; maybe I'm
barking up the wrong tree ;)

--a
On Wed, Dec 06, 2006 at 12:35:58PM -0800, Jim Hranicky wrote:

> > If those are the original path IDs, and you didn't
> > move the disks on the bus, why is the is_spare flag
>
> Well, I'm not sure, but these drives were set as spares in another
> pool I deleted -- should I have done something to the drives (fdisk?)
> before rearranging them?
>
> The rest of the options are spitting out a bunch of stuff I'll be
> glad to post links to, but if the problem is that the drives are
> erroneously marked as spares, I'll re-init them and start over.

There are known issues with the way spares are tracked and recorded on
disk that can result in a variety of strange behavior in exceptional
circumstances. We are working on resolving these issues.

- Eric

--
Eric Schrock, Solaris Kernel Development    http://blogs.sun.com/eschrock