Ryan Wehler
2011-Nov-25 03:22 UTC
[zfs-discuss] Recovering from kernel panic / reboot cycle importing pool.
Yesterday morning I awoke to alerts from my SAN that one of my OS disks was
faulty; FMA said it was in hardware failure. By the time I got to work (1.5
hours after the email) ALL of my pools were in a degraded state, and
"tank", my primary pool, had kicked in two hot spares because it was so
discombobulated.
------------------- EMAIL -------------------
List of faulty resources:
--------------- ------------------------------------ -------------- ---------
TIME CACHE-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Nov 23 06:55:28 b42c59c5-c955-c44a-8d61-e0864b61864b DISK-8000-3E Critical
Host : calculon
Platform : X8DTH-i-6-iF-6F Chassis_id : 1234567890
Product_sn :
Fault class : fault.io.scsi.cmd.disk.dev.rqs.derr
Affects : dev:///:devid=id1,sd@n5000c500040ec14b//scsi_vhci/disk@g5000c500040ec14b
faulted but still in service
FRU : "006"
(hc://:product-id=LSILOGIC-SASX36-A1----:server-id=:chassis-id=50030480003d38ff:serial=3NM1NN4W00009740ZZUH:part=SEAGATE-ST9146802SS:revision=0002/ses-enclosure=0/bay=6/disk=0)
faulty
Description : A non-recoverable hardware failure was detected by the device
while performing a command.
Refer to http://sun.com/msg/DISK-8000-3E for more
information.
Response : The device may be offlined or degraded.
Impact : The device has failed. The service may have been lost or
degraded.
Action : Ensure that the latest drivers and patches are installed.
Schedule a repair procedure to replace the affected
device. Use 'fmadm faulty' to find the affected
disk.
--------------- ------------------------------------ -------------- ---------
TIME CACHE-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Host : calculon
Platform : X8DTH-i-6-iF-6F Chassis_id : 1234567890
Product_sn :
Fault class : fault.io.scsi.cmd.disk.dev.rqs.merr
Affects : dev:///:devid=id1,sd@n5000c500040ec14b//scsi_vhci/disk@g5000c500040ec14b
faulted but still in service
FRU : "006"
(hc://:product-id=LSILOGIC-SASX36-A1----:server-id=:chassis-id=50030480003d38ff:serial=3NM1NN4W00009740ZZUH:part=SEAGATE-ST9146802SS:revision=0002/ses-enclosure=0/bay=6/disk=0)
faulty
Description : The command was terminated with a non-recovered error condition
that may have been caused by a flaw in the media or an error in
the recorded data.
Refer to http://sun.com/msg/DISK-8000-4Q for more
information.
Response : The device may be offlined or degraded.
Impact : It is likely that continued operation will result in data
corruption, which may eventually cause the loss of service or the
service degradation.
Action : Schedule a repair procedure to replace the affected device. Use
'fmadm faulty' to find the affected disk.
------------------- EMAIL END -------------------
So, as I said, by the time I got to work 1.5 hours had elapsed. I immediately ran
zpool detach on the failed disk ID and yanked it from the chassis.
I'm baffled as to why FMA called it failed but ZFS decided to keep it
online... only when *I* detached it from the array did it ever fall out of the
pool! Over the course of the next couple of hours my "tank" pool
resilvered itself, I set the hot spares back into their normal waiting state,
and I cleared the errors on my two other pools. I then scrubbed tank (a rough
sketch of the whole sequence is below).
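A minimal sketch of that sequence, with a placeholder for the failed disk and the
hot-spare names taken from the status output further down (this is illustrative,
not a paste of my shell history):

zpool detach tank <failed-disk-id>        # detach the disk FMA flagged, then pull it
# ...after the resilver finished:
zpool detach tank c4t5000C50017A6A637d0   # return the first hot spare to AVAIL
zpool detach tank c4t5000C50017A69513d0   # return the second hot spare to AVAIL
zpool clear <other-pool>                  # clear the errors on the two other pools
zpool scrub tank                          # then kick off the scrub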
This was the output before the scrub:
------------ PRE SCRUB ---------------
pool: tank
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Wed Nov 23 09:34:22 2011
41.5G scanned out of 1.34T at 16.1M/s, 23h34m to go
7.00G resilvered, 3.02% done
config:
        NAME                         STATE     READ WRITE CKSUM
        tank                         ONLINE       0     0     3
          raidz2-0                   ONLINE       0     0     6
            c4t5000C500174132FBd0    ONLINE       0     0     0  (resilvering)
            c4t5000C50017418D4Fd0    ONLINE       0     0     0  (resilvering)
            c4t5000C500174190A7d0    ONLINE       0     0     0  (resilvering)
            c4t5000C5001741B323d0    ONLINE       0     0     0  (resilvering)
            c4t5000C50017A5872Bd0    ONLINE       0     0     1  (resilvering)
            c4t5000C50017A5886Bd0    ONLINE       0     0     0  (resilvering)
            spare-6                  ONLINE       0     0     0
              c4t5000C50017A5892Fd0  ONLINE       0     0     0  (resilvering)
              c4t5000C50017A6A637d0  ONLINE       0     0     0  (resilvering)
            c4t5000C50017A58B13d0    ONLINE       0     0     0  (resilvering)
            spare-8                  ONLINE       0     0     4
              c4t5000C50017A58B53d0  ONLINE       0     0     0  (resilvering)
              c4t5000C50017A69513d0  ONLINE       0     0     0  (resilvering)
            c4t5000C50017A64DA3d0    ONLINE       0     0     7  (resilvering)
            c4t5000C50017A66063d0    ONLINE       0     0     3  (resilvering)
            c4t5000C50017A677F7d0    ONLINE       0     0     0  (resilvering)
        spares
          c4t5000C50017A6A637d0      INUSE     currently in use
          c4t5000C50017A69513d0      INUSE     currently in use
errors: Permanent errors have been detected in the following files:
<metadata>:<0x12b>
<metadata>:<0x8c>
<metadata>:<0x8d>
<metadata>:<0xa1>
tank/veeam_zvol at snap-weekly-1-2011-11-12-000005:<0x1>
---------- END PRE SCRUB ------------------
After the scrub completed and reported that it had fixed 20.5MB of data, I restarted
the SAN. This is when all heck broke loose: I found myself in an endless kernel
panic/reboot loop.
I booted from a LiveCD and deleted /etc/zfs/zpool.cache, which allowed me to boot
back into the system normally. I then added set aok=1 and set zfs:zfs_recover=1 to
/etc/system (shown below) and restarted again before trying to import the pool.
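For reference, the two lines I added to /etc/system look like this; the '*' comment
lines are just my annotations of what I understand the tunables to do, not part of
the actual file:

* make failed kernel ASSERTs warn instead of panic
set aok=1
* ask ZFS to press on past certain on-disk inconsistencies during import
set zfs:zfs_recover=1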
Any attempt to import my pool "tank", including:
zpool import -fFX tank
zpool import -fFX -o readonly=on tank
causes the same kernel panic message:
----- PANIC ----
panic[cpu2]/thread=ffffff00f8578c40:
assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG,
&numbufs, &dbp), file:
../../common/fs/zfs/dmu.c, line: 796
ffffff00f8578880 genunix:assfail+7e ()
ffffff00f8578930 zfs:dmu_write+17f ()
ffffff00f85789e0 zfs:space_map_sync+295 ()
ffffff00f8578a70 zfs:metaslab_sync+2de ()
ffffff00f8578ad0 zfs:vdev_sync+d5 ()
ffffff00f8578b80 zfs:spa_sync+44b ()
ffffff00f8578c20 zfs:txg_sync_thread+247 ()
ffffff00f8578c30 unix:thread_start+8 ()
----- END PANIC -----
I then ran zdb -e -bcsvL tank and got this output after letting it run all
night:
----- SNIP -----
Traversing all blocks to verify checksums ...
zdb_blkptr_cb: Got error 50 reading <0, 80, 0, 0>
DVA[0]=<0:28b61ec9600:1200> DVA[1]=<0:149e50a5400:1200>
DVA[2]=<0:9f83efe800:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous
unique triple size=1000L/a00P birth=2561650L/2561650P fill=1
cksum=6e4fd979ea:97c2e68bdf84:7cfea1e3255982:4ca2471aa9f3c833 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 80, 0, 1>
DVA[0]=<0:28b61ecba00:1200> DVA[1]=<0:149e50a7800:1200>
DVA[2]=<0:9f83f00c00:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous
unique triple size=1000L/a00P birth=2561650L/2561650P fill=1
cksum=6d08f1acfa:9fa09650c807:88e6b39ae45bde:56a9ac12f2e4c5d9 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 80, 0, 2>
DVA[0]=<0:28b61eca800:1200> DVA[1]=<0:149e50a6600:1200>
DVA[2]=<0:9f83effa00:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous
unique triple size=1000L/a00P birth=2561650L/2561650P fill=1
cksum=6f4d50c3e4:aa396085bc4e:95d45fc95bf06e:5fdbfe6eb852cf94 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 80, 0, 3>
DVA[0]=<0:28b61eccc00:1200> DVA[1]=<0:149e50a8a00:1200>
DVA[2]=<0:9f83f01e00:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous
unique triple size=1000L/a00P birth=2561650L/2561650P fill=1
cksum=8a0192487d:c4591e633abd:a6e34d65c0e98e:689ca0220fa0f1da -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 80, 0, 4>
DVA[0]=<0:28b61ecde00:1200> DVA[1]=<0:14cd80adc00:1200>
DVA[2]=<0:9f83f03000:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous
unique triple size=1000L/a00P birth=2561650L/2561650P fill=1
cksum=7cf66738ef:ac3d97f68b92:8e517e4e827057:577197cf45e041de -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 140, 1, 2>
DVA[0]=<0:28928362e00:2400> DVA[1]=<0:14d7e17e600:2400>
DVA[2]=<0:9f83081000:2400> [L1 SPA space map] fletcher4 lzjb LE contiguous
unique triple size=4000L/1a00P birth=2561342L/2561342P fill=101
cksum=25605243405:81f7c665f95d6:11999b168af12c49:6ff515eac0269c25 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 141, 0, 140>
DVA[0]=<0:28928331c00:1200> DVA[1]=<0:14a648b4a00:1200>
DVA[2]=<0:9f851ee200:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous
unique triple size=1000L/a00P birth=2561341L/2561341P fill=1
cksum=aa953ce6a5:e711c4fe79d2:be7981b694239c:7529a161247abc19 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 161, 1, 0>
DVA[0]=<0:28928365200:2400> DVA[1]=<0:14d7e180a00:2400>
DVA[2]=<0:9f83083400:2400> [L1 SPA space map] fletcher4 lzjb LE contiguous
unique triple size=4000L/1a00P birth=2561342L/2561342P fill=101
cksum=23cab002ab7:7e3e1246eb050:114166236be5e14c:11c9b1c1410d0433 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 299, 0, 50a>
DVA[0]=<0:170f6f08400:2400> DVA[1]=<0:3eb4386400:2400>
DVA[2]=<0:21e22d7ae00:2400> [L0 bpobj] fletcher4 lzjb LE contiguous unique
triple size=20000L/1c00P birth=2488469L/2488469P fill=1
cksum=3fc87f4bf36:e2523d83de277:20ef6693f583a573:b33c6fc06386e250 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 299, 0, 28b3>
DVA[0]=<0:289283d8400:2400> DVA[1]=<0:14cd80ab800:2400>
DVA[2]=<0:9f83097800:2400> [L0 bpobj] fletcher4 lzjb LE contiguous unique
triple size=20000L/1a00P birth=2561425L/2561425P fill=1
cksum=37eae7d3f9f:be873fcdaf418:19f6fbd412ee1c8c:545adf8f5512e1c4 -- skipping
Error counts:
errno count
50 10
block traversal size 1763288064 != alloc 1475123189760 (unreachable
1473359901696)
bp count: 77246
bp logical: 4974285824 avg: 64395
bp physical: 435705344 avg: 5640 compression: 11.42
bp allocated: 1763288064 avg: 22826 compression: 2.82
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 1475123189760 used: 41.28%
Blocks LSIZE PSIZE ASIZE avg comp %Total Type
- - - - - - - unallocated
2 32K 4K 22.5K 11.2K 8.00 0.00 object directory
2 1K 1K 9.00K 4.50K 1.00 0.00 object array
2 32K 3.50K 22.5K 11.2K 9.14 0.00 packed nvlist
- - - - - - - packed nvlist size
36.0K 4.47G 287M 1.12G 31.8K 15.92 68.12 bpobj
- - - - - - - bpobj header
- - - - - - - SPA space map header
39.2K 162M 128M 533M 13.6K 1.27 31.71 SPA space map
- - - - - - - ZIL intent log
44 704K 90.5K 482K 10.9K 7.78 0.03 DMU dnode
1 2K 512 4.50K 4.50K 4.00 0.00 DMU objset
- - - - - - - DSL directory
12 6.50K 6.00K 54.0K 4.50K 1.08 0.00 DSL directory child map
10 5.50K 5.00K 45.0K 4.50K 1.10 0.00 DSL dataset snap map
75 1.13M 129K 824K 11.0K 8.98 0.05 DSL props
- - - - - - - DSL dataset
- - - - - - - ZFS znode
- - - - - - - ZFS V0 ACL
- - - - - - - ZFS plain file
- - - - - - - ZFS directory
- - - - - - - ZFS master node
- - - - - - - ZFS delete queue
- - - - - - - zvol object
- - - - - - - zvol prop
- - - - - - - other uint8[]
- - - - - - - other uint64[]
- - - - - - - other ZAP
4 64K 7.00K 45.0K 11.2K 9.14 0.00 persistent error log
8 912K 181K 675K 84.4K 5.04 0.04 SPA history
- - - - - - - SPA history offsets
1 512 512 4.50K 4.50K 1.00 0.00 Pool properties
- - - - - - - DSL permissions
- - - - - - - ZFS ACL
- - - - - - - ZFS SYSACL
- - - - - - - FUID table
- - - - - - - FUID table size
1 1K 512 4.50K 4.50K 2.00 0.00 DSL dataset next clones
1 512 512 4.50K 4.50K 1.00 0.00 scan work queue
- - - - - - - ZFS user/group used
- - - - - - - ZFS user/group quota
- - - - - - - snapshot refcount tags
- - - - - - - DDT ZAP algorithm
2 32K 3.50K 22.5K 11.2K 9.14 0.00 DDT statistics
- - - - - - - System attributes
- - - - - - - SA master node
- - - - - - - SA attr registration
- - - - - - - SA attr layouts
- - - - - - - scan translations
- - - - - - - deduplicated block
56 31.0K 28.0K 252K 4.50K 1.11 0.01 DSL deadlist map
- - - - - - - DSL deadlist map hdr
6 3.50K 3.00K 27.0K 4.50K 1.17 0.00 DSL dir clones
21 2.62M 94.5K 378K 18.0K 28.44 0.02 bpobj subobj
- - - - - - - deferred free
- - - - - - - dedup ditto
75.4K 4.63G 416M 1.64G 22.3K 11.42 100.00 Total
capacity operations bandwidth ---- errors ----
description used avail read write read write read write cksum
tank 1.34T 1.91T 35 0 196K 0 0 0 14
raidz2 1.34T 1.91T 35 0 196K 0 0 0 84
/dev/dsk/c4t5000C500174132FBd0s0 28 0 19.6K 0 0 0 0
spare 28 0 20.0K 0 0 0 0
/dev/dsk/c4t5000C50017418D4Fd0s0 28 0 20.1K 0 0 0 0
/dev/dsk/c4t5000C50017A6A637d0s0 0 0 54 0 0 0 0
/dev/dsk/c4t5000C500174190A7d0s0 28 0 19.7K 0 0 0 0
/dev/dsk/c4t5000C5001741B323d0s0 28 0 19.7K 0 0 0 0
/dev/dsk/c4t5000C50017A5872Bd0s0 28 0 20.1K 0 0 0 0
/dev/dsk/c4t5000C50017A5886Bd0s0 28 0 19.6K 0 0 0 0
/dev/dsk/c4t5000C50017A5892Fd0s0 28 0 19.6K 0 0 0 0
/dev/dsk/c4t5000C50017A58B13d0s0 28 0 20.1K 0 0 0 0
spare 28 0 19.6K 0 0 0 0
/dev/dsk/c4t5000C50017A58B53d0s0 28 0 19.7K 0 0 0 0
/dev/dsk/c4t5000C50017A69513d0s0 0 0 49 0 0 0 0
/dev/dsk/c4t5000C50017A64DA3d0s0 28 0 19.7K 0 0 0 0
/dev/dsk/c4t5000C50017A66063d0s0 28 0 20.1K 0 0 0 0
/dev/dsk/c4t5000C50017A677F7d0s0 28 0 19.7K 0 0 0 0
root@calculon:/#
----- END SNIP -----
The only thing I'm seeing there, other than the 10 "error 50" entries,
is the fact that both of my hot spares are still listed under the disks they replaced
during the degraded operation, rather than back in the hot-spare role where I had
put them (zpool detach tank <lunid>).
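For anyone who wants to look at the same thing, this is roughly how I've been
inspecting the layout while the pool is un-importable (a sketch, not my exact
session; the device path is one of the hot spares from the status output above):

zdb -e -C tank                             # print the pool configuration from the labels of the exported pool
zdb -l /dev/dsk/c4t5000C50017A6A637d0s0    # dump the ZFS label on one of the hot spares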
I'm at my wits' end here with this. I've got a backup
that's from a week ago, but I've got a week's worth of data I
cannot re-create or re-pull from the original location it was generated
from. Had all of this happened after a tape replication, I would have just blown
the pool away, remade it, and tested to make sure that the failed disk was the
only cause of my ZFS corruption.
Any help anyone can give me to salvage my data would be SO SO SO appreciated.
Thank you.