Ryan Wehler
2011-Nov-25 03:22 UTC
[zfs-discuss] Recovering from kernel panic / reboot cycle importing pool.
Yesterday morning I awoke to alerts from my SAN that one of my OS disks was faulty; FMA said it was a hardware failure. By the time I got to work (1.5 hours after the email) ALL of my pools were in a degraded state, and "tank", my primary pool, had kicked in two hot spares because it was so discombobulated.

------------------- EMAIL -------------------
List of faulty resources:

--------------- ------------------------------------ -------------- ---------
TIME            CACHE-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Nov 23 06:55:28 b42c59c5-c955-c44a-8d61-e0864b61864b  DISK-8000-3E   Critical

Host        : calculon
Platform    : X8DTH-i-6-iF-6F   Chassis_id : 1234567890   Product_sn :
Fault class : fault.io.scsi.cmd.disk.dev.rqs.derr
Affects     : dev:///:devid=id1,sd@n5000c500040ec14b//scsi_vhci/disk@g5000c500040ec14b
              faulted but still in service
FRU         : "006" (hc://:product-id=LSILOGIC-SASX36-A1----:server-id=:chassis-id=50030480003d38ff:serial=3NM1NN4W00009740ZZUH:part=SEAGATE-ST9146802SS:revision=0002/ses-enclosure=0/bay=6/disk=0)
              faulty
Description : A non-recoverable hardware failure was detected by the device while
              performing a command. Refer to http://sun.com/msg/DISK-8000-3E for
              more information.
Response    : The device may be offlined or degraded.
Impact      : The device has failed. The service may have been lost or degraded.
Action      : Ensure that the latest drivers and patches are installed. Schedule a
              repair procedure to replace the affected device. Use 'fmadm faulty'
              to find the affected disk.

--------------- ------------------------------------ -------------- ---------
TIME            CACHE-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------

Host        : calculon
Platform    : X8DTH-i-6-iF-6F   Chassis_id : 1234567890   Product_sn :
Fault class : fault.io.scsi.cmd.disk.dev.rqs.merr
Affects     : dev:///:devid=id1,sd@n5000c500040ec14b//scsi_vhci/disk@g5000c500040ec14b
              faulted but still in service
FRU         : "006" (hc://:product-id=LSILOGIC-SASX36-A1----:server-id=:chassis-id=50030480003d38ff:serial=3NM1NN4W00009740ZZUH:part=SEAGATE-ST9146802SS:revision=0002/ses-enclosure=0/bay=6/disk=0)
              faulty
Description : The command was terminated with a non-recovered error condition that
              may have been caused by a flaw in the media or an error in the
              recorded data. Refer to http://sun.com/msg/DISK-8000-4Q for more
              information.
Response    : The device may be offlined or degraded.
Impact      : It is likely that continued operation will result in data corruption,
              which may eventually cause the loss of service or the service
              degradation.
Action      : Schedule a repair procedure to replace the affected device. Use
              'fmadm faulty' to find the affected disk.
------------------- EMAIL END -------------------

So, as I said, by the time I got to work 1.5 hours had elapsed. I immediately ran zpool detach on the failed disk ID and yanked it from the chassis. I'm baffled as to why FMA called it failed but ZFS decided to keep it online... only when *I* detached it from the array did it ever fall out of the pool!

Over the course of the next couple of hours my "tank" pool resilvered itself, I set the hot spares back into their normal waiting state, and I cleared the errors on my two other pools. I then scrubbed tank. (The rough commands are below.)
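For what it's worth, this is roughly what I ran that morning, reconstructed from memory; the names in angle brackets are placeholders, and the real failed-disk ID came from 'fmadm faulty':

# drop the faulted disk out of its pool, then physically pull it from the chassis
zpool detach <pool> <failed-disk-id>

# once tank finished resilvering, return each hot spare to its waiting state
zpool detach tank <spare-device>

# clear the logged errors on the two other pools
zpool clear <other-pool>

# then kick off a scrub of tank
zpool scrub tank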
This was the output before the scrub:

------------ PRE SCRUB ---------------
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Nov 23 09:34:22 2011
        41.5G scanned out of 1.34T at 16.1M/s, 23h34m to go
        7.00G resilvered, 3.02% done
config:

        NAME                         STATE     READ WRITE CKSUM
        tank                         ONLINE       0     0     3
          raidz2-0                   ONLINE       0     0     6
            c4t5000C500174132FBd0    ONLINE       0     0     0  (resilvering)
            c4t5000C50017418D4Fd0    ONLINE       0     0     0  (resilvering)
            c4t5000C500174190A7d0    ONLINE       0     0     0  (resilvering)
            c4t5000C5001741B323d0    ONLINE       0     0     0  (resilvering)
            c4t5000C50017A5872Bd0    ONLINE       0     0     1  (resilvering)
            c4t5000C50017A5886Bd0    ONLINE       0     0     0  (resilvering)
            spare-6                  ONLINE       0     0     0
              c4t5000C50017A5892Fd0  ONLINE       0     0     0  (resilvering)
              c4t5000C50017A6A637d0  ONLINE       0     0     0  (resilvering)
            c4t5000C50017A58B13d0    ONLINE       0     0     0  (resilvering)
            spare-8                  ONLINE       0     0     4
              c4t5000C50017A58B53d0  ONLINE       0     0     0  (resilvering)
              c4t5000C50017A69513d0  ONLINE       0     0     0  (resilvering)
            c4t5000C50017A64DA3d0    ONLINE       0     0     7  (resilvering)
            c4t5000C50017A66063d0    ONLINE       0     0     3  (resilvering)
            c4t5000C50017A677F7d0    ONLINE       0     0     0  (resilvering)
        spares
          c4t5000C50017A6A637d0      INUSE     currently in use
          c4t5000C50017A69513d0      INUSE     currently in use

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x12b>
        <metadata>:<0x8c>
        <metadata>:<0x8d>
        <metadata>:<0xa1>
        tank/veeam_zvol@snap-weekly-1-2011-11-12-000005:<0x1>
---------- END PRE SCRUB ------------------

After the scrub completed and reported it had fixed 20.5MB of data, I restarted the SAN. This is when all heck broke loose: I found myself in an endless kernel panic/reboot loop.

I booted with a LiveCD and deleted /etc/zfs/zpool.cache, which allowed me to boot back into the system normally. I then added the following to /etc/system and restarted again before proceeding:

set aok=1
set zfs:zfs_recover=1

Any attempt to import my pool "tank", including:

zpool import -fFX tank
zpool import -fFX -o readonly=on tank

causes the same kernel panic:

----- PANIC ----
panic[cpu2]/thread=ffffff00f8578c40:
assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 796

ffffff00f8578880 genunix:assfail+7e ()
ffffff00f8578930 zfs:dmu_write+17f ()
ffffff00f85789e0 zfs:space_map_sync+295 ()
ffffff00f8578a70 zfs:metaslab_sync+2de ()
ffffff00f8578ad0 zfs:vdev_sync+d5 ()
ffffff00f8578b80 zfs:spa_sync+44b ()
ffffff00f8578c20 zfs:txg_sync_thread+247 ()
ffffff00f8578c30 unix:thread_start+8 ()
----- END PANIC -----

I then ran zdb -e -bcsvL tank and got this output after letting it run all night:

----- SNIP -----

Traversing all blocks to verify checksums ...
zdb_blkptr_cb: Got error 50 reading <0, 80, 0, 0> DVA[0]=<0:28b61ec9600:1200> DVA[1]=<0:149e50a5400:1200> DVA[2]=<0:9f83efe800:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous unique triple size=1000L/a00P birth=2561650L/2561650P fill=1 cksum=6e4fd979ea:97c2e68bdf84:7cfea1e3255982:4ca2471aa9f3c833 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 80, 0, 1> DVA[0]=<0:28b61ecba00:1200> DVA[1]=<0:149e50a7800:1200> DVA[2]=<0:9f83f00c00:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous unique triple size=1000L/a00P birth=2561650L/2561650P fill=1 cksum=6d08f1acfa:9fa09650c807:88e6b39ae45bde:56a9ac12f2e4c5d9 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 80, 0, 2> DVA[0]=<0:28b61eca800:1200> DVA[1]=<0:149e50a6600:1200> DVA[2]=<0:9f83effa00:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous unique triple size=1000L/a00P birth=2561650L/2561650P fill=1 cksum=6f4d50c3e4:aa396085bc4e:95d45fc95bf06e:5fdbfe6eb852cf94 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 80, 0, 3> DVA[0]=<0:28b61eccc00:1200> DVA[1]=<0:149e50a8a00:1200> DVA[2]=<0:9f83f01e00:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous unique triple size=1000L/a00P birth=2561650L/2561650P fill=1 cksum=8a0192487d:c4591e633abd:a6e34d65c0e98e:689ca0220fa0f1da -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 80, 0, 4> DVA[0]=<0:28b61ecde00:1200> DVA[1]=<0:14cd80adc00:1200> DVA[2]=<0:9f83f03000:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous unique triple size=1000L/a00P birth=2561650L/2561650P fill=1 cksum=7cf66738ef:ac3d97f68b92:8e517e4e827057:577197cf45e041de -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 140, 1, 2> DVA[0]=<0:28928362e00:2400> DVA[1]=<0:14d7e17e600:2400> DVA[2]=<0:9f83081000:2400> [L1 SPA space map] fletcher4 lzjb LE contiguous unique triple size=4000L/1a00P birth=2561342L/2561342P fill=101 cksum=25605243405:81f7c665f95d6:11999b168af12c49:6ff515eac0269c25 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 141, 0, 140> DVA[0]=<0:28928331c00:1200> DVA[1]=<0:14a648b4a00:1200> DVA[2]=<0:9f851ee200:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous unique triple size=1000L/a00P birth=2561341L/2561341P fill=1 cksum=aa953ce6a5:e711c4fe79d2:be7981b694239c:7529a161247abc19 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 161, 1, 0> DVA[0]=<0:28928365200:2400> DVA[1]=<0:14d7e180a00:2400> DVA[2]=<0:9f83083400:2400> [L1 SPA space map] fletcher4 lzjb LE contiguous unique triple size=4000L/1a00P birth=2561342L/2561342P fill=101 cksum=23cab002ab7:7e3e1246eb050:114166236be5e14c:11c9b1c1410d0433 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 299, 0, 50a> DVA[0]=<0:170f6f08400:2400> DVA[1]=<0:3eb4386400:2400> DVA[2]=<0:21e22d7ae00:2400> [L0 bpobj] fletcher4 lzjb LE contiguous unique triple size=20000L/1c00P birth=2488469L/2488469P fill=1 cksum=3fc87f4bf36:e2523d83de277:20ef6693f583a573:b33c6fc06386e250 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 299, 0, 28b3> DVA[0]=<0:289283d8400:2400> DVA[1]=<0:14cd80ab800:2400> DVA[2]=<0:9f83097800:2400> [L0 bpobj] fletcher4 lzjb LE contiguous unique triple size=20000L/1a00P birth=2561425L/2561425P fill=1 cksum=37eae7d3f9f:be873fcdaf418:19f6fbd412ee1c8c:545adf8f5512e1c4 -- skipping

Error counts:

        errno  count
           50     10

block traversal size 1763288064 != alloc 1475123189760 (unreachable 1473359901696)

        bp count:               77246
        bp logical:        4974285824      avg:  64395
        bp physical:        435705344      avg:   5640     compression:  11.42
        bp allocated:      1763288064      avg:  22826     compression:   2.82
        bp deduped:                 0    ref>1:      0    deduplication:  1.00
        SPA allocated:  1475123189760     used: 41.28%
Blocks  LSIZE   PSIZE   ASIZE     avg    comp  %Total  Type
     -      -       -       -       -       -       -  unallocated
     2    32K      4K   22.5K   11.2K    8.00    0.00  object directory
     2     1K      1K   9.00K   4.50K    1.00    0.00  object array
     2    32K   3.50K   22.5K   11.2K    9.14    0.00  packed nvlist
     -      -       -       -       -       -       -  packed nvlist size
 36.0K  4.47G    287M   1.12G   31.8K   15.92   68.12  bpobj
     -      -       -       -       -       -       -  bpobj header
     -      -       -       -       -       -       -  SPA space map header
 39.2K   162M    128M    533M   13.6K    1.27   31.71  SPA space map
     -      -       -       -       -       -       -  ZIL intent log
    44   704K   90.5K    482K   10.9K    7.78    0.03  DMU dnode
     1     2K     512   4.50K   4.50K    4.00    0.00  DMU objset
     -      -       -       -       -       -       -  DSL directory
    12  6.50K   6.00K   54.0K   4.50K    1.08    0.00  DSL directory child map
    10  5.50K   5.00K   45.0K   4.50K    1.10    0.00  DSL dataset snap map
    75  1.13M    129K    824K   11.0K    8.98    0.05  DSL props
     -      -       -       -       -       -       -  DSL dataset
     -      -       -       -       -       -       -  ZFS znode
     -      -       -       -       -       -       -  ZFS V0 ACL
     -      -       -       -       -       -       -  ZFS plain file
     -      -       -       -       -       -       -  ZFS directory
     -      -       -       -       -       -       -  ZFS master node
     -      -       -       -       -       -       -  ZFS delete queue
     -      -       -       -       -       -       -  zvol object
     -      -       -       -       -       -       -  zvol prop
     -      -       -       -       -       -       -  other uint8[]
     -      -       -       -       -       -       -  other uint64[]
     -      -       -       -       -       -       -  other ZAP
     4    64K   7.00K   45.0K   11.2K    9.14    0.00  persistent error log
     8   912K    181K    675K   84.4K    5.04    0.04  SPA history
     -      -       -       -       -       -       -  SPA history offsets
     1    512     512   4.50K   4.50K    1.00    0.00  Pool properties
     -      -       -       -       -       -       -  DSL permissions
     -      -       -       -       -       -       -  ZFS ACL
     -      -       -       -       -       -       -  ZFS SYSACL
     -      -       -       -       -       -       -  FUID table
     -      -       -       -       -       -       -  FUID table size
     1     1K     512   4.50K   4.50K    2.00    0.00  DSL dataset next clones
     1    512     512   4.50K   4.50K    1.00    0.00  scan work queue
     -      -       -       -       -       -       -  ZFS user/group used
     -      -       -       -       -       -       -  ZFS user/group quota
     -      -       -       -       -       -       -  snapshot refcount tags
     -      -       -       -       -       -       -  DDT ZAP algorithm
     2    32K   3.50K   22.5K   11.2K    9.14    0.00  DDT statistics
     -      -       -       -       -       -       -  System attributes
     -      -       -       -       -       -       -  SA master node
     -      -       -       -       -       -       -  SA attr registration
     -      -       -       -       -       -       -  SA attr layouts
     -      -       -       -       -       -       -  scan translations
     -      -       -       -       -       -       -  deduplicated block
    56  31.0K   28.0K    252K   4.50K    1.11    0.01  DSL deadlist map
     -      -       -       -       -       -       -  DSL deadlist map hdr
     6  3.50K   3.00K   27.0K   4.50K    1.17    0.00  DSL dir clones
    21  2.62M   94.5K    378K   18.0K   28.44    0.02  bpobj subobj
     -      -       -       -       -       -       -  deferred free
     -      -       -       -       -       -       -  dedup ditto
 75.4K  4.63G    416M   1.64G   22.3K   11.42  100.00  Total

                                            capacity     operations    bandwidth   ---- errors ----
description                                used  avail   read  write   read  write   read write cksum
tank                                      1.34T  1.91T     35      0   196K      0      0     0    14
  raidz2                                  1.34T  1.91T     35      0   196K      0      0     0    84
    /dev/dsk/c4t5000C500174132FBd0s0                       28      0  19.6K      0      0     0     0
    spare                                                  28      0  20.0K      0      0     0     0
      /dev/dsk/c4t5000C50017418D4Fd0s0                     28      0  20.1K      0      0     0     0
      /dev/dsk/c4t5000C50017A6A637d0s0                      0      0     54      0      0     0     0
    /dev/dsk/c4t5000C500174190A7d0s0                       28      0  19.7K      0      0     0     0
    /dev/dsk/c4t5000C5001741B323d0s0                       28      0  19.7K      0      0     0     0
    /dev/dsk/c4t5000C50017A5872Bd0s0                       28      0  20.1K      0      0     0     0
    /dev/dsk/c4t5000C50017A5886Bd0s0                       28      0  19.6K      0      0     0     0
    /dev/dsk/c4t5000C50017A5892Fd0s0                       28      0  19.6K      0      0     0     0
    /dev/dsk/c4t5000C50017A58B13d0s0                       28      0  20.1K      0      0     0     0
    spare                                                  28      0  19.6K      0      0     0     0
      /dev/dsk/c4t5000C50017A58B53d0s0                     28      0  19.7K      0      0     0     0
      /dev/dsk/c4t5000C50017A69513d0s0                      0      0     49      0      0     0     0
    /dev/dsk/c4t5000C50017A64DA3d0s0                       28      0  19.7K      0      0     0     0
    /dev/dsk/c4t5000C50017A66063d0s0                       28      0  20.1K      0      0     0     0
    /dev/dsk/c4t5000C50017A677F7d0s0                       28      0  19.7K      0      0     0     0

root@calculon:/#
----- END SNIP -----

The only thing I'm seeing there, other than the ten "error 50" entries (which I take to be ECKSUM, i.e. checksum failures, on those space map and bpobj blocks), is that both of my hot spares are still listed under the disks they replaced during the degraded operation, rather than back in hot-spare mode where I had put them with zpool detach (exact commands below).
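To be explicit about how I put the spares back: as far as I can reconstruct, these are the commands I ran after the resilver finished (the two device names are the spares shown INUSE in the pre-scrub status above):

# detach each in-use spare from tank so it returns to the spares list
zpool detach tank c4t5000C50017A6A637d0
zpool detach tank c4t5000C50017A69513d0

After that, zpool status showed both spares back in their waiting state, which is why I'm confused that zdb still shows them nested under the data disks.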
I'm at my wits' end here with this.

I've got a backup from a week ago, but there is a week's worth of data I cannot re-create or re-pull from the location where it was originally generated. Had all of this happened right after a tape replication, I would have just blown the pool away, remade it, and tested to make sure the failed disk was the only cause of my ZFS corruption.

Any help anyone can give me to salvage my data would be SO SO SO appreciated. Thank you.