Ryan Wehler
2011-Nov-25  03:22 UTC
[zfs-discuss] Recovering from kernel panic / reboot cycle importing pool.
Yesterday morning I awoke to alerts from my SAN that one of my OS disks was
faulty, FMA said it was in hardware failure.  By the time I got to work (1.5
hours after the email) ALL of my pools were in a degraded state, and
"tank" my primary pool had kicked in two hot spares because it was so
discombobulated.
------------------- EMAIL ------------------- 
List of faulty resources:
--------------- ------------------------------------  -------------- ---------
TIME            CACHE-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Nov 23 06:55:28 b42c59c5-c955-c44a-8d61-e0864b61864b  DISK-8000-3E   Critical 
 
Host        : calculon
Platform    : X8DTH-i-6-iF-6F Chassis_id  : 1234567890
Product_sn  :
 
Fault class : fault.io.scsi.cmd.disk.dev.rqs.derr
Affects     : dev:///:devid=id1,sd at n5000c500040ec14b//scsi_vhci/disk at
g5000c500040ec14b
                  faulted but still in service
FRU         : "006"
(hc://:product-id=LSILOGIC-SASX36-A1----:server-id=:chassis-id=50030480003d38ff:serial=3NM1NN4W00009740ZZUH:part=SEAGATE-ST9146802SS:revision=0002/ses-enclosure=0/bay=6/disk=0)
                  faulty
 
Description : A non-recoverable hardware failure was detected by the device
              while performing a command.
              Refer to http://sun.com/msg/DISK-8000-3E for more
              information.
 
Response    : The device may be offlined or degraded.
 
Impact      : The device has failed. The service may have been lost or
              degraded.
 
Action      : Ensure that the latest drivers and patches are installed.
              Schedule a repair procedure to replace the affected
              device. Use ''fmadm faulty'' to find the affected
disk.
 
--------------- ------------------------------------  -------------- ---------
TIME            CACHE-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
 
Host        : calculon
Platform    : X8DTH-i-6-iF-6F Chassis_id  : 1234567890
Product_sn  :
 
Fault class : fault.io.scsi.cmd.disk.dev.rqs.merr
Affects     : dev:///:devid=id1,sd at n5000c500040ec14b//scsi_vhci/disk at
g5000c500040ec14b
                  faulted but still in service
FRU         : "006"
(hc://:product-id=LSILOGIC-SASX36-A1----:server-id=:chassis-id=50030480003d38ff:serial=3NM1NN4W00009740ZZUH:part=SEAGATE-ST9146802SS:revision=0002/ses-enclosure=0/bay=6/disk=0)
                  faulty
 
Description : The command was terminated with a non-recovered error condition
              that may have been caused by a flaw in the media or an error in
              the recorded data.
              Refer to http://sun.com/msg/DISK-8000-4Q for more
              information.
 
Response    : The device may be offlined or degraded.
 
Impact      : It is likely that continued operation will result in data
              corruption, which may eventually cause the loss of service or the
              service degradation.
 
Action      : Schedule a repair procedure to replace the affected device. Use
              ''fmadm faulty'' to find the affected disk.
------------------- EMAIL END ------------------- 
So, as I said by the time I got to work 1.5 hours had lapsed. I immediately ran
zpool detach on the failed disk ID and yanked it from the chassis.  
I''m baffled as to why FMA called it failed but ZFS decided to keep it
online... only when *I* detached it from the array did it ever fall out of the
pool! Over the course of the next couple hours my "tank" pool
resilvered itself and I set the hot spares back into their normal waiting state
and cleared the errors on my two other pools.  I then scrubbed tank.
This was the output before the scrub:
------------ PRE SCRUB ---------------
 pool: tank
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Wed Nov 23 09:34:22 2011
   41.5G scanned out of 1.34T at 16.1M/s, 23h34m to go
   7.00G resilvered, 3.02% done
config:
	NAME                         STATE     READ WRITE CKSUM
	tank                         ONLINE       0     0     3
	  raidz2-0                   ONLINE       0     0     6
	    c4t5000C500174132FBd0    ONLINE       0     0     0  (resilvering)
	    c4t5000C50017418D4Fd0    ONLINE       0     0     0  (resilvering)
	    c4t5000C500174190A7d0    ONLINE       0     0     0  (resilvering)
	    c4t5000C5001741B323d0    ONLINE       0     0     0  (resilvering)
	    c4t5000C50017A5872Bd0    ONLINE       0     0     1  (resilvering)
	    c4t5000C50017A5886Bd0    ONLINE       0     0     0  (resilvering)
	    spare-6                  ONLINE       0     0     0
	      c4t5000C50017A5892Fd0  ONLINE       0     0     0  (resilvering)
	      c4t5000C50017A6A637d0  ONLINE       0     0     0  (resilvering)
	    c4t5000C50017A58B13d0    ONLINE       0     0     0  (resilvering)
	    spare-8                  ONLINE       0     0     4
	      c4t5000C50017A58B53d0  ONLINE       0     0     0  (resilvering)
	      c4t5000C50017A69513d0  ONLINE       0     0     0  (resilvering)
	    c4t5000C50017A64DA3d0    ONLINE       0     0     7  (resilvering)
	    c4t5000C50017A66063d0    ONLINE       0     0     3  (resilvering)
	    c4t5000C50017A677F7d0    ONLINE       0     0     0  (resilvering)
	spares
	  c4t5000C50017A6A637d0      INUSE     currently in use
	  c4t5000C50017A69513d0      INUSE     currently in use
errors: Permanent errors have been detected in the following files:
       <metadata>:<0x12b>
       <metadata>:<0x8c>
       <metadata>:<0x8d>
       <metadata>:<0xa1>
       tank/veeam_zvol at snap-weekly-1-2011-11-12-000005:<0x1>
---------- END PRE SCRUB ------------------
After the scrub completed and said it had fixed 20.5MB of data, I restarted the
SAN.  This is when all heck broke loose. I found myself in an endless kernel
panic/reboot loop.
I booted with a LiveCD and deleted /etc/zfs/zpool.cache which allowed me to boot
back into the system normally.  I added set aok=1 and set zfs:zfs_recover=1 in
/etc/system and restarted again and then proceeded with:
Any attempt to import my pool "tank", including:
zpool import -fFX tank
zpool import -fFX -o readonly=on tank
causes the same kernel panic message:
----- PANIC ----
panic[cpu2]/thread=ffffff00f8578c40: 
assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG,
 &numbufs, &dbp), file:
 ../../common/fs/zfs/dmu.c, line: 796
ffffff00f8578880 genunix:assfail+7e ()
ffffff00f8578930 zfs:dmu_write+17f ()
ffffff00f85789e0 zfs:space_map_sync+295 ()
ffffff00f8578a70 zfs:metaslab_sync+2de ()
ffffff00f8578ad0 zfs:vdev_sync+d5 ()
ffffff00f8578b80 zfs:spa_sync+44b ()
ffffff00f8578c20 zfs:txg_sync_thread+247 ()
ffffff00f8578c30 unix:thread_start+8 ()
----- END PANIC -----
I then ran zdb -e -bcsvL tank and got this output after letting it run all
night:
----- SNIP -----
Traversing all blocks to verify checksums ...
zdb_blkptr_cb: Got error 50 reading <0, 80, 0, 0>
DVA[0]=<0:28b61ec9600:1200> DVA[1]=<0:149e50a5400:1200>
DVA[2]=<0:9f83efe800:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous
unique triple size=1000L/a00P birth=2561650L/2561650P fill=1
cksum=6e4fd979ea:97c2e68bdf84:7cfea1e3255982:4ca2471aa9f3c833 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 80, 0, 1>
DVA[0]=<0:28b61ecba00:1200> DVA[1]=<0:149e50a7800:1200>
DVA[2]=<0:9f83f00c00:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous
unique triple size=1000L/a00P birth=2561650L/2561650P fill=1
cksum=6d08f1acfa:9fa09650c807:88e6b39ae45bde:56a9ac12f2e4c5d9 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 80, 0, 2>
DVA[0]=<0:28b61eca800:1200> DVA[1]=<0:149e50a6600:1200>
DVA[2]=<0:9f83effa00:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous
unique triple size=1000L/a00P birth=2561650L/2561650P fill=1
cksum=6f4d50c3e4:aa396085bc4e:95d45fc95bf06e:5fdbfe6eb852cf94 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 80, 0, 3>
DVA[0]=<0:28b61eccc00:1200> DVA[1]=<0:149e50a8a00:1200>
DVA[2]=<0:9f83f01e00:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous
unique triple size=1000L/a00P birth=2561650L/2561650P fill=1
cksum=8a0192487d:c4591e633abd:a6e34d65c0e98e:689ca0220fa0f1da -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 80, 0, 4>
DVA[0]=<0:28b61ecde00:1200> DVA[1]=<0:14cd80adc00:1200>
DVA[2]=<0:9f83f03000:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous
unique triple size=1000L/a00P birth=2561650L/2561650P fill=1
cksum=7cf66738ef:ac3d97f68b92:8e517e4e827057:577197cf45e041de -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 140, 1, 2>
DVA[0]=<0:28928362e00:2400> DVA[1]=<0:14d7e17e600:2400>
DVA[2]=<0:9f83081000:2400> [L1 SPA space map] fletcher4 lzjb LE contiguous
unique triple size=4000L/1a00P birth=2561342L/2561342P fill=101
cksum=25605243405:81f7c665f95d6:11999b168af12c49:6ff515eac0269c25 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 141, 0, 140>
DVA[0]=<0:28928331c00:1200> DVA[1]=<0:14a648b4a00:1200>
DVA[2]=<0:9f851ee200:1200> [L0 SPA space map] fletcher4 lzjb LE contiguous
unique triple size=1000L/a00P birth=2561341L/2561341P fill=1
cksum=aa953ce6a5:e711c4fe79d2:be7981b694239c:7529a161247abc19 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 161, 1, 0>
DVA[0]=<0:28928365200:2400> DVA[1]=<0:14d7e180a00:2400>
DVA[2]=<0:9f83083400:2400> [L1 SPA space map] fletcher4 lzjb LE contiguous
unique triple size=4000L/1a00P birth=2561342L/2561342P fill=101
cksum=23cab002ab7:7e3e1246eb050:114166236be5e14c:11c9b1c1410d0433 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 299, 0, 50a>
DVA[0]=<0:170f6f08400:2400> DVA[1]=<0:3eb4386400:2400>
DVA[2]=<0:21e22d7ae00:2400> [L0 bpobj] fletcher4 lzjb LE contiguous unique
triple size=20000L/1c00P birth=2488469L/2488469P fill=1
cksum=3fc87f4bf36:e2523d83de277:20ef6693f583a573:b33c6fc06386e250 -- skipping
zdb_blkptr_cb: Got error 50 reading <0, 299, 0, 28b3>
DVA[0]=<0:289283d8400:2400> DVA[1]=<0:14cd80ab800:2400>
DVA[2]=<0:9f83097800:2400> [L0 bpobj] fletcher4 lzjb LE contiguous unique
triple size=20000L/1a00P birth=2561425L/2561425P fill=1
cksum=37eae7d3f9f:be873fcdaf418:19f6fbd412ee1c8c:545adf8f5512e1c4 -- skipping
Error counts:
        errno  count
           50  10
block traversal size 1763288064 != alloc 1475123189760 (unreachable
1473359901696)
        bp count:           77246
        bp logical:    4974285824      avg:  64395
        bp physical:    435705344      avg:   5640     compression:  11.42
        bp allocated:  1763288064      avg:  22826     compression:   2.82
        bp deduped:             0    ref>1:      0   deduplication:   1.00
        SPA allocated: 1475123189760     used: 41.28%
Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
     -      -       -       -       -       -        -  unallocated
     2    32K      4K   22.5K   11.2K    8.00     0.00  object directory
     2     1K      1K   9.00K   4.50K    1.00     0.00  object array
     2    32K   3.50K   22.5K   11.2K    9.14     0.00  packed nvlist
     -      -       -       -       -       -        -  packed nvlist size
 36.0K  4.47G    287M   1.12G   31.8K   15.92    68.12  bpobj
     -      -       -       -       -       -        -  bpobj header
     -      -       -       -       -       -        -  SPA space map header
 39.2K   162M    128M    533M   13.6K    1.27    31.71  SPA space map
     -      -       -       -       -       -        -  ZIL intent log
    44   704K   90.5K    482K   10.9K    7.78     0.03  DMU dnode
     1     2K     512   4.50K   4.50K    4.00     0.00  DMU objset
     -      -       -       -       -       -        -  DSL directory
    12  6.50K   6.00K   54.0K   4.50K    1.08     0.00  DSL directory child map
    10  5.50K   5.00K   45.0K   4.50K    1.10     0.00  DSL dataset snap map
    75  1.13M    129K    824K   11.0K    8.98     0.05  DSL props
     -      -       -       -       -       -        -  DSL dataset
     -      -       -       -       -       -        -  ZFS znode
     -      -       -       -       -       -        -  ZFS V0 ACL
     -      -       -       -       -       -        -  ZFS plain file
     -      -       -       -       -       -        -  ZFS directory
     -      -       -       -       -       -        -  ZFS master node
     -      -       -       -       -       -        -  ZFS delete queue
     -      -       -       -       -       -        -  zvol object
     -      -       -       -       -       -        -  zvol prop
     -      -       -       -       -       -        -  other uint8[]
     -      -       -       -       -       -        -  other uint64[]
     -      -       -       -       -       -        -  other ZAP
     4    64K   7.00K   45.0K   11.2K    9.14     0.00  persistent error log
     8   912K    181K    675K   84.4K    5.04     0.04  SPA history
     -      -       -       -       -       -        -  SPA history offsets
     1    512     512   4.50K   4.50K    1.00     0.00  Pool properties
     -      -       -       -       -       -        -  DSL permissions
     -      -       -       -       -       -        -  ZFS ACL
     -      -       -       -       -       -        -  ZFS SYSACL
     -      -       -       -       -       -        -  FUID table
     -      -       -       -       -       -        -  FUID table size
     1     1K     512   4.50K   4.50K    2.00     0.00  DSL dataset next clones
     1    512     512   4.50K   4.50K    1.00     0.00  scan work queue
     -      -       -       -       -       -        -  ZFS user/group used
     -      -       -       -       -       -        -  ZFS user/group quota
     -      -       -       -       -       -        -  snapshot refcount tags
     -      -       -       -       -       -        -  DDT ZAP algorithm
     2    32K   3.50K   22.5K   11.2K    9.14     0.00  DDT statistics
     -      -       -       -       -       -        -  System attributes
     -      -       -       -       -       -        -  SA master node
     -      -       -       -       -       -        -  SA attr registration
     -      -       -       -       -       -        -  SA attr layouts
     -      -       -       -       -       -        -  scan translations
     -      -       -       -       -       -        -  deduplicated block
    56  31.0K   28.0K    252K   4.50K    1.11     0.01  DSL deadlist map
     -      -       -       -       -       -        -  DSL deadlist map hdr
     6  3.50K   3.00K   27.0K   4.50K    1.17     0.00  DSL dir clones
    21  2.62M   94.5K    378K   18.0K   28.44     0.02  bpobj subobj
     -      -       -       -       -       -        -  deferred free
     -      -       -       -       -       -        -  dedup ditto
 75.4K  4.63G    416M   1.64G   22.3K   11.42   100.00  Total
                            capacity   operations   bandwidth  ---- errors ----
description                used avail  read write  read write  read write cksum
tank                      1.34T 1.91T    35     0  196K     0     0     0    14
  raidz2                  1.34T 1.91T    35     0  196K     0     0     0    84
    /dev/dsk/c4t5000C500174132FBd0s0     28     0 19.6K     0     0     0     0
    spare                                28     0 20.0K     0     0     0     0
      /dev/dsk/c4t5000C50017418D4Fd0s0    28     0 20.1K     0     0     0     0
      /dev/dsk/c4t5000C50017A6A637d0s0     0     0    54     0     0     0     0
    /dev/dsk/c4t5000C500174190A7d0s0     28     0 19.7K     0     0     0     0
    /dev/dsk/c4t5000C5001741B323d0s0     28     0 19.7K     0     0     0     0
    /dev/dsk/c4t5000C50017A5872Bd0s0     28     0 20.1K     0     0     0     0
    /dev/dsk/c4t5000C50017A5886Bd0s0     28     0 19.6K     0     0     0     0
    /dev/dsk/c4t5000C50017A5892Fd0s0     28     0 19.6K     0     0     0     0
    /dev/dsk/c4t5000C50017A58B13d0s0     28     0 20.1K     0     0     0     0
    spare                                28     0 19.6K     0     0     0     0
      /dev/dsk/c4t5000C50017A58B53d0s0    28     0 19.7K     0     0     0     0
      /dev/dsk/c4t5000C50017A69513d0s0     0     0    49     0     0     0     0
    /dev/dsk/c4t5000C50017A64DA3d0s0     28     0 19.7K     0     0     0     0
    /dev/dsk/c4t5000C50017A66063d0s0     28     0 20.1K     0     0     0     0
    /dev/dsk/c4t5000C50017A677F7d0s0     28     0 19.7K     0     0     0     0
root at calculon:/#
----- END SNIP -----
The only thing I''m seeing there, other than the 10 "error 50"
is the fact both of my hot spares are listed under the disks they replaced
during the degraded operation, rather than where I had put them back into
hot-spare mode (zpool detach <lunid> tank)
I''m at my wits end here with this.  I''ve got a backup
that''s from a week ago, but I''ve got a weeks worth of data I
cannot re-create or re-pull from the original location the data was generated
from.    Had all of this happen after a tape replication I would have just blown
the pool away, remade it and tested to make sure that the failed disk was the
only cause for my ZFS corruption.
Any help anyone can give me to salvage my data would be SO SO SO appreciated.
Thank you.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20111124/602b7af5/attachment-0001.html>