Hi,
    I am using FreeBSD 8.2 and went to add 4 new disks today to expand my
offsite storage. All was working fine for about 20 minutes, and then the
new drive cage started to fail. Silly me for assuming new hardware would
be fine :(

The cage failure hung the server and the box rebooted. After it rebooted,
the entire pool was gone and is in the state below. I had only written a
few files to the new, larger pool and I am not concerned about restoring
that data. However, is there a way to get back the original pool data?

(Going to http://www.sun.com/msg/ZFS-8000-3C gives a 503 error on the web
page listed, BTW.)

0(offsite)# zpool status
  pool: tank1
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       UNAVAIL      0     0     0  insufficient replicas
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            ada0    UNAVAIL      0     0     0  cannot open
            ada1    UNAVAIL      0     0     0  cannot open
            ada2    UNAVAIL      0     0     0  cannot open
            ada3    UNAVAIL      0     0     0  cannot open
0(offsite)#
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Mike Tancsa
>
>         NAME        STATE     READ WRITE CKSUM
>         tank1       UNAVAIL      0     0     0  insufficient replicas
> [...]
>           raidz1    UNAVAIL      0     0     0  insufficient replicas
>             ada0    UNAVAIL      0     0     0  cannot open
>             ada1    UNAVAIL      0     0     0  cannot open
>             ada2    UNAVAIL      0     0     0  cannot open
>             ada3    UNAVAIL      0     0     0  cannot open

That is a huge bummer. I don't know if there is any way to recover aside
from restoring backups. But I will say this much: that is precisely the
reason why you always want to spread your mirror/raidz devices across
multiple controllers or chassis. If you lose a controller or a whole
chassis, you lose only one device from each vdev, and you're able to
continue production in a degraded state...
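
As a rough sketch of that layout (hypothetical device names da0-da15, one
disk per vdev taken from each of four controllers):

# zpool create tank \
    raidz da0 da4 da8  da12 \
    raidz da1 da5 da9  da13 \
    raidz da2 da6 da10 da14 \
    raidz da3 da7 da11 da15

With that arrangement, losing any single controller or chassis removes
only one disk from each raidz1 vdev, and the pool keeps running degraded
rather than going UNAVAIL.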
On Jan 28, 2011, at 6:41 PM, Mike Tancsa wrote:
> Hi,
>     I am using FreeBSD 8.2 and went to add 4 new disks today to expand
> my offsite storage. All was working fine for about 20 minutes, and then
> the new drive cage started to fail.
> [...]
> Going to http://www.sun.com/msg/ZFS-8000-3C gives a 503 error on the
> web page listed, BTW.

Oracle has its fair share of idiots :-( They have been changing around
the websites and blowing away all of the links people have set up for
the past 20+ years.

> 0(offsite)# zpool status
>   pool: tank1
>  state: UNAVAIL
> [...]
>           raidz1    UNAVAIL      0     0     0  insufficient replicas
>             ada0    UNAVAIL      0     0     0  cannot open
>             ada1    UNAVAIL      0     0     0  cannot open
>             ada2    UNAVAIL      0     0     0  cannot open
>             ada3    UNAVAIL      0     0     0  cannot open
> 0(offsite)#

This is usually easily solved without data loss by making the disks
available again. Can you read anything from the disks using any program?
 -- richard
On 1/29/2011 12:57 PM, Richard Elling wrote:
>> 0(offsite)# zpool status
>> [...]
>
> This is usually easily solved without data loss by making the
> disks available again. Can you read anything from the disks using
> any program?

That's the strange thing: the disks are readable. The drive cage just
reset a couple of times prior to the crash, but they seem OK now. Same
order as well.

# camcontrol devlist
<WDC WD\021501FASR\25500W2B0 \200 0956>         at scbus0 target 0 lun 0 (pass0,ada0)
<WDC WD\021501FASR\25500W2B0 \200 05.01D\0205>  at scbus0 target 1 lun 0 (pass1,ada1)
<WDC WD\021501FASR\25500W2B0 \200 05.01D\0205>  at scbus0 target 2 lun 0 (pass2,ada2)
<WDC WD\021501FASR\25500W2B0 \200 05.01D\0205>  at scbus0 target 3 lun 0 (pass3,ada3)

# dd if=/dev/ada2 of=/dev/null count=20 bs=1024
20+0 records in
20+0 records out
20480 bytes transferred in 0.001634 secs (12534561 bytes/sec)
0(offsite)#

        ---Mike
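
It may also be worth checking whether the drives logged anything during
the cage resets. Assuming smartmontools is installed (it's a port on
FreeBSD, not in the base system), something like:

# smartctl -a /dev/ada2 | egrep -i 'error|reallocat|pending'

A clean SMART log would further point the finger at the cage/backplane
rather than the disks themselves.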
On 1/29/2011 11:38 AM, Edward Ned Harvey wrote:
> That is precisely the reason why you always want to spread your
> mirror/raidz devices across multiple controllers or chassis. If you
> lose a controller or a whole chassis, you lose only one device from
> each vdev, and you're able to continue production in a degraded state...

Thanks. These are backups of backups, so it would be nice to restore
them, as it will take a while to sync up once again. But if I need to
start fresh, is there a resource you can point me to with the current
best practices for laying out large storage like this? It's just for
backups of backups at a DR site.

        ---Mike
On Jan 29, 2011, at 12:58 PM, Mike Tancsa wrote:
> On 1/29/2011 12:57 PM, Richard Elling wrote:
>> This is usually easily solved without data loss by making the
>> disks available again. Can you read anything from the disks using
>> any program?
>
> That's the strange thing: the disks are readable. The drive cage just
> reset a couple of times prior to the crash, but they seem OK now. Same
> order as well.
>
> [...]
>
> # dd if=/dev/ada2 of=/dev/null count=20 bs=1024
> 20+0 records in
> 20+0 records out
> 20480 bytes transferred in 0.001634 secs (12534561 bytes/sec)
> 0(offsite)#

The next step is to run "zdb -l" and look for all 4 labels. Something
like:
        zdb -l /dev/ada2

If all 4 labels exist for each drive and appear intact, then look more
closely at how the OS locates the vdevs. If you can't solve the "UNAVAIL"
problem, you won't be able to import the pool.
 -- richard
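
A quick loop (/bin/sh syntax; assumes the suspect drives are still
ada0-ada3) checks all four in one pass:

# for d in ada0 ada1 ada2 ada3; do
    echo "=== $d ==="
    zdb -l /dev/$d | grep -c 'failed to unpack'
  done

Each disk should report 0; any other count means that many of its four
labels are unreadable.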
On 1/29/2011 6:18 PM, Richard Elling wrote:
> The next step is to run "zdb -l" and look for all 4 labels. Something
> like:
>         zdb -l /dev/ada2
>
> If all 4 labels exist for each drive and appear intact, then look more
> closely at how the OS locates the vdevs. If you can't solve the
> "UNAVAIL" problem, you won't be able to import the pool.

Hmmm, doesn't look good on any of the drives. Before I give up, I will
try the drives in a different cage on Monday. Unfortunately, it's 150km
away from me at our DR site.

# zdb -l /dev/ada0
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
On Jan 29, 2011, at 4:14 PM, Mike Tancsa wrote:
> On 1/29/2011 6:18 PM, Richard Elling wrote:
>> The next step is to run "zdb -l" and look for all 4 labels.
>> [...]
>
> Hmmm, doesn't look good on any of the drives. Before I give up, I will
> try the drives in a different cage on Monday. Unfortunately, it's 150km
> away from me at our DR site.
>
> # zdb -l /dev/ada0
> [...]
> failed to unpack label 3

I'm not sure of the way BSD enumerates devices. Some clever person
thought that hiding the partition or slice would be useful. I don't find
it useful. On a Solaris system, ZFS can show a disk as something like
c0t1d0, but that doesn't exist. The actual data is in slice 0, so you
need to use c0t1d0s0 as the argument to zdb.
 -- richard
On 1/30/2011 12:39 AM, Richard Elling wrote:
>> Hmmm, doesn't look good on any of the drives.
>
> I'm not sure of the way BSD enumerates devices. Some clever person
> thought that hiding the partition or slice would be useful. I don't
> find it useful. On a Solaris system, ZFS can show a disk as something
> like c0t1d0, but that doesn't exist. The actual data is in slice 0, so
> you need to use c0t1d0s0 as the argument to zdb.

I think it's the right syntax. On the older drives:

0(offsite)# zdb -l /dev/ada0
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
0(offsite)# zdb -l /dev/ada4
--------------------------------------------
LABEL 0
--------------------------------------------
    version=15
    name='tank1'
    state=0
    txg=44593174
    pool_guid=7336939736750289319
    hostid=3221266864
    hostname='offsite.sentex.ca'
    top_guid=6980939370923808328
    guid=16144392433229115618
    vdev_tree
        type='raidz'
        id=1
        guid=6980939370923808328
        nparity=1
        metaslab_array=38
        metaslab_shift=35
        ashift=9
        asize=4000799784960
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=16144392433229115618
                path='/dev/ada4'
                whole_disk=0
                DTL=341
        children[1]
                type='disk'
                id=1
                guid=1210677308003674848
                path='/dev/ada5'
                whole_disk=0
                DTL=340
        children[2]
                type='disk'
                id=2
                guid=2517076601231706249
                path='/dev/ada6'
                whole_disk=0
                DTL=339
        children[3]
                type='disk'
                id=3
                guid=16621760039941477713
                path='/dev/ada7'
                whole_disk=0
                DTL=338
--------------------------------------------
LABEL 1
--------------------------------------------
    version=15
    name='tank1'
    state=0
    txg=44592523
    [remaining fields identical to LABEL 0]
--------------------------------------------
LABEL 2
--------------------------------------------
    [identical to LABEL 0; txg=44593174]
--------------------------------------------
LABEL 3
--------------------------------------------
    [identical to LABEL 1; txg=44592523]

0(offsite)# zpool status
  pool: tank1
 state: UNAVAIL
[... unchanged from the first post: the ad0/ad1/ad4/ad6 and ada4-ada7
raidz1 vdevs ONLINE, the ada0-ada3 raidz1 UNAVAIL, all four disks
"cannot open" ...]
0(offsite)#
On Jan 30, 2011, at 4:31 AM, Mike Tancsa wrote:
> On 1/30/2011 12:39 AM, Richard Elling wrote:
>> [...] The actual data is in slice 0, so you need to use c0t1d0s0 as
>> the argument to zdb.
>
> I think it's the right syntax. On the older drives,
> [...]

Bummer. You've got to fix this before you can import the pool. No labels,
no import.
 -- richard
On 2011-Jan-30 13:39:22 +0800, Richard Elling <richard.elling at gmail.com> wrote:
> I'm not sure of the way BSD enumerates devices. Some clever person
> thought that hiding the partition or slice would be useful.

No, there's no hiding. /dev/ada0 always refers to the entire physical
disk. If it had PC-style fdisk slices, there would be an sN suffix. If it
had GPT partitions, there would be a pN suffix. If it had BSD partitions,
there would be an alpha suffix [a-h].

> On a Solaris system, ZFS can show a disk something like c0t1d0, but
> that doesn't exist.

If we're discussing brokenness in OS device names, I've always thought
that reporting device names that don't exist, and not having any way to
access the complete physical disk, was silly on Solaris. Having a fake
's2' mean the whole disk if there's no label is a bad kludge.

Mike might like to try "gpart list", which will display FreeBSD's view of
the physical disks. It might also be worthwhile looking at a hexdump of
the first and last few MB of the "faulty" disks -- it's possible that the
controller has decided to just shift things by a few sectors, so the
labels aren't where ZFS expects to find them.

-- 
Peter Jeremy
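
A sketch of that check (get the real size from diskinfo first; the skip
arithmetic below is approximate):

# diskinfo -v ada0                                   # note "mediasize in bytes"
# dd if=/dev/ada0 bs=1m count=1 | hexdump -C | less
# dd if=/dev/ada0 bs=1m skip=$((mb - 1)) | hexdump -C | less

where mb is the mediasize divided by 1048576. ZFS keeps labels 0 and 1 in
the first 512KB of the device and labels 2 and 3 in the last 512KB, so if
label data shows up shifted by a few sectors in those dumps, the
controller is probably inserting or reserving space.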
On Jan 30, 2011, at 1:09 PM, Peter Jeremy wrote:
> On 2011-Jan-30 13:39:22 +0800, Richard Elling wrote:
>> I'm not sure of the way BSD enumerates devices. Some clever person
>> thought that hiding the partition or slice would be useful.
>
> No, there's no hiding. /dev/ada0 always refers to the entire physical
> disk.

ZFS on Solaris hides the slice when dealing with whole disks using EFI
labels.

> If we're discussing brokenness in OS device names, I've always thought
> that reporting device names that don't exist, and not having any way
> to access the complete physical disk, was silly on Solaris. Having a
> fake 's2' mean the whole disk if there's no label is a bad kludge.

The "fake" s2 goes back to BSD, where the c partition traditionally meant
the whole disk. This was just carried forward and changed to "s2" when
numbers were used instead of letters. With EFI labels on Solaris this is
no longer possible, and there is a "whole disk partition": on a default
Solaris system, s0 usually refers to the whole disk less s8.

> Mike might like to try "gpart list", which will display FreeBSD's view
> of the physical disks. It might also be worthwhile looking at a hexdump
> of the first and last few MB of the "faulty" disks -- it's possible
> that the controller has decided to just shift things by a few sectors,
> so the labels aren't where ZFS expects to find them.

Yes, sometimes controllers steal space from the disk for implementing
RAID.
 -- richard
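
One quick cross-check for that (a sketch; only meaningful if the drive
models in the two cages match): compare what FreeBSD reports for a disk
in the known-good cage against one in the suspect cage:

# diskinfo -v ada4 | grep mediasize          # good cage
# diskinfo -v ada0 | grep mediasize          # suspect cage

Identical models should report identical byte counts; a shortfall on the
suspect cage would suggest the controller is reserving sectors for its
own metadata.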
He says he's using FreeBSD. ZFS recorded names like "ada0", which always
means a whole disk. In any case, FreeBSD will search all block storage
for the ZFS components if the cached name is wrong: if the disks are
attached to the system at all, FreeBSD will find them wherever they may
be.

Try FreeBSD 8-STABLE rather than just 8.2-RELEASE, as many improvements
and fixes have been backported. Perhaps try 9-CURRENT, as I'm confident
the code there has all of the dev-search fixes.

Add the line "vfs.zfs.debug=1" to /boot/loader.conf to get detailed debug
output as FreeBSD tries to import the pool.
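
A minimal sketch of that procedure:

# echo 'vfs.zfs.debug=1' >> /boot/loader.conf
# reboot
  ...
# zpool import               # with no arguments, just lists what it finds

and watch the console/dmesg for the debug messages as ZFS probes each
block device looking for the pool's components.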
On 1/29/2011 6:18 PM, Richard Elling wrote:
> The next step is to run "zdb -l" and look for all 4 labels. Something
> like:
>         zdb -l /dev/ada2
>
> If all 4 labels exist for each drive and appear intact, then look more
> closely at how the OS locates the vdevs. If you can't solve the
> "UNAVAIL" problem, you won't be able to import the pool.
> -- richard

On 1/29/2011 10:13 PM, James R. Van Artsdalen wrote:
> On 1/28/2011 4:46 PM, Mike Tancsa wrote:
>> I had just added another set of disks to my zfs array. It looks like
>> the drive cage with the new drives is faulty. I had added a couple of
>> files to the main pool, but not much. Is there any way to restore the
>> pool below? I have a lot of files on ad0,1,4,6 and ada4,5,6,7 and
>> perhaps one file on the new drives in the bad cage.
>
> Get another enclosure and verify it works OK. Then move the disks from
> the suspect enclosure to the tested enclosure and try to import the
> pool.
>
> The problem may be cabling or the controller instead - you didn't
> specify how the disks were attached or which version of FreeBSD you're
> using.

First off, thanks to all who responded on and off list!

Good news (for me), it seems. With a new cage, everything is recognized
correctly. The history is:

...
2010-04-22.14:27:38 zpool add tank1 raidz /dev/ada4 /dev/ada5 /dev/ada6 /dev/ada7
2010-06-11.13:49:33 zfs create tank1/argus-data
2010-06-11.13:49:41 zfs create tank1/argus-data/previous
2010-06-11.13:50:38 zfs set compression=off tank1/argus-data
2010-08-06.12:20:59 zpool replace tank1 ad1 ad1
2010-09-16.10:17:51 zpool upgrade -a
2011-01-28.11:45:43 zpool add tank1 raidz /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3

FreeBSD RELENG_8 from last week, 8G of RAM, amd64.

# zpool status -v
  pool: tank1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /tank1/argus-data/previous/argus-sites-radium.2011.01.28.16.00
        tank1/argus-data:<0xc6>
        /tank1/argus-data/argus-sites-radium

0(offsite)# zpool get all tank1
NAME   PROPERTY       VALUE                SOURCE
tank1  size           14.5T                -
tank1  used           7.56T                -
tank1  available      6.94T                -
tank1  capacity       52%                  -
tank1  altroot        -                    default
tank1  health         ONLINE               -
tank1  guid           7336939736750289319  default
tank1  version        15                   default
tank1  bootfs         -                    default
tank1  delegation     on                   default
tank1  autoreplace    off                  default
tank1  cachefile      -                    default
tank1  failmode       wait                 default
tank1  listsnapshots  on                   local

Do I just want to do a scrub?

(Unfortunately, http://www.sun.com/msg/ZFS-8000-8A gives a 503.)

zdb now shows:

0(offsite)# zdb -l /dev/ada0
--------------------------------------------
LABEL 0
--------------------------------------------
    version=15
    name='tank1'
    state=0
    txg=44593174
    pool_guid=7336939736750289319
    hostid=3221266864
    hostname='offsite.sentex.ca'
    top_guid=6980939370923808328
    guid=16144392433229115618
    vdev_tree
        type='raidz'
        id=1
        guid=6980939370923808328
        nparity=1
        metaslab_array=38
        metaslab_shift=35
        ashift=9
        asize=4000799784960
        is_log=0
        children[0]
                type='disk'
                id=0
                guid=16144392433229115618
                path='/dev/ada4'
                whole_disk=0
                DTL=341
        children[1]
                type='disk'
                id=1
                guid=1210677308003674848
                path='/dev/ada5'
                whole_disk=0
                DTL=340
        children[2]
                type='disk'
                id=2
                guid=2517076601231706249
                path='/dev/ada6'
                whole_disk=0
                DTL=339
        children[3]
                type='disk'
                id=3
                guid=16621760039941477713
                path='/dev/ada7'
                whole_disk=0
                DTL=338
--------------------------------------------
LABEL 1
--------------------------------------------
    [identical to LABEL 0; txg=44592523]
--------------------------------------------
LABEL 2
--------------------------------------------
    [identical to LABEL 0; txg=44593174]
--------------------------------------------
LABEL 3
--------------------------------------------
    [identical to LABEL 0; txg=44592523]
0(offsite)#

        ---Mike
Hi Mike,

Yes, this is looking much better.

Some combination of removing the corrupted files indicated in the zpool
status -v output, running zpool scrub, and then zpool clear should
resolve the corruption, but it depends on how bad the corruption is.

First, I would try the least destructive method: try to remove the files
listed below by using the rm command.

This entry probably means that the metadata is corrupted or some other
file (like a temp file) no longer exists:

        tank1/argus-data:<0xc6>

If you are able to remove the individual files with rm, run another zpool
scrub and then a zpool clear to clear the pool errors. You might need to
repeat the zpool scrub/zpool clear combo.

If you can't remove the individual files, then you might have to destroy
the tank1/argus-data file system.

Let us know what actually works.

Thanks,

Cindy

On 01/31/11 12:20, Mike Tancsa wrote:
> First off, thanks to all who responded on and off list!
>
> Good news (for me), it seems. With a new cage, everything is recognized
> correctly.
> [...]
>
> errors: Permanent errors have been detected in the following files:
>
>         /tank1/argus-data/previous/argus-sites-radium.2011.01.28.16.00
>         tank1/argus-data:<0xc6>
>         /tank1/argus-data/argus-sites-radium
>
> [...]
>
> Do I just want to do a scrub?
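
In concrete terms, the sequence described above would look something like
this (file names taken from the zpool status -v output; repeat the
scrub/clear pair if errors persist):

# rm /tank1/argus-data/previous/argus-sites-radium.2011.01.28.16.00
# rm /tank1/argus-data/argus-sites-radium
# zpool scrub tank1
# zpool status -v tank1      # wait for the scrub to complete
# zpool clear tank1

The <0xc6>-style entries have no path to rm; if they survive the
scrub/clear passes, destroying the tank1/argus-data file system is the
fallback.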
On 1/31/2011 3:14 PM, Cindy Swearingen wrote:
> Some combination of removing the corrupted files indicated in the zpool
> status -v output, running zpool scrub, and then zpool clear should
> resolve the corruption, but it depends on how bad the corruption is.
>
> First, I would try the least destructive method: try to remove the
> files listed below by using the rm command.
>
> This entry probably means that the metadata is corrupted or some other
> file (like a temp file) no longer exists:
>
>         tank1/argus-data:<0xc6>

Hi Cindy,

I removed the files that were listed, and now I am left with:

errors: Permanent errors have been detected in the following files:

        tank1/argus-data:<0xc5>
        tank1/argus-data:<0xc6>
        tank1/argus-data:<0xc7>

I have started a scrub:

 scrub: scrub in progress for 0h48m, 10.90% done, 6h35m to go

I will report back once the scrub is done!

        ---Mike
On Jan 31, 2011, at 1:19 PM, Mike Tancsa wrote:
> I removed the files that were listed, and now I am left with:
>
> errors: Permanent errors have been detected in the following files:
>
>         tank1/argus-data:<0xc5>
>         tank1/argus-data:<0xc6>
>         tank1/argus-data:<0xc7>
>
> I have started a scrub:
>  scrub: scrub in progress for 0h48m, 10.90% done, 6h35m to go

The "permanent" errors report shows the current and previous results.
When you have multiple failures that are recovered, consider running
scrub twice before attempting to correct or delete files.
 -- richard
On 1/31/2011 4:19 PM, Mike Tancsa wrote:
> On 1/31/2011 3:14 PM, Cindy Swearingen wrote:
> [...]
>
> I have started a scrub:
>  scrub: scrub in progress for 0h48m, 10.90% done, 6h35m to go

Looks like that was it! The scrub finished in the time it estimated, and
that was all I needed to do; I did not have to do zpool clear or any
other commands. Is there anything beyond scrub to check the integrity of
the pool?

0(offsite)# zpool status -v
  pool: tank1
 state: ONLINE
 scrub: scrub completed after 7h32m with 0 errors on Mon Jan 31 23:00:46 2011
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0     0     0

errors: No known data errors
0(offsite)#

        ---Mike
Excellent.

I think you are good for now, as long as your hardware setup is stable.
You survived a severe hardware failure, so say a prayer and make sure
this doesn't happen again. Always have good backups.

Thanks,

Cindy

On 02/01/11 06:56, Mike Tancsa wrote:
> Looks like that was it! The scrub finished in the time it estimated,
> and that was all I needed to do; I did not have to do zpool clear or
> any other commands. Is there anything beyond scrub to check the
> integrity of the pool?
> [...]
> errors: No known data errors
On Feb 1, 2011, at 5:56 AM, Mike Tancsa wrote:
> Looks like that was it! The scrub finished in the time it estimated,
> and that was all I needed to do; I did not have to do zpool clear or
> any other commands. Is there anything beyond scrub to check the
> integrity of the pool?

That is exactly what scrub does: it validates all data on the disks.

> 0(offsite)# zpool status -v
>   pool: tank1
>  state: ONLINE
>  scrub: scrub completed after 7h32m with 0 errors on Mon Jan 31
>  23:00:46 2011
> [...]
> errors: No known data errors

Congrats!
 -- richard
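
Going forward, a scheduled scrub is cheap insurance. A sketch, as a
system crontab entry (pick an interval suited to the pool size; this one
runs Sundays at 03:00):

0  3  *  *  0    root    /sbin/zpool scrub tank1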