Greetings!

I lost one out of five disks on a machine with a raidz1 and I'm not sure exactly how to recover from it. The pool is marked as FAULTED, which I certainly wasn't expecting with only one bum disk.

root at blitz:/# zpool status -v tank
  pool: tank
 state: FAULTED
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        FAULTED      0     0     1  corrupted data
          raidz1    DEGRADED     0     0     6
            c6t0d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c6t3d0  UNAVAIL      0     0     0  cannot open
            c6t4d0  ONLINE       0     0     0

Any recovery guidance I may gain from the esteemed experts of this group would be extremely appreciated. I recently migrated to OpenSolaris + ZFS on the impassioned advice of a coworker, and will lose some data that has been modified since the move but not yet backed up.

Many thanks in advance...
Can you share your hardware configuration?

cheers,
Blake

On Mon, Jan 19, 2009 at 11:56 PM, Brad Hill <brad at thosehills.com> wrote:
> I lost one out of five disks on a machine with a raidz1 and I'm not sure
> exactly how to recover from it. The pool is marked as FAULTED, which I
> certainly wasn't expecting with only one bum disk.
> [...]
Sure, and thanks for the quick reply.

Controller: Supermicro AOC-SAT2-MV8 plugged into a 64-bit PCI-X 133 bus
Drives: 5 x Seagate 7200.11 1.5TB disks for the raidz1.
        Single 36GB Western Digital 10krpm Raptor as system disk. Its mate
        is installed but not yet mirrored.
Motherboard: Tyan Thunder K8W S2885 (dual AMD CPU) with 1GB ECC RAM

Anything else I can provide?

(thanks again)
I would get a new 1.5 TB drive, make sure it has the new firmware, and
replace c6t3d0 right away - even if someone here comes up with a magic
solution, you don't want to wait for another drive to fail.

http://hardware.slashdot.org/article.pl?sid=09/01/17/0115207
http://techreport.com/discussions.x/15863

Brad Hill wrote:
> Controller: Supermicro AOC-SAT2-MV8 plugged into a 64-bit PCI-X 133 bus
> Drives: 5 x Seagate 7200.11 1.5TB disks for the raidz1.
> [...]
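For reference, on a pool that is only DEGRADED the usual replacement
sequence would be a sketch along these lines (device names taken from the
zpool status output above; whether the pool will accept the command at all
is the open question in this thread):

    # put the new drive in the failed disk's slot and resilver onto it
    zpool replace tank c6t3d0

    # or, if the replacement comes up at a different target (c6t5d0 is
    # just a hypothetical example here)
    zpool replace tank c6t3d0 c6t5d0

    # then watch the resilver progress
    zpool status -v tank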
In this case I would also immediately export the pool (to prevent any
write attempts) and look into a firmware update for the failed drive
(you will probably need Windows for that).

Sent from my iPhone

On Jan 20, 2009, at 3:22 AM, zfs user <zfsml at itsbeen.sent.com> wrote:
> I would get a new 1.5 TB drive, make sure it has the new firmware, and
> replace c6t3d0 right away - even if someone here comes up with a magic
> solution, you don't want to wait for another drive to fail.
> [...]
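A minimal sketch of that, assuming the pool is still imported on the
original system:

    # quiesce the pool until the hardware situation is sorted out
    zpool export tank

    # later, once the drive has been replaced or reflashed
    zpool import tank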
Brad Hill
2009-Jan-23 03:52 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
> I would get a new 1.5 TB drive, make sure it has the new firmware, and
> replace c6t3d0 right away - even if someone here comes up with a magic
> solution, you don't want to wait for another drive to fail.

The replacement disk showed up today but I'm unable to replace the one
marked UNAVAIL:

root at blitz:~# zpool replace tank c6t3d0
cannot open 'tank': pool is unavailable

> In this case I would also immediately export the pool (to prevent any
> write attempts) and look into a firmware update for the failed drive
> (you will probably need Windows for that).

While I didn't export first, I did boot with a livecd and tried to force
the import with that:

root at opensolaris:~# zpool import -f tank
internal error: Bad exchange descriptor
Abort (core dumped)

Hopefully someone on this list understands what situation I am in and how
to resolve it. Again, many thanks in advance for any suggestions you all
have to offer.
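Some read-only diagnostics that should be safe to run from the LiveCD at
this point - a sketch, assuming the surviving disks show up there under
the same c6t*d0 names (zdb -l reads the on-disk labels without importing
the pool, and fmdump shows any fault events that have been logged):

    # dump the ZFS labels of one surviving member; txg and pool_guid
    # should agree across all four remaining disks (labels live on s0
    # for whole-disk pool members - adjust if yours were built on slices)
    zdb -l /dev/dsk/c6t0d0s0

    # list any logged error/fault events
    fmdump -eV | less

    # see what the system thinks is importable
    zpool import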
Blake
2009-Jan-23 22:47 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
I've seen reports of a recent Seagate firmware update bricking drives
again.

What's the output of 'zpool import' from the LiveCD? It sounds like more
than one drive is dropping off.

On Thu, Jan 22, 2009 at 10:52 PM, Brad Hill <brad at thosehills.com> wrote:
> The replacement disk showed up today but I'm unable to replace the one
> marked UNAVAIL:
>
> root at blitz:~# zpool replace tank c6t3d0
> cannot open 'tank': pool is unavailable
> [...]
Brad Hill
2009-Jan-24 17:48 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
> I've seen reports of a recent Seagate firmware update bricking drives
> again.
>
> What's the output of 'zpool import' from the LiveCD? It sounds like
> more than one drive is dropping off.

root at opensolaris:~# zpool import
  pool: tank
    id: 16342816386332636568
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank        FAULTED   corrupted data
          raidz1    DEGRADED
            c6t0d0  ONLINE
            c6t1d0  ONLINE
            c6t2d0  ONLINE
            c6t3d0  UNAVAIL   cannot open
            c6t4d0  ONLINE

  pool: rpool
    id: 9891756864015178061
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        rpool       ONLINE
          c3d0s0    ONLINE
Brad Hill
2009-Jan-27 18:21 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
Any ideas on this? It looks like a potential bug to me, or there is
something that I'm not seeing.

Thanks again!
Chris Du
2009-Jan-27 21:45 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
Do you know the 7200.11 has firmware bugs? Check the Seagate website.
Blake
2009-Jan-28 01:15 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
I guess you could try 'zpool import -f'. This is a pretty odd status, I
think. I'm pretty sure raidz1 should survive a single disk failure.

Perhaps a more knowledgeable list member can explain.

On Sat, Jan 24, 2009 at 12:48 PM, Brad Hill <brad at thosehills.com> wrote:
> root at opensolaris:~# zpool import
>   pool: tank
>     id: 16342816386332636568
>  state: FAULTED
> status: The pool was last accessed by another system.
> action: The pool cannot be imported due to damaged devices or data.
> [...]
Brad Hill
2009-Jan-28 03:32 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
root at opensolaris:~# zpool import -f tank
internal error: Bad exchange descriptor
Abort (core dumped)

Hoping someone has seen that before... the Google is seriously letting me
down on that one.

> I guess you could try 'zpool import -f'. This is a pretty odd status,
> I think. I'm pretty sure raidz1 should survive a single disk failure.
>
> Perhaps a more knowledgeable list member can explain.
This is outside the scope of my knowledge/experience. Maybe there is now
a core file you can examine? That might help you at least see what's
going on.

On Tue, Jan 27, 2009 at 10:32 PM, Brad Hill <brad at thosehills.com> wrote:
> root at opensolaris:~# zpool import -f tank
> internal error: Bad exchange descriptor
> Abort (core dumped)
> [...]
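If the crash left a core file in the current directory, a stack trace
would at least show which zpool/libzfs code path is blowing up - a sketch,
assuming the file is literally named "core":

    # quick user-level stack trace of the dumped process
    pstack core

    # or dig deeper with the modular debugger
    mdb core
    > ::status
    > $C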
Brad Hill
2009-Jan-28 06:16 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
I do, thank you. The disk that went out sounds like it had a head crash
or some such - loud clicking shortly after spin-up, then it spins down
and gives me nothing. The BIOS doesn't even detect it properly, so a
firmware update isn't possible.

> Do you know the 7200.11 has firmware bugs? Check the Seagate website.
Just a thought, but have you physically disconnected the bad disk? It's
not unheard of for a bad disk to cause problems with others.

Failing that, it's the "corrupted data" bit that's worrying me. It sounds
like you may have other corruption on the pool (always a risk with
single-parity raid), but I'm worried that it's not giving you any more
details about what's wrong.

Also, what version of OpenSolaris are you running? Could you maybe try
booting off a CD of the latest build? There are often improvements in the
way ZFS copes with errors, so it's worth a try. I don't think it's likely
to help, but I wouldn't discount it.
Brad Hill
2009-Jan-29 03:02 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
Yes. I have disconnected the bad disk and booted with nothing in that
slot, and also with a known-good replacement disk on the same SATA port.
Neither changes anything.

The box is running 2008.11, and the LiveCD is 2008.11 snv_101b_rc2. I'll
give booting from the latest build a shot and see if that makes any kind
of difference.

Thanks for the suggestions.

Brad

> Just a thought, but have you physically disconnected the bad disk? It's
> not unheard of for a bad disk to cause problems with others.
> [...]
Pål Baltzersen
2009-Jan-30 14:16 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
Take the new disk out as well - a foreign or bad non-zero disk label may
cause trouble too. I've experienced tool core dumps caused by a foreign
disk (partition) label, which might be the case here if it is a recycled
replacement disk. In my case I fixed it by plugging the disk into a Linux
desktop and blanking it, wiping the label with
"dd if=/dev/zero of=/dev/sdc bs=512 count=4", where /dev/sdc was the
device it got assigned (linux: fdisk -l).
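One caveat with that: ZFS keeps four label copies, two at the start of the
device and two at the end, so wiping only the first few sectors can leave
stale labels behind. A more thorough wipe on Linux might look like this
sketch (/dev/sdc is just the example device from above - triple-check the
device name before pointing dd at anything):

    DEV=/dev/sdc
    # clear the first 1MB (old partition table plus ZFS labels L0/L1)
    dd if=/dev/zero of=$DEV bs=1M count=1
    # clear the last 1MB as well (ZFS labels L2/L3)
    SECTORS=$(blockdev --getsz $DEV)   # device size in 512-byte sectors
    dd if=/dev/zero of=$DEV bs=512 seek=$((SECTORS - 2048)) count=2048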
Haudy Kazemi
2009-Apr-22 11:45 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
Brad Hill wrote:
> root at opensolaris:~# zpool import
>   pool: tank
>     id: 16342816386332636568
>  state: FAULTED
> status: The pool was last accessed by another system.
> action: The pool cannot be imported due to damaged devices or data.
> [...]

1.) Here's a similar report from last summer from someone running ZFS on
FreeBSD. No resolution there either:
raidz vdev marked faulted with only one faulted disk
http://kerneltrap.org/index.php?q=mailarchive/freebsd-fs/2008/6/15/2132754

2.) This old thread from Dec 2007, about a different raidz1 problem and
titled 'Faulted raidz1 shows the same device twice', suggests trying
these commands (see the link for the context they were run under):
http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg13214.html

# zdb -l /dev/dsk/c18t0d0
# zpool export external
# zpool import external
# zpool clear external
# zpool scrub external
# zpool clear external

3.) Do you have ECC RAM? Have you verified that your memory, CPU, and
motherboard are reliable?

4.) 'Bad exchange descriptor' is mentioned very sparingly across the net,
mostly in system error tables. Also here:
http://opensolaris.org/jive/thread.jspa?threadID=88486&tstart=165

5.) More raidz setup caveats, at least on MacOS:
http://lists.macosforge.org/pipermail/zfs-discuss/2008-March/000346.html
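Adapting item 2 to the pool in this thread, a label sanity check might
look like the sketch below (device names from the earlier zpool status
output; the s0 suffix assumes whole-disk EFI-labelled members, so adjust
it if the pool was built on slices):

    # compare pool_guid, txg and state across the surviving members;
    # a disagreeing label could explain the "corrupted data" flag
    for d in c6t0d0 c6t1d0 c6t2d0 c6t4d0; do
        echo "=== $d ==="
        zdb -l /dev/dsk/${d}s0 | egrep 'pool_guid|txg|state'
    done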