Greetings!

I lost one out of five disks on a machine with a raidz1 and I'm not sure exactly how to recover from it. The pool is marked as FAULTED, which I certainly wasn't expecting with only one bum disk.

root at blitz:/# zpool status -v tank
  pool: tank
 state: FAULTED
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        FAULTED      0     0     1  corrupted data
          raidz1    DEGRADED     0     0     6
            c6t0d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c6t3d0  UNAVAIL      0     0     0  cannot open
            c6t4d0  ONLINE       0     0     0

Any recovery guidance I may gain from the esteemed experts of this group would be extremely appreciated. I recently migrated to OpenSolaris + ZFS on the impassioned advice of a coworker, and will lose some data that has been modified since the move but not yet backed up.

Many thanks in advance...
Can you share your hardware configuration?

cheers,
Blake

On Mon, Jan 19, 2009 at 11:56 PM, Brad Hill <brad at thosehills.com> wrote:
> I lost one out of five disks on a machine with a raidz1 and I'm not sure
> exactly how to recover from it. The pool is marked as FAULTED, which I
> certainly wasn't expecting with only one bum disk.
> [...]
Sure, and thanks for the quick reply.

Controller: Supermicro AOC-SAT2-MV8 plugged into a 64-bit PCI-X 133 bus
Drives: 5 x Seagate 7200.11 1.5TB disks for the raidz1.
        Single 36GB Western Digital 10krpm Raptor as system disk. Its mate
        is installed but not yet mirrored.
Motherboard: Tyan Thunder K8W S2885 (dual AMD CPU) with 1GB ECC RAM

Anything else I can provide?

(thanks again)
I would get a new 1.5 TB drive, make sure it has the new firmware, and
replace c6t3d0 right away - even if someone here comes up with a magic
solution, you don't want to wait for another drive to fail.

http://hardware.slashdot.org/article.pl?sid=09/01/17/0115207
http://techreport.com/discussions.x/15863

Brad Hill wrote:
> Controller: Supermicro AOC-SAT2-MV8 plugged into a 64-bit PCI-X 133 bus
> Drives: 5 x Seagate 7200.11 1.5TB disks for the raidz1.
> [...]
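For reference, on a pool that is only DEGRADED the usual replacement
sequence would be a sketch along these lines (device names taken from the
zpool status output above; whether the pool will accept the command at all
is the open question in this thread):

    # put the new drive in the failed disk's slot and resilver onto it
    zpool replace tank c6t3d0

    # or, if the replacement comes up at a different target (c6t5d0 is
    # just a hypothetical example here)
    zpool replace tank c6t3d0 c6t5d0

    # then watch the resilver progress
    zpool status -v tank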
In this case I would also immediately export the pool (to prevent any
write attempts) and look into a firmware update for the failed drive
(you will probably need Windows for that).

Sent from my iPhone

On Jan 20, 2009, at 3:22 AM, zfs user <zfsml at itsbeen.sent.com> wrote:
> I would get a new 1.5 TB drive, make sure it has the new firmware, and
> replace c6t3d0 right away - even if someone here comes up with a magic
> solution, you don't want to wait for another drive to fail.
> [...]
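A minimal sketch of that, assuming the pool is still imported on the
original system:

    # quiesce the pool until the hardware situation is sorted out
    zpool export tank

    # later, once the drive has been replaced or reflashed
    zpool import tank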
Brad Hill
2009-Jan-23 03:52 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
> I would get a new 1.5 TB drive, make sure it has the new firmware, and
> replace c6t3d0 right away - even if someone here comes up with a magic
> solution, you don't want to wait for another drive to fail.

The replacement disk showed up today but I'm unable to replace the one
marked UNAVAIL:

root at blitz:~# zpool replace tank c6t3d0
cannot open 'tank': pool is unavailable

> In this case I would also immediately export the pool (to prevent any
> write attempts) and look into a firmware update for the failed drive
> (you will probably need Windows for that).

While I didn't export first, I did boot with a livecd and tried to force
the import with that:

root at opensolaris:~# zpool import -f tank
internal error: Bad exchange descriptor
Abort (core dumped)

Hopefully someone on this list understands what situation I am in and how
to resolve it. Again, many thanks in advance for any suggestions you all
have to offer.
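Some read-only diagnostics that should be safe to run from the LiveCD at
this point - a sketch, assuming the surviving disks show up there under
the same c6t*d0 names (zdb -l reads the on-disk labels without importing
the pool, and fmdump shows any fault events that have been logged):

    # dump the ZFS labels of one surviving member; txg and pool_guid
    # should agree across all four remaining disks (labels live on s0
    # for whole-disk pool members - adjust if yours were built on slices)
    zdb -l /dev/dsk/c6t0d0s0

    # list any logged error/fault events
    fmdump -eV | less

    # see what the system thinks is importable
    zpool import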
Blake
2009-Jan-23 22:47 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
I've seen reports of a recent Seagate firmware update bricking drives
again.

What's the output of 'zpool import' from the LiveCD? It sounds like more
than one drive is dropping off.

On Thu, Jan 22, 2009 at 10:52 PM, Brad Hill <brad at thosehills.com> wrote:
> The replacement disk showed up today but I'm unable to replace the one
> marked UNAVAIL:
>
> root at blitz:~# zpool replace tank c6t3d0
> cannot open 'tank': pool is unavailable
> [...]
Brad Hill
2009-Jan-24 17:48 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
> I've seen reports of a recent Seagate firmware update bricking drives
> again.
>
> What's the output of 'zpool import' from the LiveCD? It sounds like
> more than one drive is dropping off.

root at opensolaris:~# zpool import
  pool: tank
    id: 16342816386332636568
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank        FAULTED   corrupted data
          raidz1    DEGRADED
            c6t0d0  ONLINE
            c6t1d0  ONLINE
            c6t2d0  ONLINE
            c6t3d0  UNAVAIL   cannot open
            c6t4d0  ONLINE

  pool: rpool
    id: 9891756864015178061
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        rpool       ONLINE
          c3d0s0    ONLINE
Brad Hill
2009-Jan-27 18:21 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
Any ideas on this? It looks like a potential bug to me, or there is
something that I'm not seeing.

Thanks again!
Chris Du
2009-Jan-27 21:45 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
Do you know the 7200.11 has firmware bugs? Check the Seagate website.
Blake
2009-Jan-28 01:15 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
I guess you could try 'zpool import -f'. This is a pretty odd status, I
think. I'm pretty sure raidz1 should survive a single disk failure.

Perhaps a more knowledgeable list member can explain.

On Sat, Jan 24, 2009 at 12:48 PM, Brad Hill <brad at thosehills.com> wrote:
> root at opensolaris:~# zpool import
>   pool: tank
>     id: 16342816386332636568
>  state: FAULTED
> status: The pool was last accessed by another system.
> action: The pool cannot be imported due to damaged devices or data.
> [...]
Brad Hill
2009-Jan-28 03:32 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
root at opensolaris:~# zpool import -f tank
internal error: Bad exchange descriptor
Abort (core dumped)

Hoping someone has seen that before... the Google is seriously letting me
down on that one.

> I guess you could try 'zpool import -f'. This is a pretty odd status,
> I think. I'm pretty sure raidz1 should survive a single disk failure.
>
> Perhaps a more knowledgeable list member can explain.
This is outside the scope of my knowledge/experience. Maybe there is now
a core file you can examine? That might help you at least see what's
going on.

On Tue, Jan 27, 2009 at 10:32 PM, Brad Hill <brad at thosehills.com> wrote:
> root at opensolaris:~# zpool import -f tank
> internal error: Bad exchange descriptor
> Abort (core dumped)
> [...]
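If the crash left a core file in the current directory, a stack trace
would at least show which zpool/libzfs code path is blowing up - a sketch,
assuming the file is literally named "core":

    # quick user-level stack trace of the dumped process
    pstack core

    # or dig deeper with the modular debugger
    mdb core
    > ::status
    > $C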
Brad Hill
2009-Jan-28 06:16 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
I do, thank you. The disk that went out sounds like it had a head crash
or some such - loud clicking shortly after spin-up, then it spins down
and gives me nothing. The BIOS doesn't even detect it properly, so a
firmware update isn't possible.

> Do you know the 7200.11 has firmware bugs? Check the Seagate website.
Just a thought, but have you physically disconnected the bad disk? It's
not unheard of for a bad disk to cause problems with others.

Failing that, it's the "corrupted data" bit that's worrying me. It sounds
like you may have other corruption on the pool (always a risk with
single-parity raid), but I'm worried that it's not giving you any more
details about what's wrong.

Also, what version of OpenSolaris are you running? Could you maybe try
booting off a CD of the latest build? There are often improvements in the
way ZFS copes with errors, so it's worth a try. I don't think it's likely
to help, but I wouldn't discount it.
Brad Hill
2009-Jan-29 03:02 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
Yes. I have disconnected the bad disk and booted with nothing in that
slot, and also with a known-good replacement disk on the same SATA port.
Neither changes anything.

The box is running 2008.11, and the LiveCD is 2008.11 snv_101b_rc2. I'll
give booting from the latest build a shot and see if that makes any kind
of difference.

Thanks for the suggestions.

Brad

> Just a thought, but have you physically disconnected the bad disk? It's
> not unheard of for a bad disk to cause problems with others.
> [...]
Pål Baltzersen
2009-Jan-30 14:16 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
Take the new disk out as well - a foreign or bad non-zero disk label may
cause trouble too. I've experienced tool core dumps caused by a foreign
disk (partition) label, which might be the case here if it is a recycled
replacement disk. In my case I fixed it by plugging the disk into a Linux
desktop and blanking it, wiping the label with
"dd if=/dev/zero of=/dev/sdc bs=512 count=4", where /dev/sdc was the
device it got assigned (linux: fdisk -l).
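One caveat with that: ZFS keeps four label copies, two at the start of the
device and two at the end, so wiping only the first few sectors can leave
stale labels behind. A more thorough wipe on Linux might look like this
sketch (/dev/sdc is just the example device from above - triple-check the
device name before pointing dd at anything):

    DEV=/dev/sdc
    # clear the first 1MB (old partition table plus ZFS labels L0/L1)
    dd if=/dev/zero of=$DEV bs=1M count=1
    # clear the last 1MB as well (ZFS labels L2/L3)
    SECTORS=$(blockdev --getsz $DEV)   # device size in 512-byte sectors
    dd if=/dev/zero of=$DEV bs=512 seek=$((SECTORS - 2048)) count=2048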
Haudy Kazemi
2009-Apr-22 11:45 UTC
[zfs-discuss] Raidz1 faulted with single bad disk. Requesting assistance.
Brad Hill wrote:
> root at opensolaris:~# zpool import
>   pool: tank
>     id: 16342816386332636568
>  state: FAULTED
> status: The pool was last accessed by another system.
> action: The pool cannot be imported due to damaged devices or data.
> [...]

1.) Here's a similar report from last summer from someone running ZFS on
FreeBSD. No resolution there either:
raidz vdev marked faulted with only one faulted disk
http://kerneltrap.org/index.php?q=mailarchive/freebsd-fs/2008/6/15/2132754

2.) This old thread from Dec 2007, about a different raidz1 problem and
titled 'Faulted raidz1 shows the same device twice', suggests trying
these commands (see the link for the context they were run under):
http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg13214.html

# zdb -l /dev/dsk/c18t0d0
# zpool export external
# zpool import external
# zpool clear external
# zpool scrub external
# zpool clear external

3.) Do you have ECC RAM? Have you verified that your memory, CPU, and
motherboard are reliable?

4.) 'Bad exchange descriptor' is mentioned very sparingly across the net,
mostly in system error tables. Also here:
http://opensolaris.org/jive/thread.jspa?threadID=88486&tstart=165

5.) More raidz setup caveats, at least on MacOS:
http://lists.macosforge.org/pipermail/zfs-discuss/2008-March/000346.html
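Adapting item 2 to the pool in this thread, a label sanity check might
look like the sketch below (device names from the earlier zpool status
output; the s0 suffix assumes whole-disk EFI-labelled members, so adjust
it if the pool was built on slices):

    # compare pool_guid, txg and state across the surviving members;
    # a disagreeing label could explain the "corrupted data" flag
    for d in c6t0d0 c6t1d0 c6t2d0 c6t4d0; do
        echo "=== $d ==="
        zdb -l /dev/dsk/${d}s0 | egrep 'pool_guid|txg|state'
    done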