Chris Dunbar - Earthside, LLC
2012-Nov-28 01:56 UTC
[zfs-discuss] Question about degraded drive
Hello,

I have a degraded mirror set, and this has happened a few times (not always the same drive) over the last two years. In the past I replaced the drive, ran zpool replace, and all was well. I am wondering, however, if it is safe to run zpool replace without replacing the drive, to see whether it has in fact failed. On traditional RAID systems I have had drives drop out of an array but be perfectly fine; adding them back to the array returned the drive to service and all was well. Does that approach work with ZFS? If not, is there another way to test the drive before making the decision to yank and replace?

Thank you!
Chris
You don't use replace on mirror vdevs. 'zpool detach' the failed drive, then 'zpool attach' the new drive.

On Nov 27, 2012 6:00 PM, "Chris Dunbar - Earthside, LLC" <cdunbar at earthside.net> wrote:
> [...]
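For example, something like this - a sketch only, using the device names that appear in the zpool status later in the thread, and assuming the new disk comes up under the old c11t4d0 name (the attach target is the surviving member of the mirror):

# zpool detach tank c11t4d0
# zpool attach tank c10t4d0 c11t4d0

zpool attach pairs the new disk with the named surviving device and starts a resilver automatically; zpool status shows its progress.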
And you can try 'zpool online' on the failed drive to see if it comes back online.

On Nov 27, 2012 6:08 PM, "Freddie Cash" <fjwcash at gmail.com> wrote:
> [...]
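A sketch, assuming the removed device is c11t4d0 as in the status output further down the thread:

# zpool online tank c11t4d0
# zpool status tank

If the device resilvers and stays ONLINE, a follow-up scrub is a reasonable confidence check; if it drops out again, treat it as failed.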
Hi Chris,

On Tue, Nov 27, 2012 at 6:56 PM, Chris Dunbar - Earthside, LLC <cdunbar at earthside.net> wrote:
> [...]

I have two tidbits of useful information.

1) "zpool scrub mypoolname" will attempt to read all data on all disks in the pool and verify it against the checksums. If you suspect the disk is fine, you can clear the errors, run a scrub, and check "zpool status" to see if there are read/checksum errors on the disk. If there are, I'd replace the drive.

2) If you have an additional hard drive bay/cable/controller, you can do a "zpool replace" of the offending drive with a new one without doing a "detach" first - this protects you if the other drive fails during resilvering.

Jan
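A hedged sketch of both tidbits (c11t4d0 is the suspect disk from the status later in the thread; c11t6d0 is a hypothetical new disk sitting in a spare bay). For the scrub route:

# zpool clear tank
# zpool scrub tank
# zpool status -v tank

For the replace-without-detach route:

# zpool replace tank c11t4d0 c11t6d0

The two-argument form of zpool replace resilvers onto the new disk while the old one stays attached and detaches the old one automatically when the resilver completes; zpool status -v also lists any files found to have unrecoverable errors during the scrub.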
Sorry, I was skipping bits to get to the main point. I did use replace (as previously instructed on the list). I think that worked because my spare had taken over for the failed drive. That's the same situation now - the spare is in service for the failed drive.

Sent from my iPhone

On Nov 27, 2012, at 9:08 PM, Freddie Cash <fjwcash at gmail.com> wrote:
> You don't use replace on mirror vdevs.
>
> 'zpool detach' the failed drive. Then 'zpool attach' the new drive.
> [...]
Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
2012-Nov-28 14:56 UTC
[zfs-discuss] Question about degraded drive
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Freddie Cash
>
> And you can try 'zpool online' on the failed drive to see if it comes back
> online.

Be cautious here - I have an anecdote, which might represent a trend in best practice, or it might just be an anecdote. At least once, I had an iscsi device go offline, and then I "zpool online"d the device, and it seemed to work - it resilvered successfully, zpool status showed clean, and I was able to zfs send and zfs receive. But for normal usage (going in and actually using the files in the pool) it was never usable again. I don't know the root cause right now. Maybe it's iscsi related.
Chris Dunbar - Earthside, LLC
2012-Nov-28 19:05 UTC
[zfs-discuss] Question about degraded drive
I appreciate everybody's feedback, but I am still unclear on how to proceed. Here is a little more information about my setup. Specifically, here is my zpool status:

        NAME          STATE     READ WRITE CKSUM
        tank          DEGRADED     0     0     0
          mirror-0    ONLINE       0     0     0
            c10t0d0   ONLINE       0     0     0
            c11t0d0   ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            c10t1d0   ONLINE       0     0     0
            c11t1d0   ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            c10t2d0   ONLINE       0     0     0
            c11t2d0   ONLINE       0     0     0
          mirror-3    ONLINE       0     0     0
            c10t3d0   ONLINE       0     0     0
            c11t3d0   ONLINE       0     0     0
          mirror-4    DEGRADED     0     0     0
            c10t4d0   ONLINE       0     0     0
            spare-1   DEGRADED     0     0     0
              c11t4d0 REMOVED      0     0     0
              c10t6d0 ONLINE       0     0     0
        logs
          mirror-5    ONLINE       0     0     0
            c10t5d0   ONLINE       0     0     0
            c11t5d0   ONLINE       0     0     0
        cache
          c8d1        ONLINE       0     0     0
        spares
          c10t6d0     INUSE     currently in use

In the past I have physically replaced the failed drive and then run these commands:

# zpool replace tank c11t4d0
# zpool clear tank

In this situation I would like to know if I can hold off on physically replacing the drive. Is there a safe method to test it or put it back into service and see if it fails again?

Thank you,
Chris

----- Original Message -----
From: "Chris Dunbar - Earthside, LLC" <cdunbar at earthside.net>
To: zfs-discuss at opensolaris.org
Sent: Tuesday, November 27, 2012 8:56:35 PM
Subject: [zfs-discuss] Question about degraded drive
[...]
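For the specific layout above, a hedged sketch of the "put it back into service and watch it" route (this assumes c11t4d0 is still visible to the OS; if the online fails or errors come back, fall back to a physical replacement):

# zpool online tank c11t4d0
# zpool clear tank
# zpool scrub tank
# zpool status -v tank

If c11t4d0 resilvers and the scrub finishes with zero read/write/checksum errors, the in-use spare c10t6d0 should be released back to the spare role; if it lingers, "zpool detach tank c10t6d0" returns it manually. If errors do show up, detach c11t4d0 instead and leave the spare in place until the disk is replaced.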
Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
2012-Nov-29 03:14 UTC
[zfs-discuss] Question about degraded drive
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Chris Dunbar - Earthside, LLC
>
> # zpool replace tank c11t4d0
> # zpool clear tank

I would expect this to work, or detach/attach. You should scrub periodically and ensure there are no errors after the scrub. But the really good question is why the device goes offline in the first place.
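On a Solaris/illumos system, a few places to look for that underlying cause (a sketch; exact output and availability vary by release):

# iostat -En
# fmadm faulty
# fmdump -eV | less

iostat -En summarizes per-device hard/soft/transport error counts, fmadm faulty lists any active faults, and fmdump -eV dumps the FMA error telemetry (timeouts, resets, retried commands) that often explains why a disk was marked REMOVED. A pattern tied to one bay or cable rather than one particular disk points at the backplane, cabling, or controller instead of the drive.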
Chris Dunbar - Earthside, LLC
2012-Nov-29 19:44 UTC
[zfs-discuss] Question about degraded drive
I tried the zpool replace on the failed drive. It returned an I/O error, so I am assuming that is confirmation that the drive is indeed dead. I'll visit the data center tonight and swap it out. Thanks for everybody's help!

----- Original Message -----
From: "Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)" <opensolarisisdeadlongliveopensolaris at nedharvey.com>
To: "Chris Dunbar - Earthside, LLC" <cdunbar at earthside.net>, zfs-discuss at opensolaris.org
Sent: Wednesday, November 28, 2012 10:14:59 PM
Subject: RE: [zfs-discuss] Question about degraded drive
[...]
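For reference, the follow-through after the physical swap would look something like this (a sketch that assumes the new disk comes up under the same c11t4d0 name, as in the earlier replacements):

# zpool replace tank c11t4d0
# zpool clear tank
# zpool status tank

Once the resilver finishes, the in-use spare c10t6d0 should be released back to AVAIL; if it is not, "zpool detach tank c10t6d0" returns it to the spares list manually.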