I have a raidz2 pool with one disk that seems to be going bad; iostat shows several errors for it. I have an RMA for the drive, but now I am wondering how to proceed. I need to send the drive in, and then they will send me one back. If I had the replacement on hand, I could just do a zpool replace.

Do I do a zpool offline? A zpool detach? Once I get the drive back and put it in the same drive bay, is it just a zpool replace <device>?

-- This message posted from opensolaris.org
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Brian
>
> I have a raidz2 pool with one disk that seems to be going bad, several errors
> are noted in iostat. I have an RMA for the drive, however - no I am
> wondering how I proceed. I need to send the drive in and then they will
> send me one back. If I had the drive on hand, I could do a zpool replace.
>
> Do I do a zpool offline? zpool detach?
> Once I get the drive back and put it in the same drive bay.. Is it just a zpool
> replace <device>?

Just guessing you don't have hotswap drive bays, because you don't have an advance replacement warranty on your hardware. ;-) Which means you're going to have to shut down anyway. So:

I would zpool export. That will ensure the drives are all stopped.
Then I would make the faulted drive blink. Something like:

    while true ; do dd if=/dev/rdsk/baddisk of=/dev/null bs=1024k count=8000 ; sleep 1 ; done

Make a note of which drive is the bad drive.
Shut down.
Remove it.
Boot up again.
zpool import -a

Now you will see the removed drive appearing as "offline" or whatever status is most helpful.

Later, your new drive arrives. Don't attach it yet.
Run the "format" command to get a list of drives in the system, and immediately quit (don't cause harm to your system with format!)
Shut down, attach the new drive, and boot.
Run "format" again and verify the list is the same as before (new drive not available yet).
You probably have to do something like devfsadm -Cv
Run "format" again, and now you should easily be able to identify the new device name. It may or may not match the device name of the drive you removed.

And finally, you can do the zpool replace.
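A bounded variant of the blink loop above, as a minimal sketch: it stops after a fixed number of passes instead of running until interrupted. The function name `blink_disk` and its arguments are hypothetical, and `/dev/rdsk/baddisk` is the same placeholder used in the message, not a real device name.

```shell
# blink_disk DEVICE [PASSES] [PAUSE_SECONDS]
# Reads from DEVICE repeatedly so that its activity LED stands out.
# Read-only: everything goes to /dev/null, nothing is written to the disk.
blink_disk() {
    dev=$1
    passes=${2:-10}
    pause=${3:-1}

    # Refuse to start if the device path is missing or unreadable.
    [ -r "$dev" ] || { echo "cannot read $dev" >&2; return 1; }

    i=0
    while [ "$i" -lt "$passes" ]; do
        # A failing disk may return read errors; keep going regardless,
        # since the goal is only to generate visible activity.
        dd if="$dev" of=/dev/null bs=1024k count=8000 2>/dev/null
        sleep "$pause"
        i=$((i + 1))
    done
    return 0
}

# Example (placeholder device name): blink_disk /dev/rdsk/baddisk 30
```

Bounding the loop is a design choice for unattended use; the original open-ended loop works just as well when someone is standing at the console.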
On May 28, 2011, at 10:15 AM, Edward Ned Harvey wrote:
> Just guessing you don't have hotswap drive bays, because you don't have an
> advance replacement warranty on your hardware. ;-) Which means you're
> going to have to shutdown anyway. So:
>
> I would zpool export. That will ensure drives are all stopped.
> Then I would make the faulted drive blink. Something like:
> while true ; do dd if=/dev/rdsk/baddisk of=/dev/null bs=1024k count=8000 ;
> sleep 1 ; done
>
> Make a note of which drive is the bad drive.
> Shutdown.
> Remove it.
> Boot up again.
> zpool import -a
>
> Now you will see the removed drive appearing as "offline" or whatever status
> is most helpful.

Yuck. What an ugly procedure :-(

zpool offline is a better method.

Identifying the location of the drive depends on the OS, the OS revision, and the hardware. Some OSes are easier than others. For example, in NexentaStor the device serial numbers are readily displayed, so you can verify the serial number of the disk. MacOS also provides easy access to the disk serial number for verification.

When the new disk arrives, do the zpool replace.

A detailed method for doing this on Solaris is in the ZFS Admin Guide.

-- richard
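The offline step suggested above could be sketched as a dry run. The helper only prints the commands rather than executing them, so it is safe to try on any machine. The pool name `tank` and the device name `c1t2d0` are placeholders that do not come from the thread, and `offline_plan` is a hypothetical helper.

```shell
# offline_plan POOL DISK
# Prints (does not run) the commands to take a failing disk out of service
# and then check the pool state.
offline_plan() {
    pool=$1
    disk=$2
    [ -n "$pool" ] && [ -n "$disk" ] || {
        echo "usage: offline_plan POOL DISK" >&2
        return 1
    }
    echo "zpool offline $pool $disk"
    echo "zpool status $pool"
}

offline_plan tank c1t2d0
```

Once the printed commands look right for the real pool and device, they can be run by hand.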
On Sat, May 28, 2011 at 1:15 PM, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
> Just guessing you don't have hotswap drive bays, because you don't have an
> advance replacement warranty on your hardware. ;-) Which means you're
> going to have to shutdown anyway. So:

My drive bays are hotswap enabled. It's a 20-bay hotswap case. This is just for personal use, not commercial, so I have consumer-grade disks and replacement warranties.

> I would zpool export. That will ensure drives are all stopped.
> Then I would make the faulted drive blink. Something like:
> while true ; do dd if=/dev/rdsk/baddisk of=/dev/null bs=1024k count=8000 ;
> sleep 1 ; done

I would rather not shut down if I don't have to.

> Make a note of which drive is the bad drive.
> Shutdown.
> Remove it.
> Boot up again.
> zpool import -a
>
> Now you will see the removed drive appearing as "offline" or whatever status
> is most helpful.
>
> Later, your new drive arrives. Don't attach it yet.
> Run the "format" command to get a list of drives in the system, and
> immediately quit (don't cause harm to your system with format!)
> shutdown, attach the new drive.
> Run "format" again, verify it's the same as before (new drive not available
> yet)
> You probably have to do something like devfsadm -Cv
> Run "format" again, and now you should easily be able to identify the new
> device name. It may or may not match the device name of the drive you
> removed.
>
> And finally, you can do the zpool replace

Thanks.
Thanks for the input.

On Sat, May 28, 2011 at 1:35 PM, Richard Elling <richard.elling at gmail.com> wrote:
> Yuck. What an ugly procedure :-(
>
> zpool offline is a better method.

Just zpool offline <pool> <bad disk>?

> Identifying the location of the drive is OS, OS rev, and hardware dependent.
> Some OSes are easier than others. For example in NexentaStor, the device
> serial numbers are readily displayed, so you can verify the serial number of
> the disk. MacOS also provides easy access to the disk serial number for
> verification.

I have all the disks labelled and got the serial number using prtconf. But I will double-check before I ship it off :-)

> When the new disk arrives, do the zpool replace.

OK, I guess there are two scenarios I will encounter:

(1) The new drive comes up with the same name. Then it's just zpool replace <tank_name> <drive_name>.
(2) It has a different name. Then it's zpool replace <tank_name> <old_drive> <new_drive>.

Is there any problem with #2 if the old device is gone?

> A detailed method for doing this on Solaris is in the ZFS Admin Guide.

I took a look at that, but still had questions. Thanks!

> -- richard
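The two scenarios above can be sketched as a small dry-run helper that picks the one-device or two-device form of zpool replace. It only prints the command. The names `tank`, `c1t2d0`, and `c1t5d0` are placeholders rather than values from the thread, and `replace_cmd` is a hypothetical helper.

```shell
# replace_cmd POOL OLD_DISK [NEW_DISK]
# Prints (does not run) the zpool replace command for either scenario.
replace_cmd() {
    pool=$1
    old=$2
    new=${3:-$old}   # with no third argument, assume the name is unchanged
    [ -n "$pool" ] && [ -n "$old" ] || {
        echo "usage: replace_cmd POOL OLD_DISK [NEW_DISK]" >&2
        return 1
    }
    if [ "$new" = "$old" ]; then
        # Scenario 1: the new drive appears under the old device name.
        echo "zpool replace $pool $old"
    else
        # Scenario 2: the new drive appears under a different device name.
        echo "zpool replace $pool $old $new"
    fi
}

replace_cmd tank c1t2d0          # scenario 1
replace_cmd tank c1t2d0 c1t5d0   # scenario 2
```

Printing instead of executing leaves room to confirm the device names against the labelled bays and serial numbers first.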
Always pre-purchase one extra drive to have on hand. When you get it, confirm it was not dead on arrival by hooking it up via external USB to a workstation and running whatever your favorite tools are to validate that it is okay. Then put it back in its original packaging and label it with what it is and that it is a spare for box(es) XYZ disk system.

When a drive fails, use the one off the shelf to do your replacement immediately, then deal with the RMA, paperwork, and snail mail to get the bad drive replaced.

Also, depending on how many disks you have in your array, keeping multiple spares can be a good idea to cover another disk dying while you wait for the replacement.

In my opinion, the above goes whether or not your disk system is configured with a hot spare. And the technique is applicable to both personal/home use and commercial use if your data is important.

- Mike

On May 28, 2011, at 9:30 AM, Brian wrote:
> I have a raidz2 pool with one disk that seems to be going bad, several errors are noted in iostat. I have an RMA for the drive, however - no I am wondering how I proceed. I need to send the drive in and then they will send me one back. If I had the drive on hand, I could do a zpool replace.
>
> Do I do a zpool offline? zpool detach?
> Once I get the drive back and put it in the same drive bay.. Is it just a zpool replace <device>?
> --
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
"Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D."
2011-May-28 23:59 UTC
[zfs-discuss] Have my RMA... Now what??
Yes, good idea. Another thing to keep in mind: technology changes so fast that by the time you want a replacement, that HDD may not exist any more, or the supplier may have changed, so the drives are not exactly like your original drive.

On 5/28/2011 6:05 PM, Michael DeMan wrote:
> Always pre-purchase one extra drive to have on hand. When you get it, confirm it was not dead-on-arrival by hooking up on an external USB to a workstation and running whatever your favorite tools are to validate it is okay. Then put it back in its original packaging, and put a label on it about what it is, and that it is a spare for box(s) XYZ disk system.
>
> When a drive fails, use that one off the shelf to do your replacement immediately then deal with the RMA, paperwork, and snailmail to get the bad drive replaced.
>
> Also, depending how many disks you have in your array - keeping multiple spares can be a good idea as well to cover another disk dying while waiting on that replacement one.
>
> In my opinion, the above goes whether you have your disk system configured with hot spare or not. And the technique is applicable to both personal/home-use and commercial uses if your data is important.
>
> - Mike
Yes, particularly if you have older drives with 512-byte sectors and then buy a newer drive that seems the same but is not, because it has 4K sectors. It looks like it works, and it will work, but performance drops.

On May 28, 2011, at 4:59 PM, Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. wrote:
> yes good idea, another things to keep in mind
> technology change so fast, by the time you want a replacement, may be HDD does exist any more
> or the supplier changed, so the drives are not exactly like your original drive
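One common reason for the performance drop is misaligned I/O: a write that does not start and end on a 4K physical sector boundary forces the drive to read, modify, and rewrite a whole sector. The arithmetic can be sketched as follows. This is an illustration of the alignment check only, not of anything ZFS does internally, and `is_aligned` is a hypothetical helper.

```shell
# is_aligned OFFSET LENGTH SECTOR_BYTES
# Succeeds when a write starts and ends on physical sector boundaries.
is_aligned() {
    offset=$1
    length=$2
    sector=$3
    [ $((offset % sector)) -eq 0 ] && [ $((length % sector)) -eq 0 ]
}

# A 512-byte write at byte offset 512:
if is_aligned 512 512 512; then
    echo "512-byte sectors: aligned"
fi
if ! is_aligned 512 512 4096; then
    echo "4K sectors: misaligned, drive must read-modify-write"
fi
```

The same write is aligned on the older drive and misaligned on the newer one, which matches the "works, but slower" behavior described above.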