Paul Bruce
2009-Dec-12 14:17 UTC
[zfs-discuss] ZFS - how to determine which physical drive to replace
Hi,

I'm just about to build a ZFS system as a home file server in raidz, but I have one question, pre-empting the need to replace one of the drives if it ever fails.

How on earth do you determine the actual physical drive that has failed? I've got the whole zpool status thing worked out, but how do I translate c1t0d0, c1t0d1, etc. to a real physical drive?

I can just see myself looking at the 6 drives and thinking "mmmm... c1t0d1... I think that's *this* one"... eenie, meenie, miney, moe.

P
Edward Ned Harvey
2009-Dec-12 15:58 UTC
[zfs-discuss] ZFS - how to determine which physical drive to replace
This is especially important because if you have one failed drive and you pull the wrong drive, you now have two failed drives. That could destroy the dataset (depending on whether you have raidz-1 or raidz-2).

Whenever possible, get hot-swappable hardware that will blink a red light for you, so there can be no mistake. Even if the hardware doesn't blink a light for you, you could manually cycle between activity and non-activity on the disks to identify the disk yourself.

But if that's not a possibility, if you have no lights on non-hot-swappable disks, then, given that you're going to have to power off the system anyway and that it's difficult to map the device name to a physical wire, I would suggest something like this: while the system is still on, if the failed drive is at least writable *a little bit*, you can run "dd if=/dev/zero of=/dev/rdsk/FailedDiskDevice bs=1024 count=1024". Then, after the system is off, you could plug the drives into another system one by one, read the first 1M, and see if it's all zeros. (Or instead of dd zero, you could echo some text onto the drive, or whatever you think is easiest.)

Obviously that's not necessarily an option. If the drive is completely dead, totally unwritable, then when you plug the drives one by one into another system, it should be easy to identify the failed drive.
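The "read the first 1M and see if it's all zeros" check from the message above could be sketched roughly as follows. This is only an illustration: the function name first_mib_is_zero is made up here, and the block sizes simply mirror the bs=1024 count=1024 used in the dd command.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): succeeds only if the first
# 1 MiB of the given device or file reads back as all zero bytes.
first_mib_is_zero() {
    target=$1
    # Count how many bytes could actually be read (1024 blocks of 1024 bytes).
    total=$(( $(dd if="$target" bs=1024 count=1024 2>/dev/null | wc -c) ))
    # Count the bytes that remain after deleting every zero byte.
    nonzero=$(( $(dd if="$target" bs=1024 count=1024 2>/dev/null | tr -d '\000' | wc -c) ))
    # Require a full 1 MiB read so a short or failed read is not mistaken for a match.
    [ "$total" -eq 1048576 ] && [ "$nonzero" -eq 0 ]
}
```

On the second system you would call it once per attached drive, for example first_mib_is_zero /dev/rdsk/SomeDiskDevice && echo "this is the marked drive", where the device path is a placeholder just like FailedDiskDevice above.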
Ed Plese
2009-Dec-12 16:26 UTC
[zfs-discuss] ZFS - how to determine which physical drive to replace
As suggested at http://opensolaris.org/jive/thread.jspa?messageID=416264, you can try viewing the disk serial numbers with cfgadm:

cfgadm -al -s "select=type(disk),cols=ap_id:info"

You may need to power down the system to view the serial numbers printed on the disks to match them up, but it beats guessing.

Ed Plese
Patrick O''Sullivan
2009-Dec-12 17:21 UTC
[zfs-discuss] ZFS - how to determine which physical drive to replace
I've found that when I build a system, it's worth the initial effort to install drives one by one to see how they get mapped to names. Then I put labels on the drives and SATA cables. If there were room to label the actual SATA ports on the motherboard and cards, I would.

While this isn't foolproof, it gives me a bit more reassurance in the [inevitable] event of a drive failure.
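One way to keep the labels described above from getting lost is to also record them in a small text file as each drive is installed. The sketch below assumes a made-up file format (device name first, then the label you wrote on the drive or cable) and a hypothetical function name, lookup_label; neither comes from the thread.

```shell
#!/bin/sh
# Hypothetical map file format (an assumption, not from the thread):
#   <device name> <physical label>
# one disk per line, written down while installing drives one at a time.
lookup_label() {
    map_file=$1
    device=$2
    # Print the label recorded for the device; exit non-zero if it is unknown.
    awk -v dev="$device" '
        $1 == dev { $1 = ""; sub(/^ +/, ""); print; found = 1; exit }
        END { exit (found ? 0 : 1) }
    ' "$map_file"
}
```

When zpool status reports a faulted device, something like lookup_label /path/to/disk-map c1t0d1 would then print whatever label was recorded for it at build time. The map is only as reliable as the last time it was updated, so it complements the physical labels instead of replacing them.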
Mike Gerdts
2009-Dec-12 19:46 UTC
[zfs-discuss] ZFS - how to determine which physical drive to replace
On Sat, Dec 12, 2009 at 9:58 AM, Edward Ned Harvey <solaris at nedharvey.com> wrote:
> I would suggest something like this: While the system is still on, if the
> failed drive is at least writable *a little bit*, then you can "dd
> if=/dev/zero of=/dev/rdsk/FailedDiskDevice bs=1024 count=1024", and then
> after the system is off, you could plug the drives into another system
> one-by-one, and read the first 1M, and see if it's all zeros. (Or instead
> of dd zero, you could echo some text onto the drive, or whatever you think
> is easiest.)

How about reading instead?

dd if=/dev/rdsk/$whatever of=/dev/null

If the failed disk generates I/O errors that prevent it from reading at a rate that causes an LED to blink, you could read from all of the good disks. The one that doesn't blink is the broken one.

You can also get the drive serial number with iostat -En:

$ iostat -En
c3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: Hitachi HTS5425 Revision: Serial No: 080804BB6300HCG Size: 160.04GB <160039305216 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
...

That /should/ be printed on the disk somewhere.

--
Mike Gerdts
http://mgerdts.blogspot.com/
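With several disks, the iostat -En output gets long, so it may help to boil it down to one "device serial" pair per disk. The sketch below is only an illustration that assumes the layout shown in the message above (device name at the start of the "Soft Errors:" line, serial after "Serial No:"); the function name serials_from_iostat is made up, and serial numbers containing spaces would be cut at the first space.

```shell
#!/bin/sh
# Reads iostat -En style text on stdin and prints "<device> <serial>" per disk.
# Assumes the layout shown in the message: the device name is the first word
# of the line containing "Soft Errors:", and the serial follows "Serial No:".
serials_from_iostat() {
    awk '
        /Soft Errors:/ { dev = $1 }
        /Serial No:/ {
            serial = $0
            sub(/.*Serial No: */, "", serial)   # drop everything up to the label
            sub(/ .*/, "", serial)              # keep only the first word after it
            # An empty serial field leaves the next label behind; report it as unknown.
            if (serial == "" || serial ~ /^Size:/) serial = "unknown"
            print dev, serial
        }
    '
}
```

Usage would be iostat -En | serials_from_iostat, and the resulting list can be printed and kept with the machine so it is at hand when the case is open.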