Dustin Marquess
2009-Sep-23 15:33 UTC
[zfs-discuss] RAID-Z2 won't come online after replacing failed disk
I replaced a bad disk in a RAID-Z2 pool, and now the pool won't come online. The status output shows nothing helpful at all. I don't understand why, since I should be able to lose 2 drives and I only replaced one!

# zpool status -v pool
  pool: pool
 state: UNAVAIL
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool        UNAVAIL      0     0     0  insufficient replicas
          raidz2    UNAVAIL      0     0     0  corrupted data
            c2t0d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0

--
This message posted from opensolaris.org
Dustin Marquess
2009-Sep-23 15:57 UTC
[zfs-discuss] RAID-Z2 won't come online after replacing failed disk
Okay.. I "fixed" it by powering the server off, removing the new drive, letting the pool come up degraded, and then doing zpool replace.

I'm assuming what happened was ZFS saw that the disk was online, tried to use it, and then noticed that the checksums didn't match (of course) and marked the pool as corrupted. The question is why didn't ZFS check the labels on the drive and see that the drive wasn't in the pool and kick it out itself?
Tim Cook
2009-Sep-23 16:08 UTC
[zfs-discuss] RAID-Z2 won't come online after replacing failed disk
On Wed, Sep 23, 2009 at 10:57 AM, Dustin Marquess <jailbird at alcatraz.fdf.net> wrote:

> Okay.. I "fixed" it by powering the server off, removing the new drive,
> letting the pool come up degraded, and then doing zpool replace.
>
> I'm assuming what happened was ZFS saw that the disk was online, tried to
> use it, and then noticed that the checksums didn't match (of course) and
> marked the pool as corrupted. The question is why didn't ZFS check the
> labels on the drive and see that the drive wasn't in the pool and kick it
> out itself?

Did you do a zpool scrub after you replaced the drive? How would zfs know what you wanted done with the drive if you didn't tell it?

--Tim
Bob Friesenhahn
2009-Sep-23 16:10 UTC
[zfs-discuss] RAID-Z2 won't come online after replacing failed disk
On Wed, 23 Sep 2009, Dustin Marquess wrote:

> Okay.. I "fixed" it by powering the server off, removing the new drive,
> letting the pool come up degraded, and then doing zpool replace.
>
> I'm assuming what happened was ZFS saw that the disk was online,
> tried to use it, and then noticed that the checksums didn't match
> (of course) and marked the pool as corrupted. The question is why
> didn't ZFS check the labels on the drive and see that the drive
> wasn't in the pool and kick it out itself?

You never told us what OS and version (OpenSolaris, Solaris 10, FreeBSD, NetBSD, Linux Fuse, OS X zfs preview) you are using. If you are using an older version of zfs, maybe a newer version works as expected? Never report a problem without identifying the software and hardware you are using.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Dustin Marquess
2009-Sep-23 17:23 UTC
[zfs-discuss] RAID-Z2 won't come online after replacing failed disk
Tim: I couldn't do a zpool scrub, since the pool was marked as UNAVAIL. Believe me, I tried :)

Bob: Ya, I realized that after I clicked send. My brain was a little frazzled, so I completely overlooked it.

Solaris 10u7 - Sun E450
ZFS pool version 10
ZFS filesystem version 3

-Dustin
Cindy Swearingen
2009-Sep-23 17:50 UTC
[zfs-discuss] RAID-Z2 won't come online after replacing failed disk
Dustin,

You didn't describe the process that you used to replace the disk, so it's difficult to comment on what happened.

In general, you physically replace the disk and then let ZFS know that the disk is replaced, like this:

# zpool replace pool-name device-name

This process is described here:

http://docs.sun.com/app/docs/doc/819-5461/gazgd?a=view

If you want to reduce the steps in the future, you can enable the autoreplace property on the pool, and then all you need to do is physically replace the disks in the pool.

Cindy

On 09/23/09 11:23, Dustin Marquess wrote:

> Tim: I couldn't do a zpool scrub, since the pool was marked as UNAVAIL. Believe me, I tried :)
>
> Bob: Ya, I realized that after I clicked send. My brain was a little frazzled, so I completely overlooked it.
>
> Solaris 10u7 - Sun E450
> ZFS pool version 10
> ZFS filesystem version 3
>
> -Dustin
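The two approaches Cindy describes could be sketched roughly as below. This is only an illustration: the pool name `pool` comes from the status output earlier in the thread, but `c2t1d0` is a placeholder, because the thread never says which disk failed. The `run` wrapper, `DRY_RUN` switch, and both function names are made up for this sketch; by default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch only. DISK is a hypothetical device name; substitute the real one.
POOL=${POOL:-pool}
DISK=${DISK:-c2t1d0}

# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Manual path: after physically swapping the disk in the same slot,
# tell ZFS that the device has been replaced, then check the pool.
manual_replace() {
    run zpool replace "$POOL" "$DISK"
    run zpool status "$POOL"
}

# Automatic path: turn on the autoreplace pool property ahead of time,
# then read it back to confirm the setting.
enable_autoreplace() {
    run zpool set autoreplace=on "$POOL"
    run zpool get autoreplace "$POOL"
}
```

Calling `manual_replace` or `enable_autoreplace` with the defaults just lists the commands; setting `DRY_RUN=0` as root would actually run them.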
Dustin Marquess
2009-Sep-23 18:39 UTC
[zfs-discuss] RAID-Z2 won't come online after replacing failed disk
Cindy: AWESOME! Didn't know about that property, I'll make sure I set it :).

All I did to replace the drive was power off the machine (the failed drive had hard-locked the SCSI bus, so I had to anyway). Once the machine was powered off, I pulled the bad drive, inserted the new drive, and powered the machine on. That's when the machine came up showing the pool in a corrupted state.

I'm assuming that if I had removed the old drive, booted with the drive missing, let the pool come up DEGRADED, and then inserted the new drive and done a zpool replace, it would have been fine.

So I was going by the guess that zpool didn't know that the disk was replaced, and I was just curious why.
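The order of operations Dustin describes (see the pool come up DEGRADED first, then run zpool replace) could be guarded with a small check like the one below. It is a sketch under assumptions, not a tested procedure: `pool_state` and `next_step` are hypothetical helper names, `c2t1d0` is a placeholder device, and the rule "only replace when DEGRADED" simply mirrors what worked in this thread.

```shell
#!/bin/sh
# Sketch only. Helper names are hypothetical; the device name is a placeholder.

# Read `zpool status` text on stdin and print the value of the
# first "state:" line (for example ONLINE, DEGRADED, or UNAVAIL).
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Print a suggested next step for a given state, pool, and device.
# Nothing is executed; the command is only echoed.
next_step() {
    case "$1" in
        DEGRADED) echo "zpool replace $2 $3" ;;
        ONLINE)   echo "nothing to do" ;;
        *)        echo "pool is $1: get it to DEGRADED before running zpool replace" ;;
    esac
}
```

On a live system this might be used as `next_step "$(zpool status pool | pool_state)" pool c2t1d0`, which prints the replace command only when the pool reports DEGRADED.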