Hi all,

I have installed a new server with 77 2TB drives in 11 7-drive RAIDz2 VDEVs, all WD Black drives. Now it seems two of these drives were bad: one of them had a bunch of errors, the other was very slow. After taking them offline with zpool offline and then replacing them with online spares using zpool replace, the resilver finished and I thought it'd be OK. Apparently not. Although the resilver succeeds, the pool status is still DEGRADED. A test with iozone also shows that the two degraded VDEVs are not used (much) during the test. See below for the zpool status -xv output.

I have done a few tests on another system, and with a raidz2 with a spare, removing one drive and waiting for it to resilver gives the same degraded status. zpool clear doesn't help. Removing another drive (not the spare) leaves it in the same degraded status. Removing a third (which should work, since the spare is in action) faults the pool.

Can someone help me fix this, or should I file a bug about this?

roy

root@prv-backup:~# zpool status -xv
  pool: pbpool
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 385M in 0h12m with 0 errors on Sun Dec  5 21:06:38 2010
config:

        NAME             STATE     READ WRITE CKSUM
        pbpool           DEGRADED     0     0     0
          raidz2-0       ONLINE       0     0     0
            c8t2d0       ONLINE       0     0     0
            c8t3d0       ONLINE       0     0     0
            c8t4d0       ONLINE       0     0     0
            c8t5d0       ONLINE       0     0     0
            c8t6d0       ONLINE       0     0     0
            c8t7d0       ONLINE       0     0     0
            c8t8d0       ONLINE       0     0     0
          raidz2-1       DEGRADED     0     0     0
            c8t9d0       ONLINE       0     0     0
            c8t10d0      ONLINE       0     0     0
            c8t11d0      ONLINE       0     0     0
            c8t12d0      ONLINE       0     0     0
            spare-4      DEGRADED     0     0     0
              c8t13d0    OFFLINE      0     0     0
              c4t43d0    ONLINE       0     0     0
            c8t14d0      ONLINE       0     0     0
            c8t15d0      ONLINE       0     0     0
          raidz2-2       ONLINE       0     0     0
            c8t16d0      ONLINE       0     0     0
            c8t17d0      ONLINE       0     0     0
            c8t18d0      ONLINE       0     0     0
            c8t19d0      ONLINE       0     0     0
            c8t20d0      ONLINE       0     0     0
            c8t21d0      ONLINE       0     0     0
            c8t22d0      ONLINE       0     0     0
          raidz2-3       ONLINE       0     0     0
            c8t23d0      ONLINE       0     0     0
            c8t24d0      ONLINE       0     0     0
            c8t25d0      ONLINE       0     0     0
            c8t26d0      ONLINE       0     0     0
            c8t27d0      ONLINE       0     0     0
            c8t28d0      ONLINE       0     0     0
            c8t29d0      ONLINE       0     0     0
          raidz2-4       ONLINE       0     0     0
            c8t30d0      ONLINE       0     0     0
            c8t31d0      ONLINE       0     0     0
            c8t32d0      ONLINE       0     0     0
            c8t33d0      ONLINE       0     0     0
            c8t34d0      ONLINE       0     0     0
            c8t35d0      ONLINE       0     0     0
            c4t0d0       ONLINE       0     0     0
          raidz2-5       ONLINE       0     0     0
            c4t1d0       ONLINE       0     0     0
            c4t2d0       ONLINE       0     0     0
            c4t3d0       ONLINE       0     0     0
            c4t4d0       ONLINE       0     0     0
            c4t5d0       ONLINE       0     0     0
            c4t6d0       ONLINE       0     0     0
            c4t7d0       ONLINE       0     0     0
          raidz2-6       ONLINE       0     0     0
            c4t8d0       ONLINE       0     0     0
            c4t9d0       ONLINE       0     0     0
            c4t10d0      ONLINE       0     0     0
            c4t11d0      ONLINE       0     0     0
            c4t12d0      ONLINE       0     0     0
            c4t13d0      ONLINE       0     0     0
            c4t14d0      ONLINE       0     0     0
          raidz2-7       DEGRADED     0     0     0
            c4t15d0      ONLINE       0     0     0
            c4t16d0      ONLINE       0     0     0
            spare-2      DEGRADED     0     0     0
              c4t17d0    OFFLINE      0     0     0
              c4t44d0    ONLINE       0     0     0
            c4t18d0      ONLINE       0     0     0
            c4t19d0      ONLINE       0     0     0
            c4t20d0      ONLINE       0     0     0
            c4t21d0      ONLINE       0     0     0
          raidz2-8       ONLINE       0     0     0
            c4t22d0      ONLINE       0     0     0
            c4t23d0      ONLINE       0     0     0
            c4t24d0      ONLINE       0     0     0
            c4t25d0      ONLINE       0     0     0
            c4t26d0      ONLINE       0     0     0
            c4t27d0      ONLINE       0     0     0
            c4t28d0      ONLINE       0     0     0
          raidz2-9       ONLINE       0     0     0
            c4t29d0      ONLINE       0     0     0
            c4t30d0      ONLINE       0     0     0
            c4t31d0      ONLINE       0     0     0
            c4t32d0      ONLINE       0     0     0
            c4t33d0      ONLINE       0     0     0
            c4t34d0      ONLINE       0     0     0
            c4t35d0      ONLINE       0     0     0
          raidz2-10      ONLINE       0     0     0
            c4t36d0      ONLINE       0     0     0
            c4t37d0      ONLINE       0     0     0
            c4t38d0      ONLINE       0     0     0
            c4t39d0      ONLINE       0     0     0
            c4t40d0      ONLINE       0     0     0
            c4t41d0      ONLINE       0     0     0
            c4t42d0      ONLINE       0     0     0
        cache
          c8t0d0         ONLINE       0     0     0
          c8t1d0         ONLINE       0     0     0
        spares
          c4t43d0        INUSE     currently in use
          c4t44d0        INUSE     currently in use

errors: No known data errors
root@prv-backup:~#

--
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all educators to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
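For reference, the offline-then-replace sequence Roy describes could look like the sketch below, using device names from the status output above. The `run` wrapper and `swap_in_spare` function are hypothetical helpers added for illustration; with `DRY_RUN` unset or set to 1 they only print the zpool commands instead of executing them.

```sh
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is 1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: take a failing disk offline, then replace it with a
# configured hot spare. ZFS keeps the spare INUSE inside a spare-N vdev
# until the original disk is detached or itself replaced.
swap_in_spare() {
    pool=$1 bad=$2 spare=$3
    run zpool offline "$pool" "$bad"
    run zpool replace "$pool" "$bad" "$spare"
}
```

With the defaults, `swap_in_spare pbpool c8t13d0 c4t43d0` prints the two commands for the first vdev; setting `DRY_RUN=0` would execute them.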
On Sun, Dec 5, 2010 at 2:22 PM, Roy Sigurd Karlsbakk <roy at karlsbakk.net> wrote:

> Can someone help me fix this, or should I file a bug about this?

Hot spares are dedicated spares in the ZFS world. Until you replace the actual bad drives, you will be running in a degraded state. The idea is that spares are only used in an emergency. You are degraded until your spares are no longer in use.

--Tim
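One way to see Tim's point from the command line is to list the spares that `zpool status` still reports as INUSE; as long as that list is non-empty, the pool stays DEGRADED. The helper names `spares_in_use` and `no_spares_in_use` are hypothetical, and the awk match assumes the layout in Roy's output, where an in-use spare's line reads `<device> INUSE currently in use`.

```sh
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and print the
# name of every hot spare whose state column says INUSE.
spares_in_use() {
    awk '$2 == "INUSE" { print $1 }'
}

# Hypothetical helper: succeed (exit 0) only if no spare is still in use.
no_spares_in_use() {
    [ -z "$(spares_in_use)" ]
}
```

For Roy's pool, `zpool status pbpool | spares_in_use` would list c4t43d0 and c4t44d0.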
> Hot spares are dedicated spares in the ZFS world. Until you replace
> the actual bad drives, you will be running in a degraded state. The
> idea is that spares are only used in an emergency. You are degraded
> until your spares are no longer in use.
>
> --Tim

Thanks for the clarification. Wouldn't it be nice if ZFS could fail over to a spare and then let the replacement drive become the new spare, as most commercial "hardware" RAIDs do?

Vennlige hilsener / Best regards

roy
On 5 Dec 2010, at 16:06, Roy Sigurd Karlsbakk <roy at karlsbakk.net> wrote:

>> Hot spares are dedicated spares in the ZFS world. Until you replace
>> the actual bad drives, you will be running in a degraded state. The
>> idea is that spares are only used in an emergency. You are degraded
>> until your spares are no longer in use.
>>
>> --Tim
>
> Thanks for the clarification. Wouldn't it be nice if ZFS could fail over
> to a spare and then let the replacement drive become the new spare, as
> most commercial "hardware" RAIDs do?

If you use "zpool detach" to remove the disk that went bad, the spare is promoted to a proper member of the pool. Then, when you replace the bad disk, you can use "zpool add <pool> spare <disk>" to add it to the pool as a new spare.

Admittedly, this is all a manual procedure. It's unclear whether you were asking for this to be fully automated.

> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
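The manual procedure described above can be sketched as a small shell helper that reads `zpool status` and prints one `zpool detach` command for each failed disk still sitting in a `spare-N` vdev, so the output can be reviewed before anything runs. The function name `plan_spare_promotion` is hypothetical, and the parsing assumes the indented layout in Roy's output, with the failed original being the child of `spare-N` whose state is not ONLINE.

```sh
#!/bin/sh
# Hypothetical helper: read `zpool status` for one pool on stdin and print
# the `zpool detach` commands that would promote each in-use hot spare.
# Assumption: inside a spare-N vdev, the failed original is the child whose
# STATE column is not ONLINE (OFFLINE in Roy's case).
plan_spare_promotion() {
    awk -v pool="$1" '
        {
            # Indentation depth tells us which vdev a line belongs to.
            match($0, /^[ \t]*/)
            indent = RLENGTH
        }
        # Leaving the spare-N block: back at (or above) its indentation.
        in_spare && indent <= spare_indent { in_spare = 0 }
        # A non-ONLINE child of spare-N is the disk to detach.
        in_spare && NF >= 2 && $2 != "ONLINE" { print "zpool detach " pool " " $1 }
        # Entering a spare-N block.
        $1 ~ /^spare-[0-9]+$/ { in_spare = 1; spare_indent = indent }
    '
}
```

For Roy's pool, `zpool status pbpool | plan_spare_promotion pbpool` would print `zpool detach pbpool c8t13d0` and `zpool detach pbpool c4t17d0`; piping the reviewed output to `sh` runs them. Once the failed drives have been physically swapped, `zpool add pbpool spare c8t13d0 c4t17d0` would return the new disks to the spare list, assuming they come up under the same device names.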
> > Thanks for the clarification. Wouldn't it be nice if ZFS could fail
> > over to a spare and then let the replacement drive become the new
> > spare, as most commercial "hardware" RAIDs do?
>
> If you use "zpool detach" to remove the disk that went bad, the spare
> is promoted to a proper member of the pool. Then, when you replace the
> bad disk, you can use "zpool add <pool> spare <disk>" to add it to the
> pool as a new spare.
>
> Admittedly, this is all a manual procedure. It's unclear whether you
> were asking for this to be fully automated.

Thanks a bunch. I wasn't aware that detach could be used for anything other than mirrors (as I believe the manual states). I just detached the two bad devices, and the pool is back to ONLINE. I'll restart the iozone testing to see if all VDEVs are used this time :)

Vennlige hilsener / Best regards

roy
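To confirm the result Roy reports, the `state:` line of `zpool status` can be checked directly. The sketch below assumes the output layout shown earlier in the thread; `pool_state` and `pool_is_online` are hypothetical helper names.

```sh
#!/bin/sh
# Hypothetical helper: print the pool state (ONLINE, DEGRADED, ...) from
# `zpool status <pool>` output supplied on stdin.
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Hypothetical helper: succeed only if the status on stdin reports ONLINE.
pool_is_online() {
    [ "$(pool_state)" = "ONLINE" ]
}
```

Usage would be `zpool status pbpool | pool_is_online && echo healthy`. For the iozone rerun, `zpool iostat -v pbpool 5` reports operations and bandwidth per vdev every five seconds, which is one way to check whether all eleven raidz2 vdevs are taking writes.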