Hello,

In OpenSolaris b111 with autoreplace=on and a pool without spares, ZFS is not kicking off the resilver after a faulty disk is replaced and shows up under the same device name, even after waiting several minutes. The workaround is to do a manual `zpool replace`, which returns the following:

# zpool replace tank c3t17d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c3t17d0s0 is part of active ZFS pool tank. Please see zpool(1M).

... and resilvering starts immediately. It looks like the `zpool replace` kicked the autoreplace function into action.

Since b111 is a bit old by now, there is a chance this has already been reported and fixed. Does anyone know anything about it?

Also, if autoreplace is on and the pool has spares, when a disk fails the spare is automatically used (works fine), but when the faulty disk is replaced, nothing really happens. Was the autoreplace code supposed to replace the faulty disk and release the spare when the resilver is done?

Thank you,

-- 
Giovanni Tirloni
gtirloni at sysdroid.com
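A minimal sketch of the workaround, using the pool and device names from the message above (the property check and the status command are the usual way to verify and monitor, and are not part of the reported session):

# zpool get autoreplace tank
(confirm the property reports "on")

# zpool replace tank c3t17d0
(fails with the "invalid vdev specification" error shown above, but the resilver starts anyway)

# zpool status tank
(watch the resilver progress)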
Hi Giovanni,

The spare behavior and the autoreplace property behavior are separate, but they should work pretty well in recent builds.

You should not need to perform a zpool replace operation if the autoreplace property is set. If autoreplace is set and a replacement disk is inserted into the same physical location as the removed, failed disk, then a new disk label is applied to the replacement disk and ZFS should recognize it.

Let the replacement disk resilver from the spare. When the resilver completes, the spare should detach automatically. We saw this happen on a disk replacement last week on a system running a recent Nevada build.

If the spare doesn't detach after the resilver is complete, then just detach it manually.

Thanks,

Cindy
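If the spare does stay attached after the resilver, the manual detach Cindy describes is a single command (the spare device name c3t20d0 here is hypothetical; substitute the spare name shown in your pool):

# zpool status tank
(verify the resilver has completed and note the spare still listed as INUSE)

# zpool detach tank c3t20d0
(detaches the spare so it returns to the AVAIL state)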
On Wed, Aug 11, 2010 at 4:06 PM, Cindy Swearingen <cindy.swearingen at oracle.com> wrote:
> Hi Giovanni,
>
> The spare behavior and the autoreplace property behavior are separate,
> but they should work pretty well in recent builds.
>
> You should not need to perform a zpool replace operation if the
> autoreplace property is set. If autoreplace is set and a replacement
> disk is inserted into the same physical location as the removed,
> failed disk, then a new disk label is applied to the replacement
> disk and ZFS should recognize it.

That's what I'm having to do in b111. I will try to simulate the same situation in b134.

> Let the replacement disk resilver from the spare. When the resilver
> completes, the spare should detach automatically. We saw this happen on
> a disk replacement last week on a system running a recent Nevada build.
>
> If the spare doesn't detach after the resilver is complete, then just
> detach it manually.

Yes, that's working as expected (the spare detaches after the resilver).

-- 
Giovanni Tirloni
gtirloni at sysdroid.com
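For anyone reproducing the b134 test, the property itself is enabled and checked with the standard zpool commands (pool name taken from the thread):

# zpool set autoreplace=on tank
# zpool get autoreplace tank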