I have a pool (on an X4540 running S10U8) in which a disk failed, and the hot spare kicked in. That's perfect. I'm happy.

Then a second disk fails.

Now, I've replaced the first failed disk, and it's resilvered and I have my hot spare back.

But: why hasn't it used the spare to cover the other failed drive? And can I hotspare it manually? I could do a straight replace, but that isn't quite the same thing.

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
On Mar 30, 2010, at 5:39 PM, Peter Tribble wrote:
> But: why hasn't it used the spare to cover the other failed drive? And
> can I hotspare it manually? I could do a straight replace, but that
> isn't quite the same thing.

Hot spares are only activated in response to a fault received by the zfs-retire FMA agent. There is no notion that the spares should be re-evaluated when they become available at a later point in time. Certainly a reasonable RFE, but not something ZFS does today.

You can 'zpool attach' the spare like a normal device - that's all that the retire agent is doing under the hood.

Hope that helps,

- Eric

--
Eric Schrock, Fishworks
http://blogs.sun.com/eschrock
On 03/31/10 10:39 AM, Peter Tribble wrote:
> But: why hasn't it used the spare to cover the other failed drive? And
> can I hotspare it manually? I could do a straight replace, but that
> isn't quite the same thing.

Was the spare free when the second drive failed? If not, I don't think it will get used. My understanding is that spares are added when the drive is faulted, so it's an event-driven rather than a level-driven action.

At least I'm not the only one seeing multiple drive failures this week!

--
Ian.
> But: why hasn't it used the spare to cover the other failed drive? And
> can I hotspare it manually? I could do a straight replace, but that
> isn't quite the same thing.

It seems like it is event driven. Hmmm... perhaps it shouldn't be.

Anyway, you can do zpool replace and it is the same thing; why wouldn't it be?

--
Robert Milkowski
http://milek.blogspot.com
On Tue, Mar 30, 2010 at 10:42 PM, Eric Schrock <Eric.Schrock at oracle.com> wrote:
> Hot spares are only activated in response to a fault received by the
> zfs-retire FMA agent. There is no notion that the spares should be
> re-evaluated when they become available at a later point in time.
> Certainly a reasonable RFE, but not something ZFS does today.

Definitely an RFE I would like.

> You can 'zpool attach' the spare like a normal device - that's all that
> the retire agent is doing under the hood.

So, given:

        NAME        STATE     READ WRITE CKSUM
        images      DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            c2t0d0  FAULTED      4     0     0  too many errors
            c3t0d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
        spares
          c5t7d0    AVAIL

then it would be this?

    zpool attach images c2t0d0 c5t7d0

which I had considered, but the man page for attach says "The existing device cannot be part of a raidz configuration."

If I try that it fails, saying:

    invalid vdev specification
    use '-f' to override the following errors:
    /dev/dsk/c5t7d0s0 is reserved as a hot spare for ZFS pool images.
    Please see zpool(1M).

Thanks!

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
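For anyone scripting around this, the faulted device and the free spare can be picked out of a status listing like the one above with a short awk filter. This is only a rough sketch: the helper names (status_sample, faulted_devices, available_spares) are made up for illustration, and the sample text is an abridged copy of the listing in this message rather than output read from a live pool.

```shell
#!/bin/sh
# Abridged copy of the status listing from this thread (not read from a live pool).
status_sample() {
cat <<'EOF'
        NAME        STATE     READ WRITE CKSUM
        images      DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            c2t0d0  FAULTED      4     0     0  too many errors
            c3t0d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
        spares
          c5t7d0    AVAIL
EOF
}

# Hypothetical helper: print the name of every device whose STATE column is FAULTED.
faulted_devices() { awk '$2 == "FAULTED" { print $1 }'; }

# Hypothetical helper: print the name of every spare whose STATE column is AVAIL.
available_spares() { awk '$2 == "AVAIL" { print $1 }'; }

status_sample | faulted_devices
status_sample | available_spares
```

On a real system the same filters could presumably be fed from 'zpool status images' instead of the sample function, though the column layout may differ between releases.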
> then it would be this?
>
>     zpool attach images c2t0d0 c5t7d0
>
> which I had considered, but the man page for attach says "The
> existing device cannot be part of a raidz configuration."

You need to use zpool replace. Once you fix the failed drive and it resynchronizes, the hot spare will detach automatically (regardless of whether you forced it to kick in via zpool replace or it did so due to FMA).

For more details see http://blogs.sun.com/eschrock/entry/zfs_hot_spares

--
Robert Milkowski
http://milek.blogspot.com
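Spelled out for the pool shown earlier in the thread, the manual route would look roughly like the following. It is a dry-run sketch: the helper names (spare_in_cmd, spare_out_cmd) are made up, and they only print the commands so nothing is run against a pool. The pool and device names are the ones from the status listing quoted above.

```shell
#!/bin/sh
# Dry run only: these helpers print commands, they do not execute them.
# The function names are hypothetical, chosen for illustration.

# Press a spare into service for a failed device:
#   zpool replace <pool> <failed-device> <spare>
spare_in_cmd() { printf 'zpool replace %s %s %s\n' "$1" "$2" "$3"; }

# Return a spare to the spares list by hand, if you do not want to
# wait for the automatic detach after the failed disk is replaced:
#   zpool detach <pool> <spare>
spare_out_cmd() { printf 'zpool detach %s %s\n' "$1" "$2"; }

spare_in_cmd images c2t0d0 c5t7d0
spare_out_cmd images c5t7d0
```

Printing first and pasting the command by hand is deliberate here; with a degraded raidz1 it seems wiser to look at the command before running it.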
On 03/31/10 10:54 PM, Peter Tribble wrote:
> then it would be this?
>
>     zpool attach images c2t0d0 c5t7d0
>
> If I try that it fails, saying:
>
>     invalid vdev specification
>     use '-f' to override the following errors:
>     /dev/dsk/c5t7d0s0 is reserved as a hot spare for ZFS pool images.
>     Please see zpool(1M).

What happens if you remove it as a spare first?

--
Ian.