Below is an updated version of the previous hot spare proposal. Only a
few things have been tweaked based on feedback received:

* Clarified that spares can be applied to RAID-Z and mirrored devices
* Made the state of the 'spare' device always DEGRADED
* Used the terms 'AVAILABLE' and 'INUSE' to describe hot spares

I hope to submit this case for review this week, so if you have any
further suggestions let me know.

- Eric

A. DESCRIPTION

ZFS, as an integrated volume manager and filesystem, has the ability to
replace disks within an active pool. This allows administrators to
replace failing or faulted drives and keep the system functioning with
the required level of replication. Most other volume managers also
support performing this replacement automatically through the use of
"hot spares". This case adds that functionality to ZFS.

This case will increment the on-disk version number in accordance with
PSARC 2006/206, as the resulting labels introduce a new pool state that
older pools will not understand, and exported pools containing hot
spares will not be importable on earlier versions.

B. POOL MANAGEMENT

Hot spares are stored with each pool, although the same device can be
listed as a spare in more than one pool. This allows administrators to
reserve system-wide hot spares as well as per-pool hot spares, according
to their policies.

1. Creating a pool with hot spares

A pool can be created with hot spares by using the new 'spare' vdev:

  # zpool create test mirror c0d0 c1d0 spare c2d0 c3d0

This will create a pool with a single mirror and two spares. Only a
single 'spare' vdev can be specified, though it can appear anywhere
within the command line. The resulting pool looks like the following:

  # zpool status
    pool: test
   state: ONLINE
   scrub: none requested
  config:

          NAME        STATE     READ WRITE CKSUM
          test        ONLINE       0     0     0
            mirror    ONLINE       0     0     0
              c0d0    ONLINE       0     0     0
              c1d0    ONLINE       0     0     0
          spares
            c2d0      AVAILABLE
            c3d0      AVAILABLE

  errors: No known data errors

2. Adding hot spares to a pool

Hot spares can be added to an existing pool in the same manner by using
'zpool add':

  # zpool add test spare c4d0 c5d0

This will add two disks to the set of available spares in the pool.

3. Removing hot spares from a pool

Hot spares can be removed from a pool with the new 'zpool remove'
subcommand. The name suggests the ability to remove arbitrary devices,
and that is certainly a feature that will be supported in a future
release, but for now the subcommand only allows removing hot spares.
For example:

  # zpool remove test c2d0

If the hot spare is currently spared in, the command will print an
error and exit.

4. Activating a hot spare

Hot spares can be used for replacement just like any other device using
'zpool replace'. Both mirrored and RAID-Z devices can be replaced with
a hot spare. Even unreplicated devices can be replaced with a hot spare
through predictive failure analysis, though the usefulness of such a
configuration is questionable. If ZFS detects that the device is a hot
spare within the same pool, it will create a 'spare' vdev instead of a
'replacing' vdev:

  # zpool replace test c0d0 c2d0
  # zpool status
  ...
  config:

          NAME          STATE     READ WRITE CKSUM
          test          DEGRADED     0     0     0
            mirror      DEGRADED     0     0     0
              spare     DEGRADED     0     0     0
                c0d0    FAULTED      0     0     0
                c2d0    ONLINE       0     0     0  35.5K resilvered
              c1d0      ONLINE       0     0     0
          spares
            c2d0        INUSE     by current pool
            c3d0        AVAILABLE

The difference between a 'replacing' and a 'spare' vdev is that the
former automatically removes the original drive once the replace
completes. With spares, the vdev remains until the original device is
removed from the system, at which point the hot spare is returned to the
pool of available spares. Note that in this example we have replaced an
online device. Under normal circumstances, the device in question would
be faulted, or the administrator would have proactively offlined it.

The state of a 'spare' vdev is always DEGRADED. When a spare is in use,
its status is displayed as INUSE instead of AVAILABLE. If the device is
spared in the current pool, it will display "by current pool". If the
spare is shared between multiple pools and it is in use in another pool,
it will display "by pool 'foo'".
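Since the same disk can be listed as a spare in more than one pool, a
small hypothetical example may help. The pool and device names below are
made up, and whether the second 'zpool create' would require -f (because
the disk is already labelled as a spare) is a detail this proposal does
not pin down:

  # zpool create alpha mirror c0d0 c1d0 spare c6d0
  # zpool create beta mirror c2d0 c3d0 spare c6d0

If c6d0 is later activated in 'alpha', then per the display rules above,
'zpool status beta' would show something like:

          spares
            c6d0      INUSE     by pool 'alpha'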
5. Deactivating a hot spare

There are three ways in which a hot spare can be deactivated: cancelling
the hot spare, replacing the original drive, or permanently swapping in
the hot spare.

To cancel a hot spare attempt, the user can simply 'zpool detach' the
hot spare in question, at which point it is returned to the set of
available spares and the original drive remains in its current position
(faulted or not):

  # zpool detach test c2d0
  # zpool status
  ...
  config:

          NAME        STATE     READ WRITE CKSUM
          test        ONLINE       0     0     0
            mirror    ONLINE       0     0     0
              c0d0    ONLINE       0     0     0  35.5K resilvered
              c1d0    ONLINE       0     0     0
          spares
            c2d0      AVAILABLE
            c3d0      AVAILABLE

If the original device is replaced, then the spare is automatically
removed once the replace completes:

  # zpool replace test c0d0 c4d0
  # zpool status
  ...
  config:

          NAME             STATE     READ WRITE CKSUM
          test             DEGRADED     0     0     0
            mirror         DEGRADED     0     0     0
              spare        DEGRADED     0     0     0
                replacing  ONLINE       0     0     0
                  c0d0     FAULTED      0     0     0
                  c4d0     ONLINE       0     0     0  38K resilvered
                c2d0       ONLINE       0     0     0  38K resilvered
              c1d0         ONLINE       0     0     0
          spares
            c2d0           INUSE     by current pool
            c3d0           AVAILABLE

  <wait for replace to complete>

  # zpool status
  ...
  config:

          NAME        STATE     READ WRITE CKSUM
          test        ONLINE       0     0     0
            mirror    ONLINE       0     0     0
              c4d0    ONLINE       0     0     0  35.5K resilvered
              c1d0    ONLINE       0     0     0
          spares
            c2d0      AVAILABLE
            c3d0      AVAILABLE

If the user instead wants the hot spare to permanently assume the place
of the original device, the original device can be removed with 'zpool
detach'. At this point the hot spare becomes a functioning device and is
automatically removed from the list of available hot spares:

  # zpool detach test c0d0
  # zpool status
  ...
  config:

          NAME        STATE     READ WRITE CKSUM
          test        ONLINE       0     0     0
            mirror    ONLINE       0     0     0
              c2d0    ONLINE       0     0     0  35K resilvered
              c1d0    ONLINE       0     0     0
          spares
            c3d0      AVAILABLE

6. Determining device usage

A hot spare is considered 'in use' for the purposes of libdiskmgt and
zpool(1M) if it is labelled as a spare and is currently in one or more
pools' lists of active spares. If a spare is part of an exported pool,
it is not considered in use, due largely to the fact that distinguishing
this case from a recently destroyed pool is difficult and not solvable
in the general case.

C. AUTOMATED REPLACEMENT

In order to perform automated replacement, a ZFS FMA agent will be added
that subscribes to 'fault.zfs.vdev.*' faults. When a fault is received,
the agent will examine the pool to see if it has any available hot
spares. If so, it will perform a 'zpool replace' with an available
spare. The initial algorithm for this will be 'first come, first
served', which may not be ideal for all circumstances (such as when not
all spares are the same size). It is anticipated that these
circumstances will be rare, and that the algorithm can be improved in
the future.
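To make the 'first come, first served' policy concrete, here is a
minimal sketch expressed with the zpool commands described above. This
is an illustration only: the real agent would be an FMA module driven by
libzfs rather than a shell script, and the status parsing assumes the
output format proposed in section B.

  #!/bin/sh
  # Sketch only: swap the first AVAILABLE spare in for a faulted device.
  # Arguments: pool name, faulted device.
  pool=$1
  faulted=$2

  # Take the first spare still listed as AVAILABLE in the pool's status.
  spare=`zpool status $pool | \
      awk '/^[ \t]*spares/ {s = 1; next} s && $2 == "AVAILABLE" {print $1; exit}'`

  if [ -n "$spare" ]; then
      # Same command an administrator would run by hand (section B.4).
      zpool replace $pool $faulted $spare
  else
      echo "$pool: no available hot spares for $faulted" >&2
      exit 1
  fi

Spare size, locality, and contention for spares shared between pools are
deliberately ignored here, matching the "improve later" caveat above.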
This is currently limited by the fact that the ZFS diagnosis engine only
emits faults when a device has disappeared from the system. When the DE
is enhanced to proactively fault drives based on error rates, the agent
will automatically leverage this feature.

In addition, note that there is no automated response capable of
bringing the original drive back online. The user must explicitly take
one of the actions described above. A future enhancement will allow ZFS
to subscribe to hotplug events and automatically handle the affected
drive when it is physically replaced in the system.

D. MANPAGE DIFFS

XXX
Edward Pilatowicz
2006-Apr-03 21:37 UTC
[zfs-discuss] Updated Proposal: ZFS Hot Spare support
i've got a few quick questions...

- what happens if a spare drive that is not in use goes bad?
  (i assume it goes from AVAILABLE to FAULTED)

- what happens if a spare drive that is in use goes bad?
  (is it left swapped in but marked as bad? or is it automatically
  removed from the pool and replaced with another spare? what if there
  are no more spares?)

- is there any kind of periodic validation of spare drives to make sure
  they haven't gone bad?

ed

On Sat, Apr 01, 2006 at 04:11:15PM -0800, Eric Schrock wrote:
> Below is an updated version of the previous hot spare proposal.
> [full proposal snipped; quoted in its entirety above]
On Mon, Apr 03, 2006 at 02:37:39PM -0700, Edward Pilatowicz wrote:
> i've got a few quick questions...
>
> - what happens if a spare drive that is not in use goes bad?
>   (i assume it goes from AVAILABLE to FAULTED)

Yes, that's correct. Although see below...

> - what happens if a spare drive that is in use goes bad?
>   (is it left swapped in but marked as bad? or is it automatically
>   removed from the pool and replaced with another spare? what if there
>   are no more spares?)

If another hot spare is available, it will be replaced automatically.
If no spares are available, it will be marked as FAULTED and left in
place, indicating that an attempt was made but failed. Although maybe it
should just detach it and cancel the spare... I haven't tested all the
corner cases in the current prototype, but this is certainly the desired
behavior.

> - is there any kind of periodic validation of spare drives to make sure
>   they haven't gone bad?

Yes, this is the plan. The current prototype doesn't do this, yet.

I'll add clarifications on the above points to the proposal. Thanks for
the feedback.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
On Apr 3, 2006, at 11:48 PM, Eric Schrock wrote:
>> - is there any kind of periodic validation of spare drives to make sure
>>   they haven't gone bad?
>
> Yes, this is the plan. The current prototype doesn't do this, yet.

Does zfs require that all spares are powered up and spinning all the
time, i.e. can spare disks be powered off to preserve the lifetime of
the physical drive ???

Kaiser Jasse -- Authorized Stealth Oracle

The axioms of wisdom:
1. You can't outstubborn a cat
2. You can't conquer the universe without the knowledge of FORTRAN
3. In the Unix realm, 10% of work fixes 90% of the problems
On Tue, Apr 04, 2006 at 06:05:22PM +0200, Jasse Jansson wrote:
> Does zfs require that all spares are powered up and spinning all the
> time, i.e. can spare disks be powered off to preserve the lifetime of
> the physical drive ???

Yes, that's the expectation. The "periodic validation" would likely
happen at a somewhat large interval - maybe an hour or so. It would also
be nice if, in addition to checking that the spare is just "still being
there", it did a little validation - maybe writing a bunch of sectors
and reading them back, so that you don't end up in the situation where
you go to swap in a spare and it turns out to be dead.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
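To make the idea concrete, here is a minimal sketch of a non-destructive
read check of the kind being discussed. The spare names and slice are
hypothetical placeholders, the real mechanism would presumably live
inside ZFS/FMA rather than in a script, and this skips the
write-and-read-back part mentioned above:

  #!/bin/sh
  # Sketch only: read a few MB from each idle spare to confirm the
  # drive still responds.  Spare names are placeholders.
  for spare in c2d0 c3d0; do
      if dd if=/dev/rdsk/${spare}s0 of=/dev/null bs=1024k count=16 \
          >/dev/null 2>&1; then
          echo "spare $spare: read check ok"
      else
          echo "spare $spare: read check FAILED" >&2
      fi
  done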
Eric Schrock wrote:
> On Tue, Apr 04, 2006 at 06:05:22PM +0200, Jasse Jansson wrote:
>> Does zfs require that all spares are powered up and spinning all the
>> time, i.e. can spare disks be powered off to preserve the lifetime of
>> the physical drive ???
>
> Yes, that's the expectation. The "periodic validation" would likely
> happen at a somewhat large interval - maybe an hour or so. It would
> also be nice if, in addition to checking that the spare is just "still
> being there", it did a little validation - maybe writing a bunch of
> sectors and reading them back, so that you don't end up in the
> situation where you go to swap in a spare and it turns out to be dead.

This is a hard problem, because the "right thing" varies from disk to
disk. At the coarsest level of granularity and generality, laptop drives
are designed for frequent spin up/down cycles, while server drives are
designed for being spun up for long periods of time. I'm sure there are
variations among manufacturers, as well. Perhaps a probe hourly that
confirms that a spun-down but powered-on drive is still there, and a
less frequent check (weekly? monthly?) that would spin it up and confirm
that it actually works.

--Ed

--
Ed Gould                   Sun Microsystems File System Architect
Sun Cluster                ed.gould at sun.com
17 Network Circle          +1.650.786.4937
M/S UMPK17-201             x84937
Menlo Park, CA 94025
On Apr 4, 2006, at 6:57 PM, Ed Gould wrote:
> Eric Schrock wrote:
>> [discussion of periodic spare validation snipped]
>
> This is a hard problem, because the "right thing" varies from disk to
> disk. At the coarsest level of granularity and generality, laptop
> drives are designed for frequent spin up/down cycles, while server
> drives are designed for being spun up for long periods of time. I'm
> sure there are variations among manufacturers, as well. Perhaps a
> probe hourly that confirms that a spun-down but powered-on drive is
> still there, and a less frequent check (weekly? monthly?) that would
> spin it up and confirm that it actually works.

Interesting point. But the idea behind my original reply was to add some
support for cold spares, although I could have written it better. I have
both a prototype board with an 8-bit RISC processor and a basic plan for
how to use it as a USB- or serial-connected device for controlling those
cold spares.

Kaiser Jasse -- Authorized Stealth Oracle

The axioms of wisdom:
1. You can't outstubborn a cat
2. You can't conquer the universe without the knowledge of FORTRAN
3. In the Unix realm, 10% of work fixes 90% of the problems
Eric Schrock wrote:
> On Tue, Apr 04, 2006 at 06:05:22PM +0200, Jasse Jansson wrote:
>> Does zfs require that all spares are powered up and spinning all the
>> time, i.e. can spare disks be powered off to preserve the lifetime of
>> the physical drive ???
>
> Yes, that's the expectation. The "periodic validation" would likely
> happen at a somewhat large interval - maybe an hour or so. [...]

One thing on this train of thought... you note that the two states
defined for hot spares are AVAILABLE and INUSE. What if a hot spare is
bad? Shouldn't you note that as well in the hot spare description?
(Perhaps I missed this somewhere.)

thanks,
sarah ****