Hi,

This is not exactly ZFS specific, but this still seems like a fruitful
place to ask.

It occurred to me today that hot spares could sit in standby (spun down)
until needed (I know ATA can do this; I'm supposing SCSI does too, but I
haven't looked at a spec recently). Does anybody do this? Or does
everybody do this already?

Does the bathtub curve (the chance of early-life failure) imply that hot
spares should be burned in, instead of sitting there doing nothing from
new? Just like a data disk, it seems to me you'd want to know if a hot
spare fails while waiting to be swapped in. Do they get tested
periodically?

--Toby
You could easily do this in Solaris today by using power.conf(4): just
have it spin down any drives that have been idle for a day or more. The
periodic testing part would be an interesting project to kick off.

--Bill
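A minimal power.conf(4) sketch of this approach (the device path below is
only a placeholder; use the physical path of your own spare from
/devices, and note that exact syntax can vary between Solaris releases):

    # /etc/power.conf
    # Spin the hot spare down after ~24 hours of idle time.
    device-thresholds   /pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@4,0   24h
    autopm              enable

After editing the file, run pmconfig(1M) so the power daemon picks up the
new thresholds.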
Toby Thain wrote:
> It occurred to me today that hot spares could sit in standby (spun
> down) until needed (I know ATA can do this; I'm supposing SCSI does
> too, but I haven't looked at a spec recently). Does anybody do this?
> Or does everybody do this already?

"luxadm stop" will work for many SCSI and FC JBODs. If your drive doesn't
support it, it won't hurt anything; it will just report "Unsupported" --
not very user friendly, IMHO.

I think it is a good idea, with one potential gotcha: it can take 30
seconds or more for a drive to spin up. By default, the sd and ssd
timeouts are long enough that a pending I/O will not notice that the
drive took a while to spin up. However, if you have shortened those
defaults, as is sometimes done to meet high-availability requirements,
then you probably shouldn't do this.

> Does the bathtub curve (the chance of early-life failure) imply that
> hot spares should be burned in, instead of sitting there doing nothing
> from new?

Good question. If you consider that mechanical wear-out is what
ultimately causes many failure modes, then the argument can be made that
a spun-down disk should last longer. The problem is that there are
failure modes which are triggered by a spin-up. I've never seen field
data showing the difference between the two. I spin mine down because
they are too loud and consume more electricity, and electricity is
expensive in Southern California.

> Just like a data disk, it seems to me you'd want to know if a hot spare
> fails while waiting to be swapped in. Do they get tested periodically?

Another good question. AFAIK, they are not accessed until needed. Note:
they will be queried at boot, which will cause a spin-up. I use a cron
job to spin mine down in the late evening.

 -- richard
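As a concrete sketch of the cron-driven spin-down Richard describes (the
device name and schedule here are only examples, not his actual setup):

    # root crontab entry: spin the spare down every night at 23:00.
    0 23 * * * /usr/sbin/luxadm stop /dev/rdsk/c3t5d0s2

    # To spin it back up manually (a pending I/O will also trigger this,
    # it just takes ~30 seconds):
    #   /usr/sbin/luxadm start /dev/rdsk/c3t5d0s2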
On Mon, 29 Jan 2007, Toby Thain wrote:
> It occurred to me today that hot spares could sit in standby (spun
> down) until needed (I know ATA can do this; I'm supposing SCSI does
> too, but I haven't looked at a spec recently). Does anybody do this?
> Or does everybody do this already?

I don't work with enough disk storage systems to know what the industry
norm is. But there are 3 broad categories of disk drive spares:

a) Cold Spare. A spare where the power is not connected until it is
   required. [1]

b) Warm Spare. A spare that is active but placed into a low-power mode,
   or into a "low mechanical wear and tear" mode. In the case of a disk
   drive, the controller board is active but the HDA (Head Disk Assembly)
   is inactive (platters stationary, heads unloaded [if the heads are
   physically unloaded]); it has power applied and can be made "hot" by a
   command over its data/command (bus) connection. The supervisory
   hardware/software/firmware "knows" how long it *should* take the drive
   to go from warm to hot.

c) Hot Spare. A spare that is spun up and ready to accept
   read/write/position (etc.) requests.

> Does the bathtub curve (the chance of early-life failure) imply that
> hot spares should be burned in, instead of sitting there doing nothing
> from new? Just like a data disk, it seems to me you'd want to know if a
> hot spare fails while waiting to be swapped in. Do they get tested
> periodically?

The ideal scenario, as you already allude to, would be for the disk
subsystem to initially configure the drive as a hot spare and send it
periodic "test" events for, say, the first 48 hours. This would get it
past the first segment of the "bathtub" reliability curve, often referred
to as the "infant mortality" phase. After that, (ideally) it would be
placed into "warm standby" mode and periodically tested (once a month?).
If saving power were the highest priority, then the ideal situation would
be one where the disk subsystem could apply/remove power to the spare and
move it between warm and cold on command.

One "trick" with disk subsystems like ZFS, which have yet to have the FMA
type functionality added and which (today) provide for hot spares only,
is to initially configure a pool with one (hot) spare, then add a 2nd hot
spare (a brand new device), say, 12 months later, and another spare 12
months after that (see the zpool sketch following this message). What you
are trying to achieve with this strategy is to avoid the scenario whereby
mechanical systems, like disk drives, tend to "wear out" within the same
general, relatively short timeframe.

One (obvious) issue with this strategy is that it may be impossible to
purchase the same disk drive 12 and 24 months later. However, it's always
possible to purchase a larger disk drive and simply accept that the extra
space provided by the newer drive will be wasted.

[1] The most common example is a disk drive mounted on a carrier but not
seated within the disk drive enclosure. Simply push it in when required.

Off topic: to go off on a tangent, the same strategy applies to a UPS
(Uninterruptible Power Supply), as per the following timeline:

year 0: purchase the UPS and one battery cabinet
year 1: purchase and attach an additional battery cabinet
year 2: purchase and attach an additional battery cabinet
year 3: purchase and attach an additional battery cabinet
year 4: purchase and attach an additional battery cabinet and remove the
        oldest battery cabinet
year 5 ... N: repeat year 4's scenario until it's time to replace the UPS.

The advantage of this scheme is that you can budget a *fixed* recurring
cost for the UPS, and your management understands that the cost recurs --
so that, when the power fails, your UPS will have working batteries!!

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
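A minimal zpool sketch of the staggered-spare trick described above
(device names are hypothetical):

    # Year 0: create the pool with a single hot spare.
    zpool create tank raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 spare c0t4d0

    # Roughly 12 months later: add a second, newly purchased spare
    # (possibly a larger model; the extra capacity simply goes unused).
    zpool add tank spare c1t0d0

    # And again at ~24 months.
    zpool add tank spare c2t0d0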
On 29-Jan-07, at 9:04 PM, Al Hopper wrote:
> I don't work with enough disk storage systems to know what the industry
> norm is. But there are 3 broad categories of disk drive spares:
>
> a) Cold Spare. A spare where the power is not connected until it is
>    required. [1]
>
> b) Warm Spare. A spare that is active but placed into a low-power
>    mode. ...
>
> c) Hot Spare. A spare that is spun up and ready to accept
>    read/write/position (etc.) requests.

Hi Al,

Thanks for reminding me of the distinction. It seems very few
installations would actually require (c)?

> The ideal scenario, as you already allude to, would be for the disk
> subsystem to initially configure the drive as a hot spare and send it
> periodic "test" events for, say, the first 48 hours.

For some reason that's a little shorter than I had in mind, but I take
your word that that's enough burn-in for semiconductors, motors, servos,
etc.

> This would get it past the first segment of the "bathtub" reliability
> curve ...
>
> If saving power were the highest priority, then the ideal situation
> would be one where the disk subsystem could apply/remove power to the
> spare and move it between warm and cold on command.

I am surmising that it would also considerably increase the spare's
useful lifespan versus "hot" and spinning.

> However, it's always possible to purchase a larger disk drive

...which is not guaranteed to be compatible with your storage
subsystem...!

--Toby
Hi Guys,

I seem to remember the MAID (Massive Array of Idle Disks) guys ran into a
problem I think they called static friction, where idle drives would fail
on spin-up after being idle for a long time:

http://www.eweek.com/article2/0,1895,1941205,00.asp

Would that apply here?

Best Regards,
Jason
On 29-Jan-07, at 11:02 PM, Jason J. W. Williams wrote:
> I seem to remember the MAID (Massive Array of Idle Disks) guys ran into
> a problem I think they called static friction, where idle drives would
> fail on spin-up after being idle for a long time:
>
> http://www.eweek.com/article2/0,1895,1941205,00.asp

You'd think that probably wouldn't happen to a spare drive that was spun
up from time to time. In fact this problem would be (mitigated and/or)
caught by the periodic health check I suggested.

--T
On Jan 29, 2007, at 20:27, Toby Thain wrote:
> On 29-Jan-07, at 11:02 PM, Jason J. W. Williams wrote:
>> I seem to remember the MAID (Massive Array of Idle Disks) guys ran
>> into a problem I think they called static friction, where idle drives
>> would fail on spin-up after being idle for a long time:
>
> You'd think that probably wouldn't happen to a spare drive that was
> spun up from time to time. In fact this problem would be (mitigated
> and/or) caught by the periodic health check I suggested.

What about a rotating spare?

When setting up a pool, a lot of people would (say) balance things around
buses and controllers to minimize single points of failure, and a
rotating spare could disrupt this organization, but would it be useful at
all?
On 1/30/07, David Magda <dmagda at ee.ryerson.ca> wrote:
> What about a rotating spare?
>
> When setting up a pool, a lot of people would (say) balance things
> around buses and controllers to minimize single points of failure, and
> a rotating spare could disrupt this organization, but would it be
> useful at all?

The costs involved in "rotating" spares, in terms of IOPS reduction, may
not be worth it.

-- 
Just me,
Wire ...
Random thoughts:

If we were to use some intelligence in the design, we could perhaps have
a monitor that profiles the workload on the system (a pool, for example)
over a [week|month|whatever] and selects a point in time, based on
history, at which it would expect the disks to be quiet, and can
'pre-build' the spare with the contents of the disk it's about to swap
out. At the point of switch-over, it could be pretty much
instantaneous... It could also bail out if the system actually started to
get genuinely busy...

That might actually be quite cool. Though, if all disks are rotated, we
end up with a whole bunch of disks that are evenly worn out again, which
is just what we are really trying to avoid! ;)

Nathan.
Hi Toby,

You're right. The healthcheck would definitely find any issues. I
misinterpreted your comment to that effect as a question and didn't quite
latch on.

A zpool MAID mode with that healthcheck might also be interesting on
something like a Thumper for pure-archival, D2D backup work. It would
dramatically cut down on the power. What do y'all think?

Best Regards,
Jason
David Magda wrote:
> What about a rotating spare?
>
> When setting up a pool, a lot of people would (say) balance things
> around buses and controllers to minimize single points of failure, and
> a rotating spare could disrupt this organization, but would it be
> useful at all?

Functionally, that sounds a lot like raidz2! "Hey, I can take a
double-drive failure now! And I don't even need to rebuild! Just like
having a hot spare with raid5, but without the rebuild time!"

Though I can see a "raidz sub N" being useful -- "just tell ZFS how many
parity drives you want, and we'll take care of the rest."

-Luke
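For comparison, a sketch of the two layouts being contrasted here, with
hypothetical device names. The raidz2 pool tolerates two simultaneous
failures; the raidz-plus-spare pool must finish resilvering onto the
spare before it can absorb a second failure:

    # Single parity plus a hot spare that has to resilver in:
    zpool create tank raidz  c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 spare c0t5d0

    # Double parity; a second failure during the first failure's
    # resilver window does not lose data:
    zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0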
On Mon, Jan 29, 2007 at 09:37:57PM -0500, David Magda wrote:
> What about a rotating spare?
>
> When setting up a pool, a lot of people would (say) balance things
> around buses and controllers to minimize single points of failure, and
> a rotating spare could disrupt this organization, but would it be
> useful at all?

Agami Systems has the concept of "Enterprise Sparing", where the hot
spare is distributed amongst the data drives in the array. When a failure
occurs, the rebuild proceeds in parallel across _all_ drives in the
array:

http://www.issidata.com/specs/agami/enterprise-classreliability.pdf

-- 
albert chin (china at thewrittenword.com)
On Jan 30, 2007, at 09:52, Luke Scharf wrote:
> "Hey, I can take a double-drive failure now! And I don't even need to
> rebuild! Just like having a hot spare with raid5, but without the
> rebuild time!"

Theoretically you want to rebuild as soon as possible, because running in
degraded mode (even with dual parity) increases your chances of data loss
(even though the probabilities involved may seem remote).

Case in point: recently at work we had a drive fail in a server with a
5+1 RAID5 configuration. We replaced it, and about 2-3 weeks later a
separate drive failed. Even with dual parity, if we hadn't replaced and
rebuilt things we would now be cutting it close.

I understand all the math involved with RAID 5/6 and failure rates, but
it's wise to remember that even if the probabilities are small they
aren't zero. :)
> I understand all the math involved with RAID 5/6 and failure rates, but
> it's wise to remember that even if the probabilities are small they
> aren't zero. :)

And after 3-5 years of continuous operation, you better decommission the
whole thing or you will have many disk failures.

Casper
David Magda wrote:
> Theoretically you want to rebuild as soon as possible, because running
> in degraded mode (even with dual parity) increases your chances of data
> loss (even though the probabilities involved may seem remote). Case in
> point: recently at work we had a drive fail in a server with a 5+1
> RAID5 configuration. We replaced it, and about 2-3 weeks later a
> separate drive failed. Even with dual parity, if we hadn't replaced and
> rebuilt things we would now be cutting it close.

I did misspeak -- with raidz2, I still do have to replace the failed
drive ASAP!

However, with raidz2 you don't have to wait hours for the rebuild to
complete before a second drive can fail; with a hot spare, the first and
second failures (provided that the failures occur on the array drives
rather than on the spare) must happen several hours apart. With raidz2 on
the same hardware, the two failures can happen at the same time and the
array can still be rebuilt.

But I guess the utility of the hot spare depends a lot on the number of
drives available and on the layout. In my case, most of the hardware I
have is Apple XRaid units and, when using the hardware RAID inside the
unit, the hot spare must be in the same half of the box as the failed
drive -- in these small, constrained RAIDs, raidz2 would be much better
than raidz plus a spare because of the rebuild time. With Thumper+ZFS or
something like that, though, the spare could be anywhere, and I think I'd
like having a few hot/warm spares on the machine that could be zinged
into service if an array member fails.

-Luke
On Wed, 31 Jan 2007, Casper.Dik at Sun.COM wrote:
> > I understand all the math involved with RAID 5/6 and failure rates,
> > but it's wise to remember that even if the probabilities are small
> > they aren't zero. :)

Agreed. Another thing I've seen is that if you have an A/C (air
conditioning) "event" in the data center or lab, you will usually see a
cluster of failures over the next 2 to 3 weeks. Effectively, all your
disk drives have been thermally stressed and are likely to exhibit a
spike in failure rates in the near term.

Often, in a larger environment, the facilities personnel don't understand
the correlation between an A/C event and disk drive failure rates. And
major A/C upgrade work is often scheduled over a (long) weekend when most
of the technical talent won't be present. After the work is completed,
everyone is told that it "went very well" because the organization does
not "do bad news", and then you lose two drives in a RAID5 array ....

> And after 3-5 years of continuous operation, you better decommission
> the whole thing or you will have many disk failures.

Agreed. We took an 11-disk FC hardware RAID box offline recently because
all the drives were 5 years old. It's tough to hit those power-off
switches and scrap working disk drives, but much better than the business
disruption and professional embarrassment caused by data loss. And much
better to be in control of, and experience, *scheduled* downtime.

BTW: don't forget that if you plan to continue to use the disk enclosure
hardware, you need to replace _all_ the fans first.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
Richard Elling wrote:
> Good question. If you consider that mechanical wear-out is what
> ultimately causes many failure modes, then the argument can be made
> that a spun-down disk should last longer. The problem is that there are
> failure modes which are triggered by a spin-up. I've never seen field
> data showing the difference between the two.

Often, the spare is up and running, but for whatever reason you'll have a
bad block on it and you'll die during the reconstruct. Periodically
checking the spare means reading from and writing to it over time in
order to make sure it's still OK. (You take the spare out of the trunk,
you look at it, you check the tire pressure, etc.) The issue I see coming
down the road is that we'll get into a "Golden Gate paint job" situation
where it takes so long to check the spare that we'll just keep the
process going constantly. Not as much wear and tear as real I/O, but the
spare will still be up and running the entire time and you won't be able
to spin it down.
> Often, the spare is up and running, but for whatever reason you'll have
> a bad block on it and you'll die during the reconstruct.

Shouldn't SCSI/ATA block sparing handle this? Reconstruction should be
purely a matter of writing, so "bit rot" shouldn't be an issue; or are
there cases I'm not thinking of? (Yes, I know there are a limited number
of spare blocks, but I wouldn't expect a spare which is turned off to
develop severe media problems... am I wrong?)
Torrey McMahon wrote:
> Often, the spare is up and running, but for whatever reason you'll have
> a bad block on it and you'll die during the reconstruct. Periodically
> checking the spare means reading from and writing to it over time in
> order to make sure it's still OK.

In my experience, checking the spare tire leads to getting a flat and
needing the spare about a week later :-) It has happened to me twice in
the past few years... I suspect a conspiracy... :-)

Back to the topic, I'd believe that some combination of hot, warm, and
cold spares would be optimal.

Anton B. Rang wrote:
> Shouldn't SCSI/ATA block sparing handle this? Reconstruction should be
> purely a matter of writing, so "bit rot" shouldn't be an issue; or are
> there cases I'm not thinking of? (Yes, I know there are a limited
> number of spare blocks, but I wouldn't expect a spare which is turned
> off to develop severe media problems... am I wrong?)

In the disk, at the disk block level, there is fairly substantial ECC.
Yet we still see data loss. There are many mechanisms at work here. One
that we have studied in some detail is superparamagnetic decay -- the
medium wants to decay to a lower-energy state, losing information in the
process. One way to "prevent" this is to rewrite the data -- basically
resetting the decay clock. The study we did on this says that rewriting
your data once per year is reasonable. Note that ZFS is COW, and
scrubbing is currently a read operation which will only write when data
needs to be reconstructed.

I look at this as: rewrite-style scrubbing is preventative; read-and-verify
style scrubbing is prescriptive. Either is better than neither.

In short, use spares and scrub.
 -- richard
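A minimal sketch of the "use spares and scrub" advice, assuming a pool
named tank; the monthly schedule is only an example:

    # root crontab entry: start a read-and-verify scrub at 02:00 on the
    # 1st of every month.
    0 2 1 * * /usr/sbin/zpool scrub tank

    # Check progress and results afterwards:
    #   zpool status -v tank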
Richard Elling wrote:
> I look at this as: rewrite-style scrubbing is preventative;
> read-and-verify style scrubbing is prescriptive. Either is better than
> neither.
>
> In short, use spares and scrub.

I see another purpose for rewrite-style scrubbing: it would be an enabler
for disk eviction. First you mark the disk you want to evict as
read-only, then start a rewrite scrub. When done, your disk is free of
data and can be taken out.

-- 
Henk Langeveld <henk at hlangeveld.nl>