I might have mentioned this on the list already and can't find it now,
or I might have misread something and come up with this ...

Right now, using hot spares is a typical method to increase storage
pool resiliency, since it minimizes the time that an array is
degraded. The downside is that drives assigned as hot spares are
essentially wasted. They take up space & power but don't provide
usable storage.

Depending on the number of spares you've assigned, you could have 7%
of your purchased capacity sitting idle, assuming 1 spare per 14-disk
shelf. This is on top of the RAID6 / raidz[1-3] overhead.

What about using the free space in the pool to cover for the failed
drive?

With bp rewrite, would it be possible to rebuild the vdev from parity
and simultaneously rewrite those blocks to a healthy device? In other
words, when there is free space, remove the failed device from the
zpool, resizing (shrinking) it on the fly and restoring full parity
protection for your data. If online shrinking doesn't work, create a
phantom file that accounts for all the space lost by the removal of
the device until an export / import.

It's not something I'd want to do with less than raidz2 protection,
and I imagine that replacing the failed device and expanding the
stripe width back to the original would have some negative performance
implications that would not occur otherwise. I also imagine it would
take a lot longer to rebuild / resilver at both device failure and
device replacement. You wouldn't be able to share a spare among many
vdevs either, though you wouldn't always need to if you leave some
space free on the zpool.

Provided that bp rewrite is committed, and vdev & zpool shrinks are
functional, could this work? It seems like a feature most applicable
to SOHO users, but I'm sure some enterprise users could find an
application for nearline storage where available space trumps
performance.

-B

--
Brandon High : bhigh at freaks.com
Always try to do things in chronological order; it's less confusing
that way.
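For concreteness, the overhead figures above work out as follows. This
is a back-of-the-envelope sketch in sh; the 13-wide raidz2 data vdev is
an assumption for illustration, not something stated in the post:

    # 14-disk shelf: 1 hot spare + one 13-disk raidz2 vdev (assumed layout)
    DISKS=14; SPARES=1; PARITY=2
    echo "scale=3; $SPARES / $DISKS" | bc              # spare overhead: .071 (~7%)
    echo "scale=3; ($SPARES + $PARITY) / $DISKS" | bc  # spare + parity: .214 (~21%)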
On Wed, Sep 30, 2009 at 7:06 PM, Brandon High <bhigh at freaks.com> wrote:

> What about using the free space in the pool to cover for the failed
> drive?
[snip]

What are you hoping to accomplish? You're still going to need a drive's
worth of free space, and if you're so performance strapped that one
drive makes the difference, you've got some bigger problems on your
hands. To me it sounds like complexity for complexity's sake, and it
leaves you with a far less flexible option in the face of a drive
failure.

BTW, you shouldn't need one disk per tray of 14 disks. Unless you've
got some known bad disks or environmental issues, one every 2-3 trays
should be fine. Quite frankly, if you're doing raid-z3, I'd feel
comfortable with one per thumper.

--Tim
Brandon High wrote:

> What about using the free space in the pool to cover for the failed
> drive?
[snip]

What you describe makes no sense for single-parity vdevs, since it
actually increases the likelihood of data loss. In multi-parity vdevs,
even with the loss of one drive, you still have parity protection, so
why go to all that extra effort when it gains you so little?

From a global perspective, multi-disk parity (e.g. raidz2 or raidz3) is
the way to go instead of hot spares. Hot spares are useful for adding
protection to a number of vdevs, not a single vdev.

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Brandon,

Yes, this is something that should be possible once we have bp rewrite
(the ability to move blocks around).

One minor downside to "hot space" would be that it couldn't be shared
among multiple pools the way that hot spares can.

Also, depending on the pool configuration, hot space may be impractical
-- for example, if you are using wide RAIDZ[-N] stripes. If you have,
say, 4 top-level RAIDZ-2 vdevs each with 10 disks in it, you would have
to keep your pool at most 3/4 full to be able to take advantage of hot
space. And if you wanted to tolerate any 2 disks failing, the pool
could be at most 1/2 full. (Although one could imagine eventually
recombining some of the remaining 18 good disks to make another RAIDZ
group.) So I imagine that with this implementation at least (remove
faulted top-level vdev), Hot Space would only be practical when using
mirroring.

That said, once we have (top-level) device removal implemented, you
could implement a poor-man's hot space with some simple scripts -- just
remove the degraded top-level vdev from the pool.

FYI, I am currently working on bp rewrite for device removal.

--matt
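A sketch of what such a poor-man's hot space script might look like,
assuming a future zpool remove that works on top-level vdevs (at the
time of this thread it only handles hot spares and cache devices); the
pool and vdev names are placeholders:

    #!/bin/sh
    # Evacuate a degraded top-level vdev instead of resilvering onto a
    # hot spare. ASSUMES bp rewrite / top-level device removal exists,
    # which it does not yet.
    POOL=tank
    VDEV=raidz2-1            # degraded top-level vdev per 'zpool status'

    # Headroom rule of thumb from above: with N equal top-level vdevs,
    # the pool must be under (N-1)/N full to absorb one vdev (3/4 for N=4).
    zpool list "$POOL"       # eyeball that free space exceeds the vdev's data

    # Migrate the vdev's blocks onto the healthy vdevs and shrink the
    # pool, restoring full parity protection without a spare disk.
    zpool remove "$POOL" "$VDEV"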
Erik Trimble wrote:

> From a global perspective, multi-disk parity (e.g. raidz2 or raidz3)
> is the way to go instead of hot spares.
> Hot spares are useful for adding protection to a number of vdevs, not
> a single vdev.

Even when using raidz2 or 3, it is useful to have hot spares so that
reconstruction can begin immediately. Otherwise it would have to wait
for the operator to physically remove the failed disk and insert a new
one.

--matt
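For completeness, the spare behavior Matthew describes needs no
scripting today; a minimal sketch (pool and device names are
placeholders):

    # Add a hot spare; the fault-management agent activates it
    # automatically when a disk faults, so resilvering starts without
    # waiting for an operator.
    zpool add tank spare c5t0d0
    # Optionally, auto-replace a new disk inserted into the failed
    # disk's slot; the spare then returns to the spare pool.
    zpool set autoreplace=on tank
    zpool status -x tank     # check pool health / resilver progress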
On Sep 30, 2009, at 6:03 PM, Matthew Ahrens wrote:

> Erik Trimble wrote:
>> From a global perspective, multi-disk parity (e.g. raidz2 or
>> raidz3) is the way to go instead of hot spares.
>> Hot spares are useful for adding protection to a number of vdevs,
>> not a single vdev.
>
> Even when using raidz2 or 3, it is useful to have hot spares so that
> reconstruction can begin immediately. Otherwise it would have to
> wait for the operator to physically remove the failed disk and
> insert a new one.

When I model these things, I use 8 hours logistical response time for
data centers and 48 hours for SOHO. When the disks were small, and thus
resilver times were short, the logistical response time could make a
big impact. With 2+ TB drives, the resilver time is becoming dominant.
As disks become larger but not faster, there will be a day when the
logistical response time will become insignificant. In other words, you
won't need a spare to improve logistical response, but you can consider
using spares to extend logistical response time to months. To take this
argument to its limit, it is possible that in our lifetime RAID boxes
will be disposable... the razor industry will be proud of us ;-)
 -- richard
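For a sense of scale, a best-case resilver estimate (illustrative
numbers only; a resilver on a busy, fragmented pool is random-I/O bound
and can run many times slower than this):

    # Sequential lower bound for resilvering a 2 TB disk at 100 MB/s.
    SIZE_GB=2000; MBPS=100
    echo "scale=2; $SIZE_GB * 1024 / $MBPS / 3600" | bc   # -> 5.68 hours

At realistic resilver rates that stretches into days, which is when the
8- or 48-hour logistical response time stops being the dominant term.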
> Yes, this is something that should be possible once we have bp
> rewrite (the ability to move blocks around).
[snip]
> FYI, I am currently working on bp rewrite for device removal.
>
> --matt

That's very cool. I don't code (much/enough to help), but I'd like to
help if I can. If nothing else, my wife makes a mean chocolate chip
cookie! Think a batch of those would help?

Paul Archer
On Wed, 30 Sep 2009, Richard Elling wrote:

> a big impact. With 2+ TB drives, the resilver time is becoming
> dominant. As disks become larger but not faster, there will be a day
> when the logistical response time will become insignificant. In other
> words, you won't need a spare to improve logistical response, but you
> can consider using spares to extend logistical response time to
> months. To take this argument to its limit, it is possible that in
> our lifetime RAID boxes will be disposable... the razor industry will
> be proud of us ;-)

Unless there is a dramatic increase in disk bandwidth, there is a point
where disk storage size becomes unmanageable. That is the point where
we should transition from 3-1/2" disks to 2-1/2" disks with smaller
storage sizes. I see that 2-1/2" disks are already up to 500GB.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Replying to a few folks in a digest format, because I'm lazy and don't
have that much to say.

On Wed, Sep 30, 2009 at 5:53 PM, Tim Cook <tim at cook.ms> wrote:
> What are you hoping to accomplish? You're still going to need a
> drive's worth of free space, and if you're so performance strapped
> that one drive makes the difference, you've got some bigger problems
> on your hands.

As I mentioned, the biggest win would be in a SOHO environment where
you may have bought more space than you need right now and can use it
in the meantime for a wider stripe. Don't think about it as a high
performance filesystem; think about it in the context of a Drobo-like
device. It would provide you with protection from 2 drives failing
(since it would require double parity), or even better if there's free
space available.

> BTW, you shouldn't need one disk per tray of 14 disks. Unless you've
> got [snip]

We use one spare per shelf on our current NetApp hardware. No real
reason other than it makes the provisioning more consistent. Given the
large number of filers that we have, consistency is important.

On Wed, Sep 30, 2009 at 5:56 PM, Erik Trimble <Erik.Trimble at sun.com> wrote:
> What you describe makes no sense for single-parity vdevs, since it
> actually [snip]

I'm pretty sure I said that I wouldn't recommend it for anything less
than raidz2. As far as gains? It would get you out of degraded mode,
which can help performance. This may not be important though, since I
believe raidz2 with a single faulted device doesn't have much of an
impact.

On Wed, Sep 30, 2009 at 6:01 PM, Matthew Ahrens <Matthew.Ahrens at sun.com> wrote:
> ability to move blocks around). One minor downside to "hot space"
> would be that it couldn't be shared among multiple pools the way that
> hot spares can.

Why not? If you have the space available in the zpool, you should be
able to move the data to other vdevs and shrink the degraded one.
Unless bp rewrite doesn't allow data to move between vdevs, that is.

-B
--
Brandon High : bhigh at freaks.com