We are implementing a ZFS storage server (NAS) to replace a NetApp
box. I have a Sun server with two dual Ultra320 PCI-X cards connected
to 4 shelves of 12 500GB disks each, yielding a total of 24TB of raw
storage.

I'm kicking around the different ways to carve this space up,
balancing storage space with data integrity. The layout that I have
come to think is the best for me is to create a raidz2 vdev for each
shelf (i.e. 5TB per shelf of usable storage) and stripe across the
shelves in a single pool. This would let me lose up to two drives per
shelf and still be operational. My only concern with this is that if
a shelf fails (SCSI card failure, etc.) the whole system is down;
however, the MTBF for a SCSI card is WAAAAY higher than the MTBF of a
hard drive...

FWIW, we do have a backup strategy onto a SpectraLogic tape loader,
so losing the whole array, while bad, won't put us out of business,
though I'd prefer it didn't happen :)

Obviously there are dozens of other ways to carve this storage up,
and I'd like some recommendations on what others have done with this
amount of storage space and any experiences (good or bad) that
they've had.

Thanks in advance!

Eric

This message posted from opensolaris.org
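For concreteness, the layout described above is a single pool with one
12-disk raidz2 vdev per shelf, striped together. A minimal sketch of
the command, assuming each shelf shows up as its own controller; the
cN/tM device names (and the pool name "tank") are placeholders, not
the real names on this system:

    # placeholder device names: cN = shelf N, tM = disk M in that shelf
    zpool create tank \
        raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
               c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0 \
        raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
               c2t6d0 c2t7d0 c2t8d0 c2t9d0 c2t10d0 c2t11d0 \
        raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 \
               c3t6d0 c3t7d0 c3t8d0 c3t9d0 c3t10d0 c3t11d0 \
        raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 \
               c4t6d0 c4t7d0 c4t8d0 c4t9d0 c4t10d0 c4t11d0

Each vdev can lose two of its twelve disks, giving 4 x 10 x 500GB =
20TB usable, but a whole-shelf failure takes out all twelve members of
one vdev and with it the pool.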
On Tue, Sep 19, 2006 at 02:40:14PM -0700, Eric Hill wrote:
> We are implementing a ZFS storage server (NAS) to replace a NetApp
> box. I have a Sun server with two dual Ultra320 PCI-X cards connected
> to 4 shelves of 12 500GB disks each, yielding a total of 24TB of raw
> storage.
>
> I'm kicking around the different ways to carve this space up,
> balancing storage space with data integrity. The layout that I have
> come to think is the best for me is to create a raidz2 vdev for each
> shelf (i.e. 5TB per shelf of usable storage) and stripe across the
> shelves in a single pool. This would let me lose up to two drives per
> shelf and still be operational. My only concern with this is that if
> a shelf fails (SCSI card failure, etc.) the whole system is down;
> however, the MTBF for a SCSI card is WAAAAY higher than the MTBF of a
> hard drive...
>
> FWIW, we do have a backup strategy onto a SpectraLogic tape loader,
> so losing the whole array, while bad, won't put us out of business,
> though I'd prefer it didn't happen :)
>
> Obviously there are dozens of other ways to carve this storage up,
> and I'd like some recommendations on what others have done with this
> amount of storage space and any experiences (good or bad) that
> they've had.

If reliability is your main concern, you could do raidz2 vdevs across
8 drives (2 per shelf). This would let you lose any two drives in one
shelf _or_ an entire shelf (but not both - if you lose one drive in
one shelf and another shelf fails, you're out of luck). The downside
is that your usable space drops (18TB instead of 20TB). However, I/O
performance may improve depending on your workload, because more
RAID-Z vdevs mean more IOPS, but the only real way to settle that
argument is for you to run your own application-specific benchmarks.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
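Spread that way, each 8-disk raidz2 group takes two disks from every
shelf, so no single shelf failure removes more than two members of any
vdev. A sketch under the same placeholder naming as above (cN = shelf,
tM = slot):

    # placeholder device names; each vdev draws two disks per shelf
    zpool create tank \
        raidz2 c1t0d0  c1t1d0  c2t0d0  c2t1d0  c3t0d0  c3t1d0  c4t0d0  c4t1d0 \
        raidz2 c1t2d0  c1t3d0  c2t2d0  c2t3d0  c3t2d0  c3t3d0  c4t2d0  c4t3d0 \
        raidz2 c1t4d0  c1t5d0  c2t4d0  c2t5d0  c3t4d0  c3t5d0  c4t4d0  c4t5d0 \
        raidz2 c1t6d0  c1t7d0  c2t6d0  c2t7d0  c3t6d0  c3t7d0  c4t6d0  c4t7d0 \
        raidz2 c1t8d0  c1t9d0  c2t8d0  c2t9d0  c3t8d0  c3t9d0  c4t8d0  c4t9d0 \
        raidz2 c1t10d0 c1t11d0 c2t10d0 c2t11d0 c3t10d0 c3t11d0 c4t10d0 c4t11d0

Six 8-disk raidz2 vdevs give 6 x 6 x 500GB = 18TB usable, and six
vdevs instead of four to spread random I/O across.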
On September 19, 2006 2:47:16 PM -0700 Eric Schrock <eric.schrock at sun.com> wrote:
> On Tue, Sep 19, 2006 at 02:40:14PM -0700, Eric Hill wrote:
>> We are implementing a ZFS storage server (NAS) to replace a NetApp
>> box. I have a Sun server with two dual Ultra320 PCI-X cards connected
>> to 4 shelves of 12 500GB disks each, yielding a total of 24TB of raw
>> storage.
>>
>> I'm kicking around the different ways to carve this space up,
>> balancing storage space with data integrity. The layout that I have
>> come to think is the best for me is to create a raidz2 vdev for each
>> shelf (i.e. 5TB per shelf of usable storage) and stripe across the
>> shelves in a single pool. This would let me lose up to two drives per
>> shelf and still be operational. My only concern with this is that if
>> a shelf fails (SCSI card failure, etc.) the whole system is down;
>> however, the MTBF for a SCSI card is WAAAAY higher than the MTBF of a
>> hard drive...
>>
>> FWIW, we do have a backup strategy onto a SpectraLogic tape loader,
>> so losing the whole array, while bad, won't put us out of business,
>> though I'd prefer it didn't happen :)
>>
>> Obviously there are dozens of other ways to carve this storage up,
>> and I'd like some recommendations on what others have done with this
>> amount of storage space and any experiences (good or bad) that
>> they've had.
>
> If reliability is your main concern, you could do raidz2 vdevs across
> 8 drives (2 per shelf). This would let you lose any two drives in one
> shelf _or_ an entire shelf (but not both - if you lose one drive in
> one shelf and another shelf fails, you're out of luck). The downside
> is that your usable space drops (18TB instead of 20TB). However, I/O
> performance may improve depending on your workload, because more
> RAID-Z vdevs mean more IOPS, but the only real way to settle that
> argument is for you to run your own application-specific benchmarks.

I don't think the space lost is a problem, since you don't want to
configure a raid-z vdev with more than 9 drives anyway. (I assume
raidz2 would be the same.) It might be good to have a hot spare in
there somewhere. Someone should make a 13-drive shelf. :-)

-frank
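On the hot spare: declaring one is a single command, but with all 48
slots in the four 12-disk shelves committed to data vdevs there is no
slot left for it, which is the point of the 13-drive-shelf joke. If a
disk were held back from the data vdevs (or a 49th disk attached
elsewhere), adding it would look like this, with a hypothetical device
name:

    # c5t0d0 is a hypothetical disk set aside for the purpose
    zpool add tank spare c5t0d0

The spare can then take over for a faulted disk in any of the raidz2
groups.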
Richard Elling - PAE
2006-Sep-19 22:06 UTC
[zfs-discuss] Disk Layout for New Storage Server
more below...

Eric Schrock wrote:
> On Tue, Sep 19, 2006 at 02:40:14PM -0700, Eric Hill wrote:
>> We are implementing a ZFS storage server (NAS) to replace a NetApp
>> box. I have a Sun server with two dual Ultra320 PCI-X cards connected
>> to 4 shelves of 12 500GB disks each, yielding a total of 24TB of raw
>> storage.
>>
>> I'm kicking around the different ways to carve this space up,
>> balancing storage space with data integrity. The layout that I have
>> come to think is the best for me is to create a raidz2 vdev for each
>> shelf (i.e. 5TB per shelf of usable storage) and stripe across the
>> shelves in a single pool. This would let me lose up to two drives per
>> shelf and still be operational. My only concern with this is that if
>> a shelf fails (SCSI card failure, etc.) the whole system is down;
>> however, the MTBF for a SCSI card is WAAAAY higher than the MTBF of a
>> hard drive...
>>
>> FWIW, we do have a backup strategy onto a SpectraLogic tape loader,
>> so losing the whole array, while bad, won't put us out of business,
>> though I'd prefer it didn't happen :)
>>
>> Obviously there are dozens of other ways to carve this storage up,
>> and I'd like some recommendations on what others have done with this
>> amount of storage space and any experiences (good or bad) that
>> they've had.
>
> If reliability is your main concern, you could do raidz2 vdevs across
> 8 drives (2 per shelf). This would let you lose any two drives in one
> shelf _or_ an entire shelf (but not both - if you lose one drive in
> one shelf and another shelf fails, you're out of luck). The downside
> is that your usable space drops (18TB instead of 20TB). However, I/O
> performance may improve depending on your workload, because more
> RAID-Z vdevs mean more IOPS, but the only real way to settle that
> argument is for you to run your own application-specific benchmarks.

I like this. I would also recommend using at least one spare. Using
the grow-into-your-storage model, I'd leave one set of 8 disks as
spares until the space is needed.
 -- richard
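One way to read the grow-into-your-storage suggestion, still under the
placeholder device names from the earlier sketches: create the pool
with only five of the six 8-disk raidz2 groups, keep the remaining
eight disks unconfigured as spares, and stripe them in as a sixth vdev
only when the space is actually needed:

    # later, when more capacity is required, add the held-back group
    # (placeholder device names, same naming as the sketches above)
    zpool add tank raidz2 c1t10d0 c1t11d0 c2t10d0 c2t11d0 \
                          c3t10d0 c3t11d0 c4t10d0 c4t11d0

One or two of the held-back disks could also be listed as hot spares
in the meantime; they would just need to be removed from the pool's
spare list before being reused in the new vdev.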
I really like that idea. That would indeed provide both excellent
reliability (the ability to lose an entire shelf) and performance
(striping across six raidz2 vdevs). Thanks for the suggestion!

This message posted from opensolaris.org