I have a setup with a T2000 attached via SAN to 90 x 500GB SATA drives,
presented as individual LUNs to the host. We will be sending mostly large
streaming writes to the filesystems over the network (~2GB/file) in 5-6
streams per filesystem. Data protection is pretty important, but we need to
have at most 25% overhead for redundancy.

Some options I'm considering are:
  10 x 7+2 RAIDZ2 w/ no hotspares
   7 x 10+2 RAIDZ2 w/ 6 spares

Does anyone have advice on the performance or reliability of either of
these? We typically would swap out a bad drive in 4-6 hrs, and we expect the
drives to be fairly full most of the time, ~70-75% fs utilization.

Thanks in advance for any input.

-Andy
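
A quick arithmetic sketch of the two proposed layouts (treating 500GB as the
raw per-drive size and ignoring metadata and GB-vs-GiB differences, so the
numbers are indicative only) shows that both come out identical on space:
all 90 drives used, ~35TB usable, ~22% parity+spare overhead. The decision
therefore rests on reliability and performance rather than capacity:

    # Parity/spare overhead and usable space for the two proposed layouts.
    # Assumes 500GB raw per drive; ignores metadata and GB-vs-GiB differences.

    def layout(sets, data, parity, spares, drive_gb=500):
        total = sets * (data + parity) + spares
        usable_gb = sets * data * drive_gb
        overhead = 1 - (sets * data) / float(total)  # parity + spares fraction
        return total, usable_gb, overhead

    for name, cfg in [("10 x 7+2, no spares", (10, 7, 2, 0)),
                      (" 7 x 10+2, 6 spares", (7, 10, 2, 6))]:
        total, usable_gb, ovh = layout(*cfg)
        print("%s: %d drives, ~%.1f TB usable, %.1f%% overhead"
              % (name, total, usable_gb / 1000.0, ovh * 100))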
Robert Milkowski
2007-Mar-09 23:08 UTC
[zfs-discuss] Layout for multiple large streaming writes.
Hello Carisdad,

Friday, March 9, 2007, 7:05:02 PM, you wrote:

C> I have a setup with a T2000 attached via SAN to 90 x 500GB SATA drives,
C> presented as individual LUNs to the host. We will be sending mostly large
C> streaming writes to the filesystems over the network (~2GB/file) in 5-6
C> streams per filesystem. Data protection is pretty important, but we need
C> to have at most 25% overhead for redundancy.
C>
C> Some options I'm considering are:
C>   10 x 7+2 RAIDZ2 w/ no hotspares
C>    7 x 10+2 RAIDZ2 w/ 6 spares
C>
C> Does anyone have advice on the performance or reliability of either of
C> these? We typically would swap out a bad drive in 4-6 hrs, and we expect
C> the drives to be fairly full most of the time, ~70-75% fs utilization.

On an x4500 with a 4 x 9+2 RAID-Z2 config I get ~600MB/s logical
(~700-800MB/s with redundancy overhead). It's somewhat jumpy, but that's a
known bug in ZFS... So in your config, assuming the host/SAN/array is not a
bottleneck, you should be able to write at least twice that throughput.

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
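
Scaling Robert's x4500 figure to the proposed layouts is straightforward if
one assumes -- and it is only an assumption -- that streaming-write
throughput grows roughly linearly with the number of top-level vdevs and
that nothing upstream (HBAs, fabric, array controllers) becomes the
bottleneck:

    # Rough extrapolation from the reported ~600MB/s on 4 raidz2 vdevs,
    # assuming near-linear scaling with vdev count and no upstream bottleneck.

    per_vdev = 600.0 / 4                 # ~150 MB/s logical per raidz2 vdev
    print("10 x 7+2 : ~%d MB/s" % (10 * per_vdev))   # ~1500 MB/s
    print(" 7 x 10+2: ~%d MB/s" % (7 * per_vdev))    # ~1050 MB/s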
przemolicc at poczta.fm
2007-Mar-12 07:50 UTC
[zfs-discuss] Layout for multiple large streaming writes.
On Sat, Mar 10, 2007 at 12:08:22AM +0100, Robert Milkowski wrote:
> Hello Carisdad,
>
> Friday, March 9, 2007, 7:05:02 PM, you wrote:
>
> C> I have a setup with a T2000 attached via SAN to 90 x 500GB SATA drives,
> C> presented as individual LUNs to the host. [...]
>
> On an x4500 with a 4 x 9+2 RAID-Z2 config I get ~600MB/s logical
> (~700-800MB/s with redundancy overhead). It's somewhat jumpy, but that's a
> known bug in ZFS... So in your config, assuming the host/SAN/array is not
> a bottleneck, you should be able to write at least twice that throughput.

Look also at:

http://sunsolve.sun.com/search/document.do?assetkey=1-9-88385-1

where you have a way to increase the I/O and application performance on the
T2000.

przemol
Robert Milkowski
2007-Mar-12 08:34 UTC
[zfs-discuss] Layout for multiple large streaming writes.
Hello przemolicc,

Monday, March 12, 2007, 8:50:57 AM, you wrote:

ppf> On Sat, Mar 10, 2007 at 12:08:22AM +0100, Robert Milkowski wrote:
>> [...]
>> On an x4500 with a 4 x 9+2 RAID-Z2 config I get ~600MB/s logical
>> (~700-800MB/s with redundancy overhead). [...]
ppf>
ppf> Look also at:
ppf>
ppf> http://sunsolve.sun.com/search/document.do?assetkey=1-9-88385-1
ppf>
ppf> where you have a way to increase the I/O and application performance
ppf> on the T2000.

I was talking about the x4500, not the T2000.

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
przemolicc at poczta.fm
2007-Mar-12 09:08 UTC
[zfs-discuss] Layout for multiple large streaming writes.
On Mon, Mar 12, 2007 at 09:34:22AM +0100, Robert Milkowski wrote:
> Hello przemolicc,
>
> Monday, March 12, 2007, 8:50:57 AM, you wrote:
>
> ppf> Look also at:
> ppf>
> ppf> http://sunsolve.sun.com/search/document.do?assetkey=1-9-88385-1
> ppf>
> ppf> where you have a way to increase the I/O and application performance
> ppf> on the T2000.
>
> I was talking about the x4500, not the T2000.

But Carisdad mentioned a T2000.

Regards
przemol
Richard Elling
2007-Mar-12 23:24 UTC
[zfs-discuss] Re: Layout for multiple large streaming writes.
> I have a setup with a T2000 attached via SAN to 90 x 500GB SATA drives,
> presented as individual LUNs to the host. We will be sending mostly large
> streaming writes to the filesystems over the network (~2GB/file) in 5-6
> streams per filesystem. Data protection is pretty important, but we need
> to have at most 25% overhead for redundancy.
>
> Some options I'm considering are:
>   10 x 7+2 RAIDZ2 w/ no hotspares
>    7 x 10+2 RAIDZ2 w/ 6 spares
>
> Does anyone have advice on the performance or reliability of either of
> these? We typically would swap out a bad drive in 4-6 hrs, and we expect
> the drives to be fairly full most of the time, ~70-75% fs utilization.

What drive manufacturer & model? What is the SAN configuration? More nodes
on a loop can significantly reduce performance as loop arbitration begins to
dominate. This problem can be reduced by using multiple loops or a switched
fabric, assuming the drives support fabrics.

The data availability should be pretty good with raidz2. Having hot spares
will be better than not, but with a 4-6 hour replacement time (assuming 24x7
operations) there isn't an overwhelming need for them -- double parity plus
a fast repair time is a good combination. We worry more about spares when
operations are not managed 24x7, or when you want to save money by deferring
repairs to a regularly scheduled service window. In my blog post about this,
I used a 24-hour logistical response time and see about an order of
magnitude difference in the MTTDL:
http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance

In general, you will have better performance with more sets, so the 10-set
config will outperform the 7-set config.
 -- richard
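
One common simple MTTDL model (not necessarily the one used in the blog post
above) captures the intuition for double parity: data is lost only when a
third drive in the same set fails while the first two are still being
repaired. The sketch below is strictly back-of-the-envelope -- the MTBF,
swap, and resilver times are assumed placeholder values, and the model
ignores unrecoverable read errors during resilver:

    # Simple MTTDL estimate for the two layouts (independent failures,
    # exponential lifetimes).  MTBF and repair times are assumptions,
    # not measured or vendor-specified values.

    def mttdl_raidz2(disks_per_set, sets, mtbf_h, mttr_h):
        # Loss requires a 3rd failure in a set while 2 are being repaired.
        per_set = mtbf_h ** 3 / (disks_per_set * (disks_per_set - 1)
                                 * (disks_per_set - 2) * mttr_h ** 2)
        return per_set / sets            # any one set losing data counts

    MTBF = 500000.0                      # hours per drive -- placeholder
    YEAR = 24.0 * 365

    # 10 x 7+2, no spares: ~5h manual swap + (say) ~10h resilver
    a = mttdl_raidz2(9, 10, MTBF, mttr_h=15.0)
    # 7 x 10+2 with hot spares: resilver starts immediately, (say) ~10h
    b = mttdl_raidz2(12, 7, MTBF, mttr_h=10.0)

    print("10 x 7+2 : MTTDL ~ %.1e years" % (a / YEAR))
    print(" 7 x 10+2: MTTDL ~ %.1e years" % (b / YEAR))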
Thanks to everyone who replied to my question. Your input is very helpful.

To clarify, I was concerned more with MTTDL than performance. With either
the 7+2 or the 10+2 layout, I am able to achieve far more throughput than is
available to me via the network. Doing tests from memory on the system, I
can push >550MB/s to the drives, but as of now I only have a 1Gb/s network
interface on the box. I should be able to add another 1Gb/s link shortly,
but that is still far less than I can drive the disks at. The major concern
was weighing the increased probability of data loss given more drives per
raid set against having spares available in the array, given a 4-6hr drive
replacement window 24x7.

For Richard: the drives are Seagate 500GB SATA drives (not sure of the exact
model) in an EMC Clariion CX3 enclosure. There are 6 shelves of 15 drives,
with each drive presented as a raw LUN to the server. They are attached to a
pair of dedicated 4Gb/s fabrics.

It was interesting to test the 7+2 and 10+2 layouts with ZFS versus a 3+1
hardware RAID running on the array. Using hardware RAID we saw a ~2%
performance improvement, but we figured the improved MTTDL and being able to
detect and recover from read/write errors with ZFS was well worth the 2%
difference.

One last question: the link from przemol
(http://sunsolve.sun.com/search/document.do?assetkey=1-9-88385-1) references
a qlc.conf parameter, but we are running Emulex cards (emlxs driver). Is
there similar tuning that can be done with those?

Thanks again!
-Andy
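
A quick line-rate check (theoretical maxima only, before TCP/NFS overhead)
illustrates why even a second GigE link leaves the network, not the pool, as
the bottleneck here:

    # GigE line rate vs. the ~550MB/s measured to the pool from memory.
    gige_mb_s = 1000.0 / 8               # ~125 MB/s per 1Gb/s link, best case
    print("one 1Gb/s link : ~%d MB/s" % gige_mb_s)
    print("two 1Gb/s links: ~%d MB/s" % (2 * gige_mb_s))   # still << 550 MB/s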
Erblichs
2007-Mar-13 19:32 UTC
[zfs-discuss] Re: Layout for multiple large streaming writes.
To the original poster,

FYI, running the RAID drives at a constant ~70-75% utilization probably does
not leave enough headroom for degraded mode. A normal rule of thumb is 50 to
60% constant utilization, so there is excess capacity to absorb the load of
degraded mode.

An "old" rule of thumb for estimating failure frequency: if you have 100
drives and a single drive's MTBF is estimated at 30,000 hours (> 3 years),
then on average a drive failure can be expected roughly every 300 hours,
i.e. about every couple of weeks. Thus, excess capacity always needs to be
present to allow time to reconstruct the RAID, the ability to reconstruct it
within a limited timeframe, and to minimize any significantly increased
latencies for normal processing.

Mitchell Erblich
-----------------

Richard Elling wrote:
>
> > I have a setup with a T2000 attached via SAN to 90 x 500GB SATA drives,
> > presented as individual LUNs to the host. [...]
>
> The data availability should be pretty good with raidz2. Having hot spares
> will be better than not, but with a 4-6 hour replacement time (assuming
> 24x7 operations) there isn't an overwhelming need for them -- double
> parity plus a fast repair time is a good combination. [...]
>
> In general, you will have better performance with more sets, so the 10-set
> config will outperform the 7-set config.
>  -- richard
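
Applying Erblichs' rule of thumb to the 90-drive pool in this thread (still
using the illustrative 30,000-hour per-drive figure, not a vendor spec)
gives a rough feel for how often a resilver will be running:

    # Expected interval between drive failures across the whole pool,
    # using the illustrative 30,000h per-drive MTBF quoted above.
    drives = 90
    mtbf_h = 30000.0
    pool_interval_h = mtbf_h / drives    # ~333 hours
    print("~%.0f hours (~%.0f days) between drive failures"
          % (pool_interval_h, pool_interval_h / 24))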