Edward Ned Harvey
2010-Feb-13 14:06 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
I have a new server, with 7 disks in it. I am performing benchmarks on it before putting it into production, to substantiate claims I make, like "striping mirrors is faster than raidz" and so on. Would anybody like me to test any particular configuration? Unfortunately I don't have any SSD, so I can't do any meaningful test on the ZIL etc. Unless someone in the Boston area has a 2.5" SAS SSD they wouldn't mind lending for a few hours. ;-)

My hardware configuration: Dell PE 2970 with 8 cores. Normally 32G, but I pulled it all out to get it down to 4G of RAM. (Easier to benchmark disks when the file operations aren't all cached.) ;-) Solaris 10 10/09. PERC 6/i controller. All disks are configured in PERC for Adaptive ReadAhead, Write Back, JBOD. 7 disks present, each SAS 15krpm 160G. The OS occupies 1 disk, so I have 6 disks to play with.

I am currently running the following tests:

Will test, including the time to flush(), various record sizes inside file sizes up to 16G, sequential write and sequential read. Not doing any mixed read/write requests. Not doing any random read/write.

  iozone -Reab somefile.wks -g 17G -i 1 -i 0

Configurations being tested:
- Single disk
- 2-way mirror
- 3-way mirror
- 4-way mirror
- 5-way mirror
- 6-way mirror
- Two mirrors striped (or concatenated)
- Three mirrors striped (or concatenated)
- 5-disk raidz
- 6-disk raidz
- 6-disk raidz2

Hypothesized results:
- N-way mirrors write at the same speed as a single disk
- N-way mirrors read n times faster than a single disk
- Two mirrors striped read and write 2x faster than a single mirror
- Three mirrors striped read and write 3x faster than a single mirror
- Raidz and raidz2: No hypothesis. Some people say they perform comparably to many disks working together. Some people say it's slower than a single disk. Waiting to see the results.
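For anyone wanting to reproduce these configurations, a minimal sketch of the zpool commands involved (device names c1t1d0 through c1t6d0 and the pool name "bench" are placeholders, not taken from the test box):

  # Hypothetical device names; use the targets that "format" reports on your system.
  # 2-way mirror
  zpool create bench mirror c1t1d0 c1t2d0

  # Three mirrors striped (ZFS automatically stripes across top-level vdevs)
  zpool create bench mirror c1t1d0 c1t2d0 mirror c1t3d0 c1t4d0 mirror c1t5d0 c1t6d0

  # 6-disk raidz2
  zpool create bench raidz2 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0

  # Destroy the pool between runs so each configuration starts clean
  zpool destroy bench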
Richard Elling
2010-Feb-13 15:54 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
Some thoughts below...

On Feb 13, 2010, at 6:06 AM, Edward Ned Harvey wrote:

> I have a new server, with 7 disks in it. I am performing benchmarks on it before putting it into production, to substantiate claims I make, like "striping mirrors is faster than raidz" and so on. Would anybody like me to test any particular configuration? Unfortunately I don't have any SSD, so I can't do any meaningful test on the ZIL etc. Unless someone in the Boston area has a 2.5" SAS SSD they wouldn't mind lending for a few hours. ;-)
>
> My hardware configuration: Dell PE 2970 with 8 cores. Normally 32G, but I pulled it all out to get it down to 4G of RAM. (Easier to benchmark disks when the file operations aren't all cached.) ;-) Solaris 10 10/09. PERC 6/i controller. All disks are configured in PERC for Adaptive ReadAhead, Write Back, JBOD. 7 disks present, each SAS 15krpm 160G. The OS occupies 1 disk, so I have 6 disks to play with.

Put the memory back in and limit the ARC cache size instead. x86 boxes have a tendency to change the memory bus speed depending on how much memory is in the box. Similarly, you can test primarycache settings rather than just limiting ARC size.

> I am currently running the following tests:
>
> Will test, including the time to flush(), various record sizes inside file sizes up to 16G, sequential write and sequential read. Not doing any mixed read/write requests. Not doing any random read/write.
>
>   iozone -Reab somefile.wks -g 17G -i 1 -i 0

IMHO, sequential tests are a waste of time. With default configs, it will be difficult to separate the "raw" performance from prefetched performance. You might try disabling prefetch as an option. With sync writes, you will run into the zfs_immediate_write_sz boundary. Perhaps someone else can comment on how often they find interesting sequential workloads which aren't backup-related.

> Configurations being tested:
> - Single disk
> - 2-way mirror
> - 3-way mirror
> - 4-way mirror
> - 5-way mirror
> - 6-way mirror
> - Two mirrors striped (or concatenated)
> - Three mirrors striped (or concatenated)
> - 5-disk raidz
> - 6-disk raidz
> - 6-disk raidz2

Please add some raidz3 tests :-) We have little data on how raidz3 performs.

> Hypothesized results:
> - N-way mirrors write at the same speed as a single disk
> - N-way mirrors read n times faster than a single disk
> - Two mirrors striped read and write 2x faster than a single mirror
> - Three mirrors striped read and write 3x faster than a single mirror
> - Raidz and raidz2: No hypothesis. Some people say they perform comparably to many disks working together. Some people say it's slower than a single disk. Waiting to see the results.

Please post results (with raw data would be nice ;-). If you would be so kind as to collect samples of "iosnoop -Da" I would be eternally grateful :-)
 -- richard
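A minimal sketch of the two knobs suggested above, assuming an x86 box and an arbitrary 4 GB cap (the dataset name is a placeholder, and primarycache requires a build that supports the property):

  # /etc/system -- cap the ARC at 4 GB instead of pulling DIMMs (example value)
  set zfs:zfs_arc_max = 0x100000000

  # After a reboot, verify the cap:
  #   kstat -p zfs:0:arcstats:c_max

  # Per-dataset cache policy (all | metadata | none), where available:
  zfs set primarycache=metadata bench/iozone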
Bob Friesenhahn
2010-Feb-13 15:55 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
On Sat, 13 Feb 2010, Edward Ned Harvey wrote:

> Will test, including the time to flush(), various record sizes inside file sizes up to 16G,
> sequential write and sequential read. Not doing any mixed read/write requests. Not doing any
> random read/write.
>
> iozone -Reab somefile.wks -g 17G -i 1 -i 0

Make sure to also test with a command like

  iozone -m -t 8 -T -O -r 128k -o -s 12G

I am eager to read your test report.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Bob Friesenhahn
2010-Feb-13 16:39 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
On Sat, 13 Feb 2010, Bob Friesenhahn wrote:

> Make sure to also test with a command like
>
>   iozone -m -t 8 -T -O -r 128k -o -s 12G

Actually, it seems that this is more than sufficient:

  iozone -m -t 8 -T -r 128k -o -s 4G

since it creates a 4GB test file for each thread, with 8 threads.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Edward Ned Harvey
2010-Feb-13 18:54 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
> IMHO, sequential tests are a waste of time. With default configs, it will be
> difficult to separate the "raw" performance from prefetched performance.
> You might try disabling prefetch as an option.

Let me clarify: Iozone does a nonsequential series of sequential tests, specifically for the purpose of identifying the performance tiers, separating the various levels of hardware-accelerated performance from the raw disk performance. This is the reason why I took out all but 4G of the system RAM. In the (incomplete) results I have so far, it's easy to see these tiers for a single disk:

- For file sizes 0 to 4M, a single disk writes 2.8 Gbit/sec and reads ~40-60 Gbit/sec. This boost comes from writing to PERC cache, and reading from CPU L2 cache.
- For file sizes 4M to 128M, a single disk writes 2.8 Gbit/sec and reads 24 Gbit/sec. This boost comes from writing to PERC cache, and reading from system memory.
- For file sizes 128M to 4G, a single disk writes 1.2 Gbit/sec and reads 24 Gbit/sec. This boost comes from reading system memory.
- For file sizes 4G to 16G, a single disk writes 1.2 Gbit/sec and reads 1.2 Gbit/sec. This is the raw disk performance. (SAS, 15krpm, 146G disks)

> Please add some raidz3 tests :-) We have little data on how raidz3
> performs.

Does this require a specific version of the OS? I'm on Solaris 10 10/09, and "man zpool" doesn't seem to say anything about raidz3 ... I haven't tried using it ... does it exist?

> Please post results (with raw data would be nice ;-). If you would be so
> kind as to collect samples of "iosnoop -Da" I would be eternally
> grateful :-)

I'm guessing iosnoop is an OpenSolaris thing? Is there an equivalent for Solaris?

I'll post both the raw results, and my simplified conclusions. Most people would not want the raw data. Most people just want to know "What's the performance hit I take by using raidz2 instead of raidz?" and so on. Or ... "What's faster, raidz, or hardware raid-5?"
Bob Friesenhahn
2010-Feb-13 20:19 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
On Sat, 13 Feb 2010, Edward Ned Harvey wrote:

>> kind as to collect samples of "iosnoop -Da" I would be eternally
>> grateful :-)
>
> I'm guessing iosnoop is an OpenSolaris thing? Is there an equivalent for Solaris?

Iosnoop is part of the DTrace Toolkit by Brendan Gregg, which does work on Solaris 10. See "http://www.brendangregg.com/dtrace.html".

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
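A sketch of one way to collect the samples Richard asked for, assuming the toolkit is unpacked under /opt/DTT (the path and output filename are illustrative):

  # Run as root from the DTrace Toolkit directory while the iozone workload runs
  cd /opt/DTT
  ./iosnoop -Da > /var/tmp/iosnoop_run1.out &

  # ... run the benchmark ...

  # Stop the capture when the benchmark finishes
  kill %1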
Richard Elling
2010-Feb-13 23:09 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
On Feb 13, 2010, at 10:54 AM, Edward Ned Harvey wrote:

>> Please add some raidz3 tests :-) We have little data on how raidz3
>> performs.
>
> Does this require a specific version of the OS? I'm on Solaris 10 10/09, and "man zpool" doesn't seem to say anything about raidz3 ... I haven't tried using it ... does it exist?

Never mind. I have no interest in performance tests for Solaris 10. The code is so old that it does not represent current ZFS at all. IMHO, if you want to do performance tests, then you need to be on the very latest dev release. Otherwise, the results can't be carried forward to make a difference -- finding performance issues that are already fixed isn't a good use of your time.
 -- richard
Edward Ned Harvey
2010-Feb-15 02:30 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
> Never mind. I have no interest in performance tests for Solaris 10.
> The code is so old that it does not represent current ZFS at all.

Whatever. Regardless of what you say, it does show:

- Which is faster, raidz, or a stripe of mirrors?
- How much does raidz2 hurt performance compared to raidz?
- Which is faster, raidz, or hardware raid 5?
- Is a mirror twice as fast as a single disk for reading? Is a 3-way mirror 3x faster? And so on?

I've seen and heard many people stating answers to these questions, and my results (not yet complete) already answer these questions, and demonstrate that all the previous assertions were partial truths.

It's true, I have no interest in comparing the performance of ZFS version 3 versus version 4. If you want that, test it yourself and don't complain about my tests.
Edward Ned Harvey
2010-Feb-15 02:40 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
>> iozone -m -t 8 -T -O -r 128k -o -s 12G
>
> Actually, it seems that this is more than sufficient:
>
>   iozone -m -t 8 -T -r 128k -o -s 4G

Good news, cuz I kicked off the first test earlier today, and it seems like it will run till Wednesday. ;-) The first run, on a single disk, took 6.5 hrs, and I have it configured to repeat ... 2-way mirror, 3-way mirror, 4-way mirror, 5-way mirror, raidz 5 disks, raidz 6 disks, raidz2 6 disks, stripe of 2 mirrors, stripe of 3 mirrors ...

I'll go stop it, and change to 4G. Maybe it'll be done tomorrow. ;-)
Thomas Burgess
2010-Feb-15 02:45 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
> Whatever. Regardless of what you say, it does show:
>
> - Which is faster, raidz, or a stripe of mirrors?
> - How much does raidz2 hurt performance compared to raidz?
> - Which is faster, raidz, or hardware raid 5?
> - Is a mirror twice as fast as a single disk for reading? Is a 3-way mirror 3x faster? And so on?
>
> I've seen and heard many people stating answers to these questions, and my
> results (not yet complete) already answer these questions, and demonstrate
> that all the previous assertions were partial truths.

I don't think he was complaining; I think he was saying he didn't need you to run iosnoop on the old version of ZFS.

Solaris 10 has a really old version of ZFS. I know there are some pretty big differences between ZFS versions from my own non-scientific benchmarks. It would make sense that people wouldn't be as interested in benchmarks of Solaris 10 ZFS, seeing as there are literally hundreds scattered around the internet.

I don't think he was telling you not to bother testing for your own purposes, though.
Bob Friesenhahn
2010-Feb-15 02:47 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
On Sun, 14 Feb 2010, Edward Ned Harvey wrote:

>> Never mind. I have no interest in performance tests for Solaris 10.
>> The code is so old that it does not represent current ZFS at all.
>
> Whatever. Regardless of what you say, it does show:

Since Richard abandoned Sun (in favor of gmail), he has no qualms with suggesting to test the unstable version. ;-)

Regardless of denials to the contrary, Solaris 10 is still the stable enterprise version of Solaris, and will be for quite some time. It has not yet achieved the status of Solaris 8.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Bob Friesenhahn
2010-Feb-15 02:50 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
On Sun, 14 Feb 2010, Edward Ned Harvey wrote:

>>> iozone -m -t 8 -T -O -r 128k -o -s 12G
>>
>> Actually, it seems that this is more than sufficient:
>>
>>   iozone -m -t 8 -T -r 128k -o -s 4G
>
> Good news, cuz I kicked off the first test earlier today, and it seems like
> it will run till Wednesday. ;-) The first run, on a single disk, took 6.5
> hrs, and I have it configured to repeat ... 2-way mirror, 3-way mirror,
> 4-way mirror, 5-way mirror, raidz 5 disks, raidz 6 disks, raidz2 6 disks,
> stripe of 2 mirrors, stripe of 3 mirrors ...
>
> I'll go stop it, and change to 4G. Maybe it'll be done tomorrow. ;-)

Probably even 2G is plenty, since that gives 16GB of total file data. Keep in mind that with file data much larger than memory, these benchmarks are testing the hardware more than they are testing Solaris. If you wanted to test Solaris, then you would intentionally give it enough memory to work with, since that is how it is expected to be used. The performance of Solaris when it is given enough memory to do reasonable caching is astounding.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Bob Friesenhahn
2010-Feb-15 02:57 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
On Sun, 14 Feb 2010, Thomas Burgess wrote:

> Solaris 10 has a really old version of ZFS. I know there are some
> pretty big differences between ZFS versions from my own non-scientific
> benchmarks. It would make sense that people wouldn't be as
> interested in benchmarks of Solaris 10 ZFS, seeing as there are
> literally hundreds scattered around the internet.

Can you provide URLs for these useful benchmarks? I am certainly interested in seeing them. Even my own benchmarks that I posted almost two years ago are quite useless now. Solaris 10 ZFS is a continually moving target. OpenSolaris performance postings I have seen are not terribly far from Solaris 10.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Richard Elling
2010-Feb-15 05:37 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
On Feb 14, 2010, at 6:45 PM, Thomas Burgess wrote:

>> Whatever. Regardless of what you say, it does show:
>>
>> - Which is faster, raidz, or a stripe of mirrors?
>> - How much does raidz2 hurt performance compared to raidz?
>> - Which is faster, raidz, or hardware raid 5?
>> - Is a mirror twice as fast as a single disk for reading? Is a 3-way mirror 3x faster? And so on?
>>
>> I've seen and heard many people stating answers to these questions, and my results (not yet complete) already answer these questions, and demonstrate that all the previous assertions were partial truths.
>
> I don't think he was complaining; I think he was saying he didn't need you to run iosnoop on the old version of ZFS.

iosnoop runs fine on Solaris 10. I am sorta complaining, though. If you wish to advance ZFS, then use the latest bits. If you wish to discover the performance bugs in Solaris 10 that are already fixed in OpenSolaris, then go ahead, be my guest. Examples of improvements are:

+ intelligent prefetch algorithm is smarter
+ txg commit interval logic is improved
+ ZIL logic improved and added logbias property
+ stat() performance is improved
+ raidz write performance improved and raidz3 added
+ zfs caching improved
+ dedup changes touched many parts of ZFS
+ zfs_vdev_max_pending reduced and smarter
+ metaslab allocation improved
+ zfs write activity doesn't hog resources quite so much
+ a new scheduling class, SDC, added to better observe and manage ZFS thread scheduling
+ buffers can be shared between file system modules (fewer copies)

As you can see, so much has changed, hopefully for the better, that running performance benchmarks on old software just isn't very interesting.

NB. Oracle's Sun OpenStorage systems do not use Solaris 10 and if they did, they would not be competitive in the market. The notion that OpenSolaris is worthless and Solaris 10 rules is simply bull*

> Solaris 10 has a really old version of ZFS. I know there are some pretty big differences between ZFS versions from my own non-scientific benchmarks. It would make sense that people wouldn't be as interested in benchmarks of Solaris 10 ZFS, seeing as there are literally hundreds scattered around the internet.
>
> I don't think he was telling you not to bother testing for your own purposes, though.

Correct.
 -- richard
Carson Gaspar
2010-Feb-15 09:17 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
Richard Elling wrote:
...
> As you can see, so much has changed, hopefully for the better, that running
> performance benchmarks on old software just isn't very interesting.
>
> NB. Oracle's Sun OpenStorage systems do not use Solaris 10 and if they did, they
> would not be competitive in the market. The notion that OpenSolaris is worthless
> and Solaris 10 rules is simply bull*

OpenSolaris isn't worthless, but no way in hell would I run it in production, based on my experiences running it at home from b111 to now. The mpt driver problems are just one of many show stoppers (is that resolved yet, or do we still need magic /etc/system voodoo?).

Of course, Solaris 10 couldn't properly drive the Marvell-attached disks in an X4500 prior to U6 either, unless you ran an IDR (pretty inexcusable in a storage-centric server release).

-- Carson
Edward Ned Harvey
2010-Feb-18 13:08 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
Ok, I've done all the tests I plan to complete. For highest performance, it seems:

- The measure I think is the most relevant for typical operation is the fastest random read / write / mix. (Thanks Bob, for suggesting I do this test.) The winner is clearly striped mirrors in ZFS.
- The fastest sustained sequential write is striped mirrors via ZFS, or maybe raidz.
- The fastest sustained sequential read is striped mirrors via ZFS, or maybe raidz.

Here are the results:

- Results summary of Bob's method:
  http://nedharvey.com/iozone_weezer/bobs%20method/iozone%20results%20summary.pdf
- Raw results of Bob's method:
  http://nedharvey.com/iozone_weezer/bobs%20method/raw_results.zip
- Results summary of Ned's method:
  http://nedharvey.com/iozone_weezer/neds%20method/iozone%20results%20summary.pdf
- Raw results of Ned's method:
  http://nedharvey.com/iozone_weezer/neds%20method/raw_results.zip

From: Edward Ned Harvey [mailto:solaris at nedharvey.com]
Sent: Saturday, February 13, 2010 9:07 AM
To: opensolaris-discuss at opensolaris.org; zfs-discuss at opensolaris.org
Subject: ZFS performance benchmarks in various configurations

> I have a new server, with 7 disks in it. I am performing benchmarks on it before putting it into production, to substantiate claims I make, like "striping mirrors is faster than raidz" and so on. Would anybody like me to test any particular configuration? Unfortunately I don't have any SSD, so I can't do any meaningful test on the ZIL etc. Unless someone in the Boston area has a 2.5" SAS SSD they wouldn't mind lending for a few hours. ;-)
>
> [...]
Bob Friesenhahn
2010-Feb-18 17:10 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
On Thu, 18 Feb 2010, Edward Ned Harvey wrote:

> Ok, I've done all the tests I plan to complete. For highest performance, it seems:
>
> - The measure I think is the most relevant for typical operation is the fastest random read
>   / write / mix. (Thanks Bob, for suggesting I do this test.)
>   The winner is clearly striped mirrors in ZFS.

A most excellent set of tests. We could use some units in the PDF file though.

While it would take quite some time and effort to accomplish, we could use a similar summary for full disk resilver times in each configuration.

> - The fastest sustained sequential write is striped mirrors via ZFS, or maybe raidz.

Note that while these tests may be file-sequential, with 8 threads working at once, what the disks see is not necessarily sequential. However, for the initial "sequential" write, it may be that zfs aggregates the write requests and orders them on disk in such a way that subsequent "sequential" reads by the same number of threads in a roughly similar order would see a performance benefit.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Edward Ned Harvey
2010-Feb-19 03:28 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
> A most excellent set of tests. We could use some units in the PDF
> file though.

Oh, hehehe. ;-) The units are written in the raw txt files. On your tests, the units were ops/sec, and in mine, they were Kbytes/sec. If you like, you can always grab the xlsx and modify it to your tastes, and create an updated pdf. Just substitute .xlsx instead of .pdf in the previous URLs. Or just drop the filename off the URL; my web server allows indexing on that directory. Personally, I only look at the chart which is normalized against a single disk, so units are intentionally not present.

> While it would take quite some time and effort to accomplish, we could
> use a similar summary for full disk resilver times in each
> configuration.

Actually, that's easy. Although the "zpool create" happens instantly, all the hardware raid configurations required an initial resilver. And they were exactly what you expect: write 1 Gbit/s until you reach the size of the drive. I watched the progress while I did other things, and it was incredibly consistent. I am assuming, with very high confidence, that ZFS would match that performance.
Edward Ned Harvey
2010-Feb-19 03:53 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
> A most excellent set of tests. We could use some units in the PDF
> file though.

Oh, by the way, you originally requested the 12G file to be used in the benchmark, and later changed to 4G. But by that time, two of the tests had already completed on the 12G, and I didn't throw away those results, but I didn't include them in the summary either.

If you look in the raw results, you'll see a directory called 12G, and if you compare those results against the equivalent 4G counterpart, you'll see the 12G in fact performed somewhat lower. The reason is that there are sometimes cache hits during read operations, and the write-back buffer is enabled in the PERC. So the smaller the data set, the more frequently these things will accelerate you. And consequently, the 4G performance was measured higher.

This doesn't affect me at all. I wanted to know qualitative results, not quantitative.
Bob Friesenhahn
2010-Feb-19 04:39 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
On Thu, 18 Feb 2010, Edward Ned Harvey wrote:

> Actually, that's easy. Although the "zpool create" happens instantly, all
> the hardware raid configurations required an initial resilver. And they
> were exactly what you expect: write 1 Gbit/s until you reach the size of
> the drive. I watched the progress while I did other things, and it was
> incredibly consistent.

This sounds like an initial 'silver' rather than a 'resilver'. In a 'resilver' process it is necessary to read other disks in the vdev in order to reconstruct the disk content. As a result, we now have additional seeks and reads going on, which seems considerably different than pure writes.

What I am interested in is the answer to these sorts of questions:

o Does a mirror device resilver faster than raidz?

o Does a mirror device in a triple mirror resilver faster than a two-device mirror?

o Does a raidz2 with 9 disks resilver faster or slower than one with 6 disks?

The answer to these questions could vary depending on how well the pool has been "aged" and if it has been used for a while close to 100% full.

Before someone pipes up and says that measuring this is useless since results like this are posted all over the internet, I challenge that someone to find this data already published somewhere.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
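For anyone who takes this on, a rough sketch of one way to time a single resilver, assuming a hypothetical pool "tank" and a spare disk c1t6d0 (a realistic test would also fill and age the pool first):

  # Force a resilver by replacing an active member with the spare disk
  zpool replace tank c1t3d0 c1t6d0

  # Watch progress; zpool status reports elapsed time and an estimated completion
  zpool status tank

  # When it finishes, the resilver-completed line in "zpool status" gives the total time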
Daniel Carosone
2010-Feb-19 05:46 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
On Thu, Feb 18, 2010 at 10:39:48PM -0600, Bob Friesenhahn wrote:

> This sounds like an initial 'silver' rather than a 'resilver'.

Yes, in particular it will be entirely sequential. ZFS resilver is in txg order and involves seeking.

> What I am interested in is the answer to these sorts of questions:
>
> o Does a mirror device resilver faster than raidz?
>
> o Does a mirror device in a triple mirror resilver faster than a
>   two-device mirror?
>
> o Does a raidz2 with 9 disks resilver faster or slower than one with
>   6 disks?

and, if we're wishing for comprehensive analysis:

o What is the impact on concurrent IO benchmark loads, for each of the above.

> The answer to these questions could vary depending on how well the pool
> has been "aged" and if it has been used for a while close to 100% full.

Indeed, which makes it even harder to compare results from different cases and test sources. To get usable relative-to-each-other results, one needs to compare idealised test cases with repeatable loads. This is weeks of work, at least, and can be fun to speculate about up front but rapidly gets very tiresome.

--
Dan.
Edward Ned Harvey
2010-Feb-19 16:35 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
One more thing I'd like to add here:

The PERC cache measurably and significantly accelerates small disk writes. However, for read operations, it is insignificant compared to system RAM, both in terms of size and speed. There is no significant performance improvement from enabling adaptive readahead in the PERC. I will recommend instead that the PERC be enabled for Write Back, with readahead disabled. Fortunately this is the default configuration on a new PERC volume, so unless you changed it, you should be fine.

It may be smart to double check, and ensure your OS does adaptive readahead. In Linux (rhel/centos) you can check that the "readahead" service is loading. I noticed this is enabled by default in runlevel 5, but disabled by default in runlevel 3. Interesting.

I don't know how to check Solaris or OpenSolaris, to ensure adaptive readahead is enabled.

On 2/18/10 8:08 AM, "Edward Ned Harvey" <solaris at nedharvey.com> wrote:

> Ok, I've done all the tests I plan to complete. For highest performance, it seems:
>
> - The measure I think is the most relevant for typical operation is the fastest random read / write / mix. (Thanks Bob, for suggesting I do this test.) The winner is clearly striped mirrors in ZFS.
> - The fastest sustained sequential write is striped mirrors via ZFS, or maybe raidz.
> - The fastest sustained sequential read is striped mirrors via ZFS, or maybe raidz.
>
> Here are the results:
>
> - Results summary of Bob's method:
>   http://nedharvey.com/iozone_weezer/bobs%20method/iozone%20results%20summary.pdf
> - Raw results of Bob's method:
>   http://nedharvey.com/iozone_weezer/bobs%20method/raw_results.zip
> - Results summary of Ned's method:
>   http://nedharvey.com/iozone_weezer/neds%20method/iozone%20results%20summary.pdf
> - Raw results of Ned's method:
>   http://nedharvey.com/iozone_weezer/neds%20method/raw_results.zip
>
> [...]
Günther
2010-Feb-19 17:25 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
hello

i have made some benchmarks with my napp-it zfs-server:
http://www.napp-it.org/bench.pdf

-> 2gb vs 4gb vs 8gb ram
-> mirror vs raidz vs raidz2 vs raidz3
-> dedup and compress enabled vs disabled

result in short:
8gb ram vs 2gb: +10% .. +500% more power (green drives)
compress and dedup enabled: +50% .. +300%
mirror vs raidz: fastest is raidz, slowest mirror, raidz level +/- 20%

gea
--
This message posted from opensolaris.org
Richard Elling
2010-Feb-19 21:30 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
On Feb 19, 2010, at 8:35 AM, Edward Ned Harvey wrote:

> One more thing I'd like to add here:
>
> The PERC cache measurably and significantly accelerates small disk writes. However, for read operations, it is insignificant compared to system RAM, both in terms of size and speed. There is no significant performance improvement from enabling adaptive readahead in the PERC. I will recommend instead that the PERC be enabled for Write Back, with readahead disabled. Fortunately this is the default configuration on a new PERC volume, so unless you changed it, you should be fine.
>
> It may be smart to double check, and ensure your OS does adaptive readahead.
> In Linux (rhel/centos) you can check that the "readahead" service is loading. I noticed this is enabled by default in runlevel 5, but disabled by default in runlevel 3. Interesting.
>
> I don't know how to check Solaris or OpenSolaris, to ensure adaptive readahead is enabled.

ZFS has intelligent prefetching. AFAIK, Solaris disk drivers do not prefetch.
 -- richard
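For the comparison Richard suggested earlier in the thread (raw vs. prefetched reads), the usual knob is the unsupported zfs_prefetch_disable tunable; a sketch, for benchmarking use only:

  # /etc/system -- disable ZFS file-level prefetch, then reboot
  set zfs:zfs_prefetch_disable = 1

  # Or toggle it live on a test box (reverts at next boot):
  echo "zfs_prefetch_disable/W0t1" | mdb -kw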
Ragnar Sundblad
2010-Feb-19 22:03 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
On 19 feb 2010, at 17.35, Edward Ned Harvey wrote:

> The PERC cache measurably and significantly accelerates small disk writes. However, for read operations, it is insignificant compared to system RAM, both in terms of size and speed. There is no significant performance improvement from enabling adaptive readahead in the PERC. I will recommend instead that the PERC be enabled for Write Back, with readahead disabled. Fortunately this is the default configuration on a new PERC volume, so unless you changed it, you should be fine.

If I understand correctly, ZFS nowadays will only flush data to non-volatile storage (such as a RAID controller NVRAM), and not all the way out to the disks. (To solve performance problems with some storage systems, and I believe that it also is the right thing to do under normal circumstances.)

Doesn't this mean that if you enable write back, and you have a single, non-mirrored raid controller, and your raid controller dies on you so that you lose the contents of the nvram, you have a potentially corrupt file system?

/ragge
Neil Perrin
2010-Feb-19 22:30 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
> If I understand correctly, ZFS nowadays will only flush data to
> non-volatile storage (such as a RAID controller NVRAM), and not
> all the way out to the disks. (To solve performance problems with some
> storage systems, and I believe that it also is the right thing
> to do under normal circumstances.)
>
> Doesn't this mean that if you enable write back, and you have
> a single, non-mirrored raid controller, and your raid controller
> dies on you so that you lose the contents of the nvram, you have
> a potentially corrupt file system?

ZFS requires that all writes be flushed to non-volatile storage. This is needed both for transaction group (txg) commits, to ensure pool integrity, and for the ZIL to satisfy the synchronous requirement of fsync/O_DSYNC etc. If the caches weren't flushed then it would indeed be quicker, but the pool would be susceptible to corruption. Sadly, some hardware doesn't honour cache flushes, and this can cause corruption.

Neil.
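As context for the point above: the tunable sometimes used with NVRAM-protected arrays is zfs_nocacheflush. It is only safe when every cache in the write path is truly non-volatile; with a plain disk or volatile controller cache it invites exactly the corruption described here.

  # /etc/system -- stop ZFS from issuing cache-flush requests to devices.
  # Only appropriate when the controller/array cache is battery-backed (non-volatile).
  set zfs:zfs_nocacheflush = 1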
Edward Ned Harvey
2010-Feb-20 14:57 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
> ZFS has intelligent prefetching. AFAIK, Solaris disk drivers do not
> prefetch.

Can you point me to any reference? I didn't find anything stating yay or nay, for either of these.
Edward Ned Harvey
2010-Feb-20 15:09 UTC
[zfs-discuss] ZFS performance benchmarks in various configurations
> Doesn't this mean that if you enable write back, and you have
> a single, non-mirrored raid controller, and your raid controller
> dies on you so that you lose the contents of the nvram, you have
> a potentially corrupt file system?

It is understood that any single point of failure could result in failure, yes. If you have a CPU that performs miscalculations, makes mistakes, it can instruct bad things to be written to disk (I've had something like that happen before.) If you have RAM with bit errors in it that go undetected, you can have corrupted memory, and if that memory is destined to be written to disk, you'll have bad data written to disk. If you have a non-redundant raid controller, which buffers writes, and the buffer gets destroyed or corrupted before the writes are put to disk, then the data has become corrupt. Heck, the same is true even with redundant raid controllers, if there are memory errors in one that go undetected.

So you'll have to do your own calculation. Which is worse?

- Don't get the benefit of accelerated hardware, for all the time that the hardware is functioning correctly, or
- Take the risk of acceleration, with the possibility the accelerator could fail and cause harm to the data it was working on.

I know I always opt for using the raid write-back. If I ever have a situation where I'm so scared of the raid card corrupting data, I would be equally scared of the CPU or SAS bus or system ram or whatever. In that case, I'd find a solution that makes entire machines redundant, rather than worrying about one little perc card. Yes it can happen. I've seen it happen. But not just to raid cards; everything else is vulnerable too. I'll take a 4x performance improvement for 99.999% of the time, and risk the corruption the rest of the time.