Hello,
We just purchased two of the sc847e26-rjbod1 units to be used in a
storage environment running Solaris 11 Express.

We are using Hitachi HUA723020ALA640 6 Gb/s drives with an LSI SAS
9200-8e HBA. We are not using failover/redundancy, meaning that one
port of the HBA goes to the primary front backplane interface and the
other goes to the primary rear backplane interface.

For testing, we have done the following:
Installed 12 disks in the front, 0 in the back.
Created a stripe of different numbers of disks. After each test, I
destroy the underlying storage volume and create a new one. As you can
see from the results, adding more disks makes no difference to the
performance. Going from 4 disks to 8 disks should make a large
difference; however, no difference is shown.

Any help would be greatly appreciated!

These are the results:

root@cm-srfe03:/home/gdurham~# zpool destroy fooPool0
root@cm-srfe03:/home/gdurham~# sh createPool.sh 4
spares are: c0t5000CCA223C00A25d0
spares are: c0t5000CCA223C00B2Fd0
spares are: c0t5000CCA223C00BA6d0
spares are: c0t5000CCA223C00BB7d0
root@cm-srfe03:/home/gdurham~# time dd if=/dev/zero of=/fooPool0/86gb.tst bs=4096 count=20971520
^C3503681+0 records in
3503681+0 records out
14351077376 bytes (14 GB) copied, 39.3747 s, 364 MB/s

real    0m39.396s
user    0m1.791s
sys     0m36.029s

root@cm-srfe03:/home/gdurham~# zpool destroy fooPool0
root@cm-srfe03:/home/gdurham~# sh createPool.sh 6
spares are: c0t5000CCA223C00A25d0
spares are: c0t5000CCA223C00B2Fd0
spares are: c0t5000CCA223C00BA6d0
spares are: c0t5000CCA223C00BB7d0
spares are: c0t5000CCA223C02C22d0
spares are: c0t5000CCA223C009B9d0
root@cm-srfe03:/home/gdurham~# time dd if=/dev/zero of=/fooPool0/86gb.tst bs=4096 count=20971520
^C2298711+0 records in
2298711+0 records out
9415520256 bytes (9.4 GB) copied, 25.813 s, 365 MB/s

real    0m25.817s
user    0m1.171s
sys     0m23.544s

root@cm-srfe03:/home/gdurham~# zpool destroy fooPool0
root@cm-srfe03:/home/gdurham~# sh createPool.sh 8
spares are: c0t5000CCA223C00A25d0
spares are: c0t5000CCA223C00B2Fd0
spares are: c0t5000CCA223C00BA6d0
spares are: c0t5000CCA223C00BB7d0
spares are: c0t5000CCA223C02C22d0
spares are: c0t5000CCA223C009B9d0
spares are: c0t5000CCA223C012B5d0
spares are: c0t5000CCA223C029AFd0
root@cm-srfe03:/home/gdurham~# time dd if=/dev/zero of=/fooPool0/86gb.tst bs=4096 count=20971520
^C6272342+0 records in
6272342+0 records out
25691512832 bytes (26 GB) copied, 70.4122 s, 365 MB/s

real    1m10.433s
user    0m3.187s
sys     1m4.426s
On Tue, 9 Aug 2011, Gregory Durham wrote:

> Hello,
> We just purchased two of the sc847e26-rjbod1 units to be used in a
> storage environment running Solaris 11 express.
>
> root@cm-srfe03:/home/gdurham~# zpool destroy fooPool0
> root@cm-srfe03:/home/gdurham~# sh createPool.sh 4

What is 'createPool.sh'? You really have not told us anything useful,
since we have no idea what your mystery script might be doing. All we
can see is that something reports more spare disks as the argument is
increased, as if the argument were the number of spare disks to
allocate. For all we know, it is always using the same number of data
disks.

You also failed to tell us how much memory you have installed in the
machine.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Tue, Aug 9, 2011 at 8:45 PM, Gregory Durham
<gregory.durham at gmail.com> wrote:

> For testing, we have done the following:
> Installed 12 disks in the front, 0 in the back.
> Created a stripe of different numbers of disks.

So you are creating one zpool with one disk per vdev and varying the
number of vdevs (the number of vdevs == the number of disks), with NO
redundancy?

Do you have compression enabled? Do you have dedup enabled? I expect
the answer to both of the above is no, given that the test data is
/dev/zero, although that would tend to be limited by your memory
bandwidth (and if this is a modern server I would expect _much_ higher
numbers if compression were on).

What is the server hardware configuration?

You are testing sequential write access only; is this really what the
application will be doing?

> After each test, I
> destroy the underlying storage volume and create a new one. As you can
> see by the results, adding more disks, makes no difference to the
> performance. This should make a large difference from 4 disks to 8
> disks, however no difference is shown.

Unless you are being limited by something else... What does
`iostat -xn 1` show during the test? There should be periods of zero
activity and then huge peaks (as the transaction group is committed to
disk).

You are using a 4KB test data block size; is that realistic? My
experience is that ZFS performance with block sizes that small and the
default "suggested recordsize" of 128K is not very good. Try setting
recordsize to 16K (zfs set recordsize=16k <poolname>) and see if you
get different results.

Try using a different tool instead of dd (iozone is OK, but the best I
have found is filebench, though that takes a bit more work to get
useful data out of). Try a different test data block size.

See
https://spreadsheets.google.com/a/kraus-haus.org/spreadsheet/pub?hl=en_US&hl=en_US&key=0AtReWsGW-SB1dFB1cmw0QWNNd0RkR1ZnN0JEb2RsLXc&output=html
for my experience changing configurations. I did not bother changing
the total number of drives as that was already fixed by what we bought.

> Any help would be greatly appreciated!
>
> This is the result:
>
> root@cm-srfe03:/home/gdurham~# zpool destroy fooPool0
> root@cm-srfe03:/home/gdurham~# sh createPool.sh 4
> spares are: c0t5000CCA223C00A25d0
> spares are: c0t5000CCA223C00B2Fd0
> spares are: c0t5000CCA223C00BA6d0
> spares are: c0t5000CCA223C00BB7d0
> root@cm-srfe03:/home/gdurham~# time dd if=/dev/zero
> of=/fooPool0/86gb.tst bs=4096 count=20971520
> ^C3503681+0 records in
> 3503681+0 records out
> 14351077376 bytes (14 GB) copied, 39.3747 s, 364 MB/s
>
> real    0m39.396s
> user    0m1.791s
> sys     0m36.029s
[...]

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Designer: Frankenstein, A New Musical
   (http://www.facebook.com/event.php?eid=123170297765140)
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
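As a concrete way to act on the iostat and recordsize suggestions above,
a wrapper along these lines could be run for each pool width (a sketch
only: the pool name and mount point are taken from the original post,
the log file names are invented, and the write is sized at roughly
64 GB to stay well past RAM):

#!/bin/sh
# Hypothetical wrapper: capture per-disk activity while a single large
# sequential write runs, repeated for two recordsize settings.
POOL=fooPool0
for RS in 128k 16k
do
    zfs set recordsize=$RS $POOL
    iostat -xn 5 > /tmp/iostat.$RS.log &    # look for idle gaps vs. txg bursts
    IOSTAT_PID=$!
    time dd if=/dev/zero of=/$POOL/test.$RS bs=131072 count=524288   # ~64 GB
    kill $IOSTAT_PID
    rm /$POOL/test.$RS
done

Comparing the iostat logs between the 4-, 6-, and 8-disk runs should
show whether all of the disks are actually busy as the stripe widens.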
On Wed, Aug 10, 2011 at 1:45 AM, Gregory Durham
<gregory.durham at gmail.com> wrote:

> Hello,
> We just purchased two of the sc847e26-rjbod1 units to be used in a
> storage environment running Solaris 11 express.
>
> We are using Hitachi HUA723020ALA640 6 gb/s drives with an LSI SAS
> 9200-8e hba. We are not using failover/redundancy. Meaning that one
> port of the hba goes to the primary front backplane interface, and the
> other goes to the primary rear backplane interface.
>
> For testing, we have done the following:
> Installed 12 disks in the front, 0 in the back.
> Created a stripe of different numbers of disks. After each test, I
> destroy the underlying storage volume and create a new one. As you can
> see by the results, adding more disks, makes no difference to the
> performance. This should make a large difference from 4 disks to 8
> disks, however no difference is shown.
>
> Any help would be greatly appreciated!
>
> This is the result:
>
> root@cm-srfe03:/home/gdurham~# time dd if=/dev/zero
> of=/fooPool0/86gb.tst bs=4096 count=20971520
> ^C3503681+0 records in
> 3503681+0 records out
> 14351077376 bytes (14 GB) copied, 39.3747 s, 364 MB/s

So, the problem here is that you're not testing the storage at all.
You're basically measuring dd.

To get meaningful results, you need to do two things:

First, run it for long enough so you eliminate any write cache
effects. Writes go to memory and only get sent to disk in the
background.

Second, use a proper benchmark suite, and one that isn't itself
a bottleneck. Something like vdbench, although there are others.

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
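On the first point, a minimal sketch of sizing the write well past
physical memory, assuming GNU dd (which the "bytes copied" output above
suggests) and the /fooPool0 mount point from the original post; the
4x multiplier and file name are only illustrations:

#!/bin/sh
# Hypothetical sizing helper: write roughly 4x physical RAM so cached
# writes cannot dominate the result.
RAM_MB=`prtconf | awk '/^Memory size/ {print $3}'`   # Solaris reports this in MB
BS=131072                                            # 128 KB blocks
COUNT=`expr $RAM_MB \* 32`                           # 32 x 128 KB = 4 MB per MB of RAM
time dd if=/dev/zero of=/fooPool0/bigwrite.tst bs=$BS count=$COUNT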
I would generally agree that dd is not a great benchmarking tool, but
you could use multiple instances writing to multiple files, and larger
block sizes are more efficient. And it's always good to check iostat
and mpstat for I/O and CPU bottlenecks. Also note that an initial run
that creates files may be quicker because it just allocates blocks,
whereas subsequent rewrites require copy-on-write.

----- Reply message -----
From: "Peter Tribble" <peter.tribble at gmail.com>
To: "Gregory Durham" <gregory.durham at gmail.com>
Cc: <zfs-discuss at opensolaris.org>
Subject: [zfs-discuss] Issues with supermicro
Date: Wed, Aug 10, 2011 10:56

On Wed, Aug 10, 2011 at 1:45 AM, Gregory Durham
<gregory.durham at gmail.com> wrote:
> Hello,
> We just purchased two of the sc847e26-rjbod1 units to be used in a
> storage environment running Solaris 11 express.
[...]

So, the problem here is that you're not testing the storage at all.
You're basically measuring dd.

To get meaningful results, you need to do two things:

First, run it for long enough so you eliminate any write cache
effects. Writes go to memory and only get sent to disk in the
background.

Second, use a proper benchmark suite, and one that isn't itself
a bottleneck. Something like vdbench, although there are others.

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
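A minimal sketch of the multiple-instance idea, assuming the same
/fooPool0 mount point; the stream count, file names, and sizes are
invented for illustration:

#!/bin/sh
# Hypothetical parallel-writer sketch: four dd streams to four separate files,
# with iostat/mpstat captured alongside to spot disk vs. CPU bottlenecks.
iostat -xn 5 > /tmp/iostat.out &    # per-disk busy%, service times, throughput
IOSTAT_PID=$!
mpstat 5 > /tmp/mpstat.out &        # per-CPU utilisation
MPSTAT_PID=$!
PIDS=""
for i in 1 2 3 4
do
    dd if=/dev/zero of=/fooPool0/stream.$i bs=131072 count=131072 &   # ~16 GB each
    PIDS="$PIDS $!"
done
wait $PIDS                          # wait only for the dd streams
kill $IOSTAT_PID $MPSTAT_PID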
Hello All,
Sorry for the lack of information. Here are answers to some of the
questions:

1) createPool.sh:
It can take 2 params: the first is the number of disks in the pool,
the second is either blank or "mirrored". Blank means a plain stripe
of that many disks (i.e. RAID 0); "mirrored" makes 2-disk mirrors.

#!/bin/sh
# Build the zpool vdev argument list from a saved "diskList" listing.
disks=( `cat diskList | grep Hitachi | awk '{print $2}' | tr '\n' ' '` )
#echo ${disks[1]}
#$useDisks=" "
for (( i = 0; i < $1; i++ ))
do
    #echo "Thus far: "$useDisks
    if [ "$2" = "mirrored" ]
    then
        if [ $(($i % 2)) -eq 0 ]
        then
            useDisks="$useDisks mirror ${disks[i]}"
        else
            useDisks=$useDisks" "${disks[i]}
        fi
    else
        useDisks=$useDisks" "${disks[i]}
    fi

    # i is always less than $1 here, so (i - $1) is negative and this
    # echoes "spares are:" for every disk; no spares are passed to zpool.
    if [ $(($i - $1)) -le 2 ]
    then
        echo "spares are: ${disks[i]}"
    fi
done

#echo $useDisks
zpool create -f fooPool0 $useDisks

2) Hardware:
Each server attached to each storage array is a Dell R710 with 32 GB
of memory. To test for issues with another platform, the info below is
from a Dell 1950 server with 8 GB of memory. However, I see similar
results from the R710s as well.

3) In order to deal with caching, I am writing larger amounts of data
to the disk than I have memory for.

4) I have tested with bonnie++ as well, and here are the results. I
have read that it is best to test with 4x the amount of memory:

/usr/local/sbin/bonnie++ -s 32000 -d /fooPool0/test -u gdurham
Using uid:101, gid:10.
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.03d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
cm-srfe03    32000M 230482  97 477644  76 223687  44 209868  91 541182  41  1900   5
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 29126 100 +++++ +++ +++++ +++ 24761 100 +++++ +++ +++++ +++
cm-srfe03,32000M,230482,97,477644,76,223687,44,209868,91,541182,41,1899.7,5,16,29126,100,+++++,+++,+++++,+++,24761,100,+++++,+++,+++++,+++

I will run these with the R710 server as well and will report the
results.

Thanks for the help!

-Greg

On Wed, Aug 10, 2011 at 9:16 AM, phil.harman at gmail.com
<phil.harman at gmail.com> wrote:
> I would generally agree that dd is not a great benchmarking tool, but you
> could use multiple instances to multiple files, and larger block sizes are
> more efficient. And it's always good to check iostat and mpstat for io and
> cpu bottlenecks. Also note that an initial run that creates files may be
> quicker because it just allocates blocks, whereas subsequent rewrites
> require copy-on-write.
[...]
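Since the script prints "spares are:" for every disk but never passes a
spare to zpool create, it may be worth confirming after each run how
many top-level vdevs the pool really received and whether they all stay
busy during a write. A short check along these lines (pool name taken
from the script above):

zpool status fooPool0         # lists the top-level vdevs (and any spares) actually configured
zpool iostat -v fooPool0 5    # per-vdev bandwidth while the dd test runs in another shell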
What sort of load will this server be serving? Sync or async writes?
What sort of reads? Random I/O or sequential? If sequential, how many
streams/concurrent users? Those are factors you need to evaluate
before running a test. A local test will usually be using async I/O,
and a dd with only a 4k block size is bound to be slow, probably
because of CPU overhead.

roy

----- Original Message -----
> Hello All,
> Sorry for the lack of information. Here are answers to some of the
> questions:
>
> 1) createPool.sh:
> It can take 2 params: the first is the number of disks in the pool,
> the second is either blank or "mirrored". Blank means a plain stripe
> of that many disks (i.e. RAID 0); "mirrored" makes 2-disk mirrors.
[...]
> 2) Hardware:
> Each server attached to each storage array is a Dell R710 with 32 GB
> of memory. To test for issues with another platform, the info below is
> from a Dell 1950 server with 8 GB of memory. However, I see similar
> results from the R710s as well.
>
> 3) In order to deal with caching, I am writing larger amounts of data
> to the disk than I have memory for.
>
> 4) I have tested with bonnie++ as well, and here are the results. I
> have read that it is best to test with 4x the amount of memory:
[...]
> I will run these with the R710 server as well and will report the
> results.
>
> Thanks for the help!
>
> -Greg
[...]

--
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to
avoid excessive use of idioms of foreign origin. In most cases,
adequate and relevant synonyms exist in Norwegian.
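On the sync/async point, one rough comparison (a sketch only; it
assumes the GNU dd that the earlier output suggests, and its oflag
option, with invented file names) is to repeat the same write with
synchronous semantics forced:

# async, buffered through the ARC -- what the earlier numbers measured
dd if=/dev/zero of=/fooPool0/async.tst bs=131072 count=524288

# forced sync: each write waits for stable storage, exercising the ZIL/slog
# (much smaller count; sync writes will be far slower)
dd if=/dev/zero of=/fooPool0/sync.tst bs=131072 count=65536 oflag=dsync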
This system is for serving VM images through iSCSI to roughly 30
XenServer hosts. I would like to know what type of performance I can
expect in the coming months as we grow this system out. We currently
have 2 Intel SSDs mirrored for the ZIL and 2 Intel SSDs for the L2ARC
in a stripe. I am interested more in the maximum throughput of the
local storage at this point in time.

On Wed, Aug 10, 2011 at 12:01 PM, Roy Sigurd Karlsbakk
<roy at karlsbakk.net> wrote:
> What sort of load will this server be serving? Sync or async writes?
> What sort of reads? Random I/O or sequential? If sequential, how many
> streams/concurrent users? Those are factors you need to evaluate
> before running a test. A local test will usually be using async I/O,
> and a dd with only a 4k block size is bound to be slow, probably
> because of CPU overhead.
>
> roy
[...]
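For reference, a log/cache layout like the one described above would
typically be attached with commands along these lines (a sketch; the
SSD device names are placeholders, not taken from the thread):

# mirrored SSD pair as the ZIL (slog)
zpool add fooPool0 log mirror c0t<SSD1>d0 c0t<SSD2>d0

# two SSDs striped as L2ARC cache devices
zpool add fooPool0 cache c0t<SSD3>d0 c0t<SSD4>d0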
Then create a ZVOL, share it over iSCSI, and run some benchmarks from
the initiator host. You'll never get good results from local tests.
For that sort of load, I'd guess a stripe of mirrors should be good;
RAID-Zn will probably be rather bad.

roy

----- Original Message -----
> This system is for serving VM images through iSCSI to roughly 30
> XenServer hosts. I would like to know what type of performance I can
> expect in the coming months as we grow this system out. We currently
> have 2 Intel SSDs mirrored for the ZIL and 2 Intel SSDs for the L2ARC
> in a stripe. I am interested more in the maximum throughput of the
> local storage at this point in time.
[...]

--
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to
avoid excessive use of idioms of foreign origin. In most cases,
adequate and relevant synonyms exist in Norwegian.
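A rough sketch of that approach on Solaris 11 Express using COMSTAR;
the zvol name and size are invented, and the exact steps (service
names, whether a target already exists) should be checked against the
COMSTAR documentation:

# create a zvol and export it over iSCSI via COMSTAR (sketch, order not verified)
zfs create -V 200g fooPool0/vmvol01
svcadm enable -r svc:/system/stmf:default
svcadm enable -r svc:/network/iscsi/target:default
stmfadm create-lu /dev/zvol/rdsk/fooPool0/vmvol01   # prints an LU GUID
stmfadm add-view <GUID-printed-above>               # expose the LU to all hosts
itadm create-target                                 # default iSCSI target
# then run the same benchmarks from a XenServer initiator against this LUN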
On Wed, Aug 10, 2011 at 2:55 PM, Gregory Durham
<gregory.durham at gmail.com> wrote:

> 3) In order to deal with caching, I am writing larger amounts of data
> to the disk than I have memory for.

The other trick is to limit the ARC to a much smaller value, and then
you can test with sane amounts of data. Add the following to
/etc/system and reboot:

set zfs:zfs_arc_max = <bytes>

<bytes> can be decimal or hex (but don't use a scale like 4g). Best to
keep it a power of 2.

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Designer: Frankenstein, A New Musical
   (http://www.facebook.com/event.php?eid=123170297765140)
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
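For example, capping the ARC at 4 GB on the 32 GB R710s would look like
the lines below (the 4 GB figure is only an illustration; the kstat
line is one way to confirm the new ceiling after the reboot):

# /etc/system -- cap the ARC at 4 GB (2^32 bytes)
set zfs:zfs_arc_max = 0x100000000

# after rebooting, confirm the ceiling the kernel actually applied
kstat -p zfs:0:arcstats:c_max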