Hello,
We just purchased two of the sc847e26-rjbod1 units to be used in a
storage environment running Solaris 11 Express.

We are using Hitachi HUA723020ALA640 6 Gb/s drives with an LSI SAS
9200-8e HBA. We are not using failover/redundancy, meaning that one
port of the HBA goes to the primary front backplane interface and the
other goes to the primary rear backplane interface.

For testing, we have done the following:
Installed 12 disks in the front, 0 in the back.
Created a stripe of different numbers of disks. After each test, I
destroy the underlying storage volume and create a new one. As you can
see from the results, adding more disks makes no difference to the
performance. Going from 4 disks to 8 disks should make a large
difference; however, no difference is shown.

Any help would be greatly appreciated!

These are the results:

root@cm-srfe03:/home/gdurham~# zpool destroy fooPool0
root@cm-srfe03:/home/gdurham~# sh createPool.sh 4
spares are: c0t5000CCA223C00A25d0
spares are: c0t5000CCA223C00B2Fd0
spares are: c0t5000CCA223C00BA6d0
spares are: c0t5000CCA223C00BB7d0
root@cm-srfe03:/home/gdurham~# time dd if=/dev/zero of=/fooPool0/86gb.tst bs=4096 count=20971520
^C3503681+0 records in
3503681+0 records out
14351077376 bytes (14 GB) copied, 39.3747 s, 364 MB/s

real    0m39.396s
user    0m1.791s
sys     0m36.029s

root@cm-srfe03:/home/gdurham~# zpool destroy fooPool0
root@cm-srfe03:/home/gdurham~# sh createPool.sh 6
spares are: c0t5000CCA223C00A25d0
spares are: c0t5000CCA223C00B2Fd0
spares are: c0t5000CCA223C00BA6d0
spares are: c0t5000CCA223C00BB7d0
spares are: c0t5000CCA223C02C22d0
spares are: c0t5000CCA223C009B9d0
root@cm-srfe03:/home/gdurham~# time dd if=/dev/zero of=/fooPool0/86gb.tst bs=4096 count=20971520
^C2298711+0 records in
2298711+0 records out
9415520256 bytes (9.4 GB) copied, 25.813 s, 365 MB/s

real    0m25.817s
user    0m1.171s
sys     0m23.544s

root@cm-srfe03:/home/gdurham~# zpool destroy fooPool0
root@cm-srfe03:/home/gdurham~# sh createPool.sh 8
spares are: c0t5000CCA223C00A25d0
spares are: c0t5000CCA223C00B2Fd0
spares are: c0t5000CCA223C00BA6d0
spares are: c0t5000CCA223C00BB7d0
spares are: c0t5000CCA223C02C22d0
spares are: c0t5000CCA223C009B9d0
spares are: c0t5000CCA223C012B5d0
spares are: c0t5000CCA223C029AFd0
root@cm-srfe03:/home/gdurham~# time dd if=/dev/zero of=/fooPool0/86gb.tst bs=4096 count=20971520
^C6272342+0 records in
6272342+0 records out
25691512832 bytes (26 GB) copied, 70.4122 s, 365 MB/s

real    1m10.433s
user    0m3.187s
sys     1m4.426s
On Tue, 9 Aug 2011, Gregory Durham wrote:

> Hello,
> We just purchased two of the sc847e26-rjbod1 units to be used in a
> storage environment running Solaris 11 express.
>
> root@cm-srfe03:/home/gdurham~# zpool destroy fooPool0
> root@cm-srfe03:/home/gdurham~# sh createPool.sh 4

What is 'createPool.sh'? You really have not told us anything useful,
since we have no idea what your mystery script might be doing. All we
can see is that something reports more spare disks as the argument is
increased, as if the argument were the number of spare disks to
allocate. For all we know, it is always using the same number of data
disks.

You also failed to tell us how much memory you have installed in the
machine.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Tue, Aug 9, 2011 at 8:45 PM, Gregory Durham
<gregory.durham at gmail.com> wrote:

> For testing, we have done the following:
> Installed 12 disks in the front, 0 in the back.
> Created a stripe of different numbers of disks.

So you are creating one zpool with one disk per vdev and varying the
number of vdevs (the number of vdevs == the number of disks), with NO
redundancy?

Do you have compression enabled? Do you have dedup enabled? I expect
the answer to both of the above is no, given that the test data is
/dev/zero, although that would tend to be limited by your memory
bandwidth (and if this is a modern server I would expect _much_ higher
numbers if compression were on).

What is the server hardware configuration?

You are testing sequential write access only; is this really what the
application will be doing?

> After each test, I
> destroy the underlying storage volume and create a new one. As you can
> see by the results, adding more disks, makes no difference to the
> performance. This should make a large difference from 4 disks to 8
> disks, however no difference is shown.

Unless you are being limited by something else... What does
`iostat -xn 1` show during the test? There should be periods of zero
activity and then huge peaks (as the transaction group is committed to
disk).

You are using a 4KB test data block size; is that realistic? My
experience is that ZFS performance with block sizes that small and the
default "suggested recordsize" of 128K is not very good. Try setting
recordsize to 16K (zfs set recordsize=16k <poolname>) and see if you
get different results.

Try using a different tool instead of dd (iozone is OK, but the best I
have found is filebench, though that takes a bit more work to get
useful data out of). Try a different test data block size.

See
https://spreadsheets.google.com/a/kraus-haus.org/spreadsheet/pub?hl=en_US&hl=en_US&key=0AtReWsGW-SB1dFB1cmw0QWNNd0RkR1ZnN0JEb2RsLXc&output=html
for my experience changing configurations. I did not bother changing
the total number of drives as that was already fixed by what we bought.

> Any help would be greatly appreciated!
>
> This is the result:
>
> root@cm-srfe03:/home/gdurham~# zpool destroy fooPool0
> root@cm-srfe03:/home/gdurham~# sh createPool.sh 4
> spares are: c0t5000CCA223C00A25d0
> spares are: c0t5000CCA223C00B2Fd0
> spares are: c0t5000CCA223C00BA6d0
> spares are: c0t5000CCA223C00BB7d0
> root@cm-srfe03:/home/gdurham~# time dd if=/dev/zero
> of=/fooPool0/86gb.tst bs=4096 count=20971520
> ^C3503681+0 records in
> 3503681+0 records out
> 14351077376 bytes (14 GB) copied, 39.3747 s, 364 MB/s
>
> real    0m39.396s
> user    0m1.791s
> sys     0m36.029s
[...]

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Designer: Frankenstein, A New Musical
   (http://www.facebook.com/event.php?eid=123170297765140)
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
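As a concrete way to act on the iostat and recordsize suggestions above,
a wrapper along these lines could be run for each pool width (a sketch
only: the pool name and mount point are taken from the original post,
the log file names are invented, and the write is sized at roughly
64 GB to stay well past RAM):

#!/bin/sh
# Hypothetical wrapper: capture per-disk activity while a single large
# sequential write runs, repeated for two recordsize settings.
POOL=fooPool0
for RS in 128k 16k
do
    zfs set recordsize=$RS $POOL
    iostat -xn 5 > /tmp/iostat.$RS.log &    # look for idle gaps vs. txg bursts
    IOSTAT_PID=$!
    time dd if=/dev/zero of=/$POOL/test.$RS bs=131072 count=524288   # ~64 GB
    kill $IOSTAT_PID
    rm /$POOL/test.$RS
done

Comparing the iostat logs between the 4-, 6-, and 8-disk runs should
show whether all of the disks are actually busy as the stripe widens.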
On Wed, Aug 10, 2011 at 1:45 AM, Gregory Durham
<gregory.durham at gmail.com> wrote:

> Hello,
> We just purchased two of the sc847e26-rjbod1 units to be used in a
> storage environment running Solaris 11 express.
>
> We are using Hitachi HUA723020ALA640 6 gb/s drives with an LSI SAS
> 9200-8e hba. We are not using failover/redundancy. Meaning that one
> port of the hba goes to the primary front backplane interface, and the
> other goes to the primary rear backplane interface.
>
> For testing, we have done the following:
> Installed 12 disks in the front, 0 in the back.
> Created a stripe of different numbers of disks. After each test, I
> destroy the underlying storage volume and create a new one. As you can
> see by the results, adding more disks, makes no difference to the
> performance. This should make a large difference from 4 disks to 8
> disks, however no difference is shown.
>
> Any help would be greatly appreciated!
>
> This is the result:
>
> root@cm-srfe03:/home/gdurham~# time dd if=/dev/zero
> of=/fooPool0/86gb.tst bs=4096 count=20971520
> ^C3503681+0 records in
> 3503681+0 records out
> 14351077376 bytes (14 GB) copied, 39.3747 s, 364 MB/s

So, the problem here is that you're not testing the storage at all.
You're basically measuring dd.

To get meaningful results, you need to do two things:

First, run it for long enough so you eliminate any write cache
effects. Writes go to memory and only get sent to disk in the
background.

Second, use a proper benchmark suite, and one that isn't itself
a bottleneck. Something like vdbench, although there are others.

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
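On the first point, a minimal sketch of sizing the write well past
physical memory, assuming GNU dd (which the "bytes copied" output above
suggests) and the /fooPool0 mount point from the original post; the
4x multiplier and file name are only illustrations:

#!/bin/sh
# Hypothetical sizing helper: write roughly 4x physical RAM so cached
# writes cannot dominate the result.
RAM_MB=`prtconf | awk '/^Memory size/ {print $3}'`   # Solaris reports this in MB
BS=131072                                            # 128 KB blocks
COUNT=`expr $RAM_MB \* 32`                           # 32 x 128 KB = 4 MB per MB of RAM
time dd if=/dev/zero of=/fooPool0/bigwrite.tst bs=$BS count=$COUNT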
I would generally agree that dd is not a great benchmarking tool, but
you could use multiple instances writing to multiple files, and larger
block sizes are more efficient. And it's always good to check iostat
and mpstat for I/O and CPU bottlenecks. Also note that an initial run
that creates files may be quicker because it just allocates blocks,
whereas subsequent rewrites require copy-on-write.

----- Reply message -----
From: "Peter Tribble" <peter.tribble at gmail.com>
To: "Gregory Durham" <gregory.durham at gmail.com>
Cc: <zfs-discuss at opensolaris.org>
Subject: [zfs-discuss] Issues with supermicro
Date: Wed, Aug 10, 2011 10:56

On Wed, Aug 10, 2011 at 1:45 AM, Gregory Durham
<gregory.durham at gmail.com> wrote:
> Hello,
> We just purchased two of the sc847e26-rjbod1 units to be used in a
> storage environment running Solaris 11 express.
[...]

So, the problem here is that you're not testing the storage at all.
You're basically measuring dd.

To get meaningful results, you need to do two things:

First, run it for long enough so you eliminate any write cache
effects. Writes go to memory and only get sent to disk in the
background.

Second, use a proper benchmark suite, and one that isn't itself
a bottleneck. Something like vdbench, although there are others.

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
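A minimal sketch of the multiple-instance idea, assuming the same
/fooPool0 mount point; the stream count, file names, and sizes are
invented for illustration:

#!/bin/sh
# Hypothetical parallel-writer sketch: four dd streams to four separate files,
# with iostat/mpstat captured alongside to spot disk vs. CPU bottlenecks.
iostat -xn 5 > /tmp/iostat.out &    # per-disk busy%, service times, throughput
IOSTAT_PID=$!
mpstat 5 > /tmp/mpstat.out &        # per-CPU utilisation
MPSTAT_PID=$!
PIDS=""
for i in 1 2 3 4
do
    dd if=/dev/zero of=/fooPool0/stream.$i bs=131072 count=131072 &   # ~16 GB each
    PIDS="$PIDS $!"
done
wait $PIDS                          # wait only for the dd streams
kill $IOSTAT_PID $MPSTAT_PID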
Hello All,
Sorry for the lack of information. Here are answers to some of the
questions:

1) createPool.sh:
It can take 2 params: the first is the number of disks in the pool,
the second is either blank or "mirrored". Blank means a plain stripe
of that many disks (i.e. RAID 0); "mirrored" makes 2-disk mirrors.

#!/bin/sh
# Build the zpool vdev argument list from a saved "diskList" listing.
disks=( `cat diskList | grep Hitachi | awk '{print $2}' | tr '\n' ' '` )
#echo ${disks[1]}
#$useDisks=" "
for (( i = 0; i < $1; i++ ))
do
    #echo "Thus far: "$useDisks
    if [ "$2" = "mirrored" ]
    then
        if [ $(($i % 2)) -eq 0 ]
        then
            useDisks="$useDisks mirror ${disks[i]}"
        else
            useDisks=$useDisks" "${disks[i]}
        fi
    else
        useDisks=$useDisks" "${disks[i]}
    fi

    # i is always less than $1 here, so (i - $1) is negative and this
    # echoes "spares are:" for every disk; no spares are passed to zpool.
    if [ $(($i - $1)) -le 2 ]
    then
        echo "spares are: ${disks[i]}"
    fi
done

#echo $useDisks
zpool create -f fooPool0 $useDisks

2) Hardware:
Each server attached to each storage array is a Dell R710 with 32 GB
of memory. To test for issues with another platform, the info below is
from a Dell 1950 server with 8 GB of memory. However, I see similar
results from the R710s as well.

3) In order to deal with caching, I am writing larger amounts of data
to the disk than I have memory for.

4) I have tested with bonnie++ as well, and here are the results. I
have read that it is best to test with 4x the amount of memory:

/usr/local/sbin/bonnie++ -s 32000 -d /fooPool0/test -u gdurham
Using uid:101, gid:10.
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.03d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
cm-srfe03    32000M 230482  97 477644  76 223687  44 209868  91 541182  41  1900   5
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 29126 100 +++++ +++ +++++ +++ 24761 100 +++++ +++ +++++ +++
cm-srfe03,32000M,230482,97,477644,76,223687,44,209868,91,541182,41,1899.7,5,16,29126,100,+++++,+++,+++++,+++,24761,100,+++++,+++,+++++,+++

I will run these with the R710 server as well and will report the
results.

Thanks for the help!

-Greg

On Wed, Aug 10, 2011 at 9:16 AM, phil.harman at gmail.com
<phil.harman at gmail.com> wrote:
> I would generally agree that dd is not a great benchmarking tool, but you
> could use multiple instances to multiple files, and larger block sizes are
> more efficient. And it's always good to check iostat and mpstat for io and
> cpu bottlenecks. Also note that an initial run that creates files may be
> quicker because it just allocates blocks, whereas subsequent rewrites
> require copy-on-write.
[...]
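Since the script prints "spares are:" for every disk but never passes a
spare to zpool create, it may be worth confirming after each run how
many top-level vdevs the pool really received and whether they all stay
busy during a write. A short check along these lines (pool name taken
from the script above):

zpool status fooPool0         # lists the top-level vdevs (and any spares) actually configured
zpool iostat -v fooPool0 5    # per-vdev bandwidth while the dd test runs in another shell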
What sort of load will this server be serving? Sync or async writes?
What sort of reads? Random I/O or sequential? If sequential, how many
streams/concurrent users? Those are factors you need to evaluate
before running a test. A local test will usually be using async I/O,
and a dd with only a 4k block size is bound to be slow, probably
because of CPU overhead.

roy

----- Original Message -----
> Hello All,
> Sorry for the lack of information. Here are answers to some of the
> questions:
>
> 1) createPool.sh:
> It can take 2 params: the first is the number of disks in the pool,
> the second is either blank or "mirrored". Blank means a plain stripe
> of that many disks (i.e. RAID 0); "mirrored" makes 2-disk mirrors.
[...]
> 2) Hardware:
> Each server attached to each storage array is a Dell R710 with 32 GB
> of memory. To test for issues with another platform, the info below is
> from a Dell 1950 server with 8 GB of memory. However, I see similar
> results from the R710s as well.
>
> 3) In order to deal with caching, I am writing larger amounts of data
> to the disk than I have memory for.
>
> 4) I have tested with bonnie++ as well, and here are the results. I
> have read that it is best to test with 4x the amount of memory:
[...]
> I will run these with the R710 server as well and will report the
> results.
>
> Thanks for the help!
>
> -Greg
[...]

--
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to
avoid excessive use of idioms of foreign origin. In most cases,
adequate and relevant synonyms exist in Norwegian.
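On the sync/async point, one rough comparison (a sketch only; it
assumes the GNU dd that the earlier output suggests, and its oflag
option, with invented file names) is to repeat the same write with
synchronous semantics forced:

# async, buffered through the ARC -- what the earlier numbers measured
dd if=/dev/zero of=/fooPool0/async.tst bs=131072 count=524288

# forced sync: each write waits for stable storage, exercising the ZIL/slog
# (much smaller count; sync writes will be far slower)
dd if=/dev/zero of=/fooPool0/sync.tst bs=131072 count=65536 oflag=dsync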
This system is for serving VM images through iSCSI to roughly 30
XenServer hosts. I would like to know what type of performance I can
expect in the coming months as we grow this system out. We currently
have 2 Intel SSDs mirrored for the ZIL and 2 Intel SSDs for the L2ARC
in a stripe. I am interested more in the maximum throughput of the
local storage at this point in time.

On Wed, Aug 10, 2011 at 12:01 PM, Roy Sigurd Karlsbakk
<roy at karlsbakk.net> wrote:
> What sort of load will this server be serving? Sync or async writes?
> What sort of reads? Random I/O or sequential? If sequential, how many
> streams/concurrent users? Those are factors you need to evaluate
> before running a test. A local test will usually be using async I/O,
> and a dd with only a 4k block size is bound to be slow, probably
> because of CPU overhead.
>
> roy
[...]
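For reference, a log/cache layout like the one described above would
typically be attached with commands along these lines (a sketch; the
SSD device names are placeholders, not taken from the thread):

# mirrored SSD pair as the ZIL (slog)
zpool add fooPool0 log mirror c0t<SSD1>d0 c0t<SSD2>d0

# two SSDs striped as L2ARC cache devices
zpool add fooPool0 cache c0t<SSD3>d0 c0t<SSD4>d0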
Then create a ZVOL, share it over iSCSI, and run some benchmarks from
the initiator host. You'll never get good results from local tests.
For that sort of load, I'd guess a stripe of mirrors should be good;
RAID-Zn will probably be rather bad.

roy

----- Original Message -----
> This system is for serving VM images through iSCSI to roughly 30
> XenServer hosts. I would like to know what type of performance I can
> expect in the coming months as we grow this system out. We currently
> have 2 Intel SSDs mirrored for the ZIL and 2 Intel SSDs for the L2ARC
> in a stripe. I am interested more in the maximum throughput of the
> local storage at this point in time.
[...]

--
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly. It is an elementary imperative for all pedagogues to
avoid excessive use of idioms of foreign origin. In most cases,
adequate and relevant synonyms exist in Norwegian.
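A rough sketch of that approach on Solaris 11 Express using COMSTAR;
the zvol name and size are invented, and the exact steps (service
names, whether a target already exists) should be checked against the
COMSTAR documentation:

# create a zvol and export it over iSCSI via COMSTAR (sketch, order not verified)
zfs create -V 200g fooPool0/vmvol01
svcadm enable -r svc:/system/stmf:default
svcadm enable -r svc:/network/iscsi/target:default
stmfadm create-lu /dev/zvol/rdsk/fooPool0/vmvol01   # prints an LU GUID
stmfadm add-view <GUID-printed-above>               # expose the LU to all hosts
itadm create-target                                 # default iSCSI target
# then run the same benchmarks from a XenServer initiator against this LUN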
On Wed, Aug 10, 2011 at 2:55 PM, Gregory Durham
<gregory.durham at gmail.com> wrote:

> 3) In order to deal with caching, I am writing larger amounts of data
> to the disk than I have memory for.

The other trick is to limit the ARC to a much smaller value, and then
you can test with sane amounts of data. Add the following to
/etc/system and reboot:

set zfs:zfs_arc_max = <bytes>

<bytes> can be decimal or hex (but don't use a scale like 4g). Best to
keep it a power of 2.

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Designer: Frankenstein, A New Musical
   (http://www.facebook.com/event.php?eid=123170297765140)
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
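For example, capping the ARC at 4 GB on the 32 GB R710s would look like
the lines below (the 4 GB figure is only an illustration; the kstat
line is one way to confirm the new ceiling after the reboot):

# /etc/system -- cap the ARC at 4 GB (2^32 bytes)
set zfs:zfs_arc_max = 0x100000000

# after rebooting, confirm the ceiling the kernel actually applied
kstat -p zfs:0:arcstats:c_max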