Under Solaris 10 on a 4 core Sun Ultra 40 with 20GB RAM, I am setting
up a Sun StorageTek 2540 with 12 300GB 15K RPM SAS drives, connected
via load-shared 4Gbit FC links.  This week I have tried many different
configurations, using firmware managed RAID, ZFS managed RAID, and
with the controller cache enabled or disabled.

My objective is to obtain the best single-file write performance.
Unfortunately, I am hitting some sort of write bottleneck and I am not
sure how to solve it.  I was hoping for a write speed of 300MB/second.
With ZFS on top of a firmware managed RAID 0 across all 12 drives, I
hit a peak of 200MB/second.  With each drive exported as a LUN and a
ZFS pool of 6 mirrored pairs, I see a write rate of 154MB/second.  The
number of drives used has not had much effect on write rate.

Information on my pool is shown at the end of this email.

I am driving the writes using 'iozone' since 'filebench' does not seem
to want to install/work on Solaris 10.

I suspect that the problem is that I am running out of IOPS, since the
drive array indicates an average of 214 IOPS for one drive even though
the peak write speed is only 26MB/second (peak read is 42MB/second).

Can someone share with me what they think the write bottleneck might
be and how I can surmount it?

Thanks,

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

% zpool status
  pool: Sun_2540
 state: ONLINE
 scrub: none requested
config:

        NAME                                       STATE     READ WRITE CKSUM
        Sun_2540                                   ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c4t600A0B80003A8A0B0000096A47B4559Ed0  ONLINE       0     0     0
            c4t600A0B80003A8A0B0000096E47B456DAd0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c4t600A0B80003A8A0B0000096147B451BEd0  ONLINE       0     0     0
            c4t600A0B80003A8A0B0000096647B453CEd0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c4t600A0B80003A8A0B0000097347B457D4d0  ONLINE       0     0     0
            c4t600A0B800039C9B500000A9C47B4522Dd0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c4t600A0B800039C9B500000AA047B4529Bd0  ONLINE       0     0     0
            c4t600A0B800039C9B500000AA447B4544Fd0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c4t600A0B800039C9B500000AA847B45605d0  ONLINE       0     0     0
            c4t600A0B800039C9B500000AAC47B45739d0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c4t600A0B800039C9B500000AB047B457ADd0  ONLINE       0     0     0
            c4t600A0B800039C9B500000AB447B4595Fd0  ONLINE       0     0     0

errors: No known data errors

freddy:~% zpool iostat
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
Sun_2540    64.0G  1.57T    808    861  99.8M   105M

freddy:~% zpool iostat -v
                                           capacity     operations    bandwidth
pool                                      used  avail   read  write   read  write
--------------------------------------   -----  -----  -----  -----  -----  -----
Sun_2540                                  64.0G  1.57T    809    860   100M   105M
  mirror                                  10.7G   267G    135    143  16.7M  17.6M
    c4t600A0B80003A8A0B0000096A47B4559Ed0     -      -     66    141  8.37M  17.6M
    c4t600A0B80003A8A0B0000096E47B456DAd0     -      -     67    141  8.37M  17.6M
  mirror                                  10.7G   267G    135    143  16.7M  17.6M
    c4t600A0B80003A8A0B0000096147B451BEd0     -      -     66    141  8.37M  17.6M
    c4t600A0B80003A8A0B0000096647B453CEd0     -      -     66    141  8.37M  17.6M
  mirror                                  10.7G   267G    134    143  16.7M  17.6M
    c4t600A0B80003A8A0B0000097347B457D4d0     -      -     66    141  8.34M  17.6M
    c4t600A0B800039C9B500000A9C47B4522Dd0     -      -     66    141  8.32M  17.6M
  mirror                                  10.7G   267G    134    143  16.6M  17.6M
    c4t600A0B800039C9B500000AA047B4529Bd0     -      -     66    141  8.32M  17.6M
    c4t600A0B800039C9B500000AA447B4544Fd0     -      -     66    141  8.30M  17.6M
  mirror                                  10.7G   267G    134    143  16.6M  17.6M
    c4t600A0B800039C9B500000AA847B45605d0     -      -     66    141  8.31M  17.6M
    c4t600A0B800039C9B500000AAC47B45739d0     -      -     66    141  8.30M  17.6M
  mirror                                  10.7G   267G    134    143  16.6M  17.6M
    c4t600A0B800039C9B500000AB047B457ADd0     -      -     66    141  8.30M  17.6M
    c4t600A0B800039C9B500000AB447B4595Fd0     -      -     66    141  8.29M  17.6M
--------------------------------------   -----  -----  -----  -----  -----  -----
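For reference, a pool with the six-mirror layout shown above can be built in
one step; the sketch below is a reconstruction from the device names in the
zpool status output, not necessarily the exact command that was used here:

  zpool create Sun_2540 \
    mirror c4t600A0B80003A8A0B0000096A47B4559Ed0 c4t600A0B80003A8A0B0000096E47B456DAd0 \
    mirror c4t600A0B80003A8A0B0000096147B451BEd0 c4t600A0B80003A8A0B0000096647B453CEd0 \
    mirror c4t600A0B80003A8A0B0000097347B457D4d0 c4t600A0B800039C9B500000A9C47B4522Dd0 \
    mirror c4t600A0B800039C9B500000AA047B4529Bd0 c4t600A0B800039C9B500000AA447B4544Fd0 \
    mirror c4t600A0B800039C9B500000AA847B45605d0 c4t600A0B800039C9B500000AAC47B45739d0 \
    mirror c4t600A0B800039C9B500000AB047B457ADd0 c4t600A0B800039C9B500000AB447B4595Fd0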
On 2/14/08, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
>
> My objective is to obtain the best single-file write performance.
> Unfortunately, I am hitting some sort of write bottleneck and I am not
> sure how to solve it.  I was hoping for a write speed of 300MB/second.
> With ZFS on top of a firmware managed RAID 0 across all 12 drives, I
> hit a peak of 200MB/second.  With each drive exported as a LUN and a
> ZFS pool of 6 mirrored pairs, I see a write rate of 154MB/second.  The
> number of drives used has not had much effect on write rate.

If you're going for best single file write performance, why are you doing
mirrors of the LUNs?  Perhaps I'm misunderstanding why you went from one
giant raid-0 to what is essentially a raid-10.

--Tim
On Thu, 14 Feb 2008, Tim wrote:
>
> If you're going for best single file write performance, why are you
> doing mirrors of the LUNs?  Perhaps I'm misunderstanding why you went
> from one giant raid-0 to what is essentially a raid-10.

That decision was made because I also need data reliability.

As mentioned before, the write rate peaked at 200MB/second using
RAID-0 across 12 disks exported as one big LUN.  Other firmware-based
methods I tried typically offered about 170MB/second.  Even a four
disk firmware-managed RAID-5 with ZFS on top offered about
165MB/second.  Given that I would like to achieve 300MB/second, a few
tens of MB/second don't make much difference.

It may be that I bought the wrong product, but perhaps there is a
configuration change which will help make up some of the difference
without sacrificing data reliability.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Fri, Feb 15, 2008 at 2:34 AM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:
> As mentioned before, the write rate peaked at 200MB/second using
> RAID-0 across 12 disks exported as one big LUN.  Other firmware-based
> methods I tried typically offered about 170MB/second.  Even a four
> disk firmware-managed RAID-5 with ZFS on top offered about
> 165MB/second.  Given that I would like to achieve 300MB/second, a few
> tens of MB/second don't make much difference.

What is the workload for this system?  Benchmarks are fine and good,
but application performance is the determining factor of whether a
system is performing acceptably.

Perhaps iozone is behaving in a bad way; you might investigate
bonnie++: http://www.sunfreeware.com/programlistintel10.html

Will
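A minimal bonnie++ run for this kind of sequential test might look like the
sketch below; the target directory and user are placeholders, and the file
size is chosen at roughly twice the machine's 20GB of RAM so the ARC cannot
hide the disk behaviour:

  # hypothetical invocation; adjust path, size (MiB) and user for the setup
  bonnie++ -d /Sun_2540/bench -s 40960 -u nobody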
On Fri, 15 Feb 2008, Will Murnane wrote:
> What is the workload for this system?  Benchmarks are fine and good,
> but application performance is the determining factor of whether a
> system is performing acceptably.

The system is primarily used for image processing where the image data
is uncompressed and a typical file is 12MB.  In some cases the files
will be hundreds of MB or GB.  The typical case is to read a file and
output a new file.  For some very large files, an uncompressed
temporary file is edited in place with random access.  I am the author
of the application and need the filesystem to be fast enough that it
will uncover any slowness in my code. :-)

> Perhaps iozone is behaving in a bad way; you might investigate

That is always possible.  Iozone (http://www.iozone.org/) has been
around for a very long time and has seen a lot of improvement by many
smart people, so it does not seem very suspect.

> bonnie++: http://www.sunfreeware.com/programlistintel10.html

I will check it out.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Feb 15, 2008, at 03:34, Bob Friesenhahn wrote:
> On Thu, 14 Feb 2008, Tim wrote:
>> If you're going for best single file write performance, why are you
>> doing mirrors of the LUNs?  Perhaps I'm misunderstanding why you went
>> from one giant raid-0 to what is essentially a raid-10.
>
> That decision was made because I also need data reliability.
>
> As mentioned before, the write rate peaked at 200MB/second using
> RAID-0 across 12 disks exported as one big LUN.

What was the interlace on the LUN?

> Other firmware-based
> methods I tried typically offered about 170MB/second.  Even a four
> disk firmware-managed RAID-5 with ZFS on top offered about
> 165MB/second.  Given that I would like to achieve 300MB/second, a few
> tens of MB/second don't make much difference.  It may be that I bought
> the wrong product, but perhaps there is a configuration change which
> will help make up some of the difference without sacrificing data
> reliability.

If this is a 165MB/second application rate, consider that ZFS sends
that much to each side of the mirror.  Your data channel rate was
330MB/sec.

-r
On Fri, 15 Feb 2008, Roch Bourbonnais wrote:
>>
>> As mentioned before, the write rate peaked at 200MB/second using
>> RAID-0 across 12 disks exported as one big LUN.
>
> What was the interlace on the LUN?

There are two 4Gbit FC interfaces on an Emulex LPe11002 card which are
supposedly acting in a load-share configuration.

> If this is a 165MB/second application rate, consider that ZFS sends
> that much to each side of the mirror.  Your data channel rate was
> 330MB/sec.

Yes, I am aware of the ZFS RAID "write penalty" but in fact it has only
cost 20MB per second vs doing the RAID using controller firmware (150MB
vs 170MB/second).  This indicates that there is plenty of
communications bandwidth from the host to the array.  The measured read
rates are in the 470MB to 510MB/second range.

While writing, it is clear that ZFS does not use all of the drives for
writes at once, since the drive LEDs show that some remain temporarily
idle and ZFS cycles through them.

I would be very happy to hear from other StorageTek 2540 owners as to
the write rate they were able to achieve.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Feb 15, 2008, at 18:24, Bob Friesenhahn wrote:
> On Fri, 15 Feb 2008, Roch Bourbonnais wrote:
>>>
>>> As mentioned before, the write rate peaked at 200MB/second using
>>> RAID-0 across 12 disks exported as one big LUN.
>>
>> What was the interlace on the LUN?

The question was about LUN interlace, not interface.
128K to 1M works better.

> There are two 4Gbit FC interfaces on an Emulex LPe11002 card which are
> supposedly acting in a load-share configuration.
>
>> If this is a 165MB/second application rate, consider that ZFS sends
>> that much to each side of the mirror.  Your data channel rate was
>> 330MB/sec.
>
> Yes, I am aware of the ZFS RAID "write penalty" but in fact it has only
> cost 20MB per second vs doing the RAID using controller firmware (150MB
> vs 170MB/second).  This indicates that there is plenty of
> communications bandwidth from the host to the array.  The measured read
> rates are in the 470MB to 510MB/second range.

Any compression?  Does turning off checksum help the numbers (that
would point to a CPU limited throughput)?

-r
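For reference, both properties can be checked, and toggled for a test run
only, with standard zfs commands against the pool named earlier in the
thread; checksums should be re-enabled after the experiment since they are
the basis of ZFS's end-to-end integrity checking:

  zfs get compression,checksum Sun_2540
  zfs set checksum=off Sun_2540    # for the test only
  zfs set checksum=on Sun_2540     # restore afterwards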
On Fri, 15 Feb 2008, Roch Bourbonnais wrote:
>>> What was the interlace on the LUN?
>
> The question was about LUN interlace, not interface.
> 128K to 1M works better.

The "segment size" is set to 128K.  The max the 2540 allows is 512K.
Unfortunately, the StorageTek 2540 and CAM documentation does not
really define what "segment size" means.

> Any compression?

Compression is disabled.

> Does turning off checksum help the numbers (that would point to a CPU
> limited throughput)?

I have not tried that, but this system is loafing during the benchmark.
It has four 3GHz Opteron cores.

Does this output from 'iostat -xnz 20' help to understand issues?

                    extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv wsvc_t asvc_t  %w  %b device
    3.0    0.7   26.4     3.5   0.0   0.0    0.0    4.2   0   2 c1t1d0
    0.0  154.2    0.0 19680.3   0.0  20.7    0.0  134.2   0  59 c4t600A0B80003A8A0B0000096147B451BEd0
    0.0  211.5    0.0 26940.5   1.1  33.9    5.0  160.5  99 100 c4t600A0B800039C9B500000A9C47B4522Dd0
    0.0  211.5    0.0 26940.6   1.1  33.9    5.0  160.4  99 100 c4t600A0B800039C9B500000AA047B4529Bd0
    0.0  154.0    0.0 19654.7   0.0  20.7    0.0  134.2   0  59 c4t600A0B80003A8A0B0000096647B453CEd0
    0.0  211.3    0.0 26915.0   1.1  33.9    5.0  160.5  99 100 c4t600A0B800039C9B500000AA447B4544Fd0
    0.0  152.4    0.0 19447.0   0.0  20.5    0.0  134.5   0  59 c4t600A0B80003A8A0B0000096A47B4559Ed0
    0.0  213.2    0.0 27183.8   0.9  34.1    4.2  159.9  90 100 c4t600A0B800039C9B500000AA847B45605d0
    0.0  152.5    0.0 19453.4   0.0  20.5    0.0  134.5   0  59 c4t600A0B80003A8A0B0000096E47B456DAd0
    0.0  213.2    0.0 27177.4   0.9  34.1    4.2  159.9  90 100 c4t600A0B800039C9B500000AAC47B45739d0
    0.0  213.2    0.0 27195.3   0.9  34.1    4.2  159.9  90 100 c4t600A0B800039C9B500000AB047B457ADd0
    0.0  154.4    0.0 19711.8   0.0  20.7    0.0  134.0   0  59 c4t600A0B80003A8A0B0000097347B457D4d0
    0.0  211.3    0.0 26958.6   1.1  33.9    5.0  160.6  99 100 c4t600A0B800039C9B500000AB447B4595Fd0

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Hi Bob,

I'm assuming you're measuring sequential write speed - posting the
iozone results would help guide the discussion.

For the configuration you describe, you should definitely be able to
sustain 200 MB/s write speed for a single file, single thread due to
your use of 4Gbps Fibre Channel interfaces and RAID1.  Someone else
brought up that with host based mirroring over that interface you will
be sending the data twice over the FC-AL link, so since you only have
400 MB/s on the FC-AL interface (load balancing will only work for two
writes), then you have to divide that by two.

If you do the mirroring on the RAID hardware you'll get double that
speed on writing, or 400MB/s, and the bottleneck is still the single
FC-AL interface.

By comparison, we get 750 MB/s sequential read using six 15K RPM 300GB
disks on an adaptec (Sun OEM) in-host SAS RAID adapter in RAID10 on
four streams, and I think I saw 350 MB/s write speed on one stream.
Each disk is capable of 130 MB/s of read and write speed.

- Luke

On 2/15/08 10:39 AM, "Bob Friesenhahn" <bfriesen at simple.dallas.tx.us> wrote:
> The "segment size" is set to 128K.  The max the 2540 allows is 512K.
> Unfortunately, the StorageTek 2540 and CAM documentation does not
> really define what "segment size" means.
>
> Compression is disabled.
>
> I have not tried that, but this system is loafing during the benchmark.
> It has four 3GHz Opteron cores.
On Fri, 15 Feb 2008, Luke Lonergan wrote:
> I'm assuming you're measuring sequential write speed - posting the
> iozone results would help guide the discussion.

Posted below.  I am also including the output from mpathadm in case
there is something wrong with the load sharing.

> For the configuration you describe, you should definitely be able to
> sustain 200 MB/s write speed for a single file, single thread due to
> your use of 4Gbps Fibre Channel interfaces and RAID1.

I only managed to get 200 MB/s write when I did RAID 0 across all
drives using the 2540's RAID controller and with ZFS on top.

> Someone else brought up that with host based mirroring over that
> interface you will be sending the data twice over the FC-AL link, so
> since you only have 400 MB/s on the FC-AL interface (load balancing
> will only work for two writes), then you have to divide that by two.

While I agree that data is sent twice (actually up to 8X if striping
across four mirrors), it seems to me that the load balancing should
still work for one application write since ZFS is what does the
multiple device I/Os.

> If you do the mirroring on the RAID hardware you'll get double that
> speed on writing, or 400MB/s, and the bottleneck is still the single
> FC-AL interface.

I didn't see that level of performance.  Perhaps there is something I
should be investigating?

Bob

Output of 'mpathadm list lu':

        /scsi_vhci/disk@g600a0b800039c9b50000000000000000
                Total Path Count: 1
                Operational Path Count: 1
        /scsi_vhci/disk@g600a0b80003a8a0b0000000000000000
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c4t600A0B80003A8A0B0000096147B451BEd0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c4t600A0B800039C9B500000A9C47B4522Dd0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c4t600A0B800039C9B500000AA047B4529Bd0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c4t600A0B80003A8A0B0000096647B453CEd0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c4t600A0B800039C9B500000AA447B4544Fd0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c4t600A0B80003A8A0B0000096A47B4559Ed0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c4t600A0B800039C9B500000AA847B45605d0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c4t600A0B80003A8A0B0000096E47B456DAd0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c4t600A0B800039C9B500000AAC47B45739d0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c4t600A0B800039C9B500000AB047B457ADd0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c4t600A0B80003A8A0B0000097347B457D4d0s2
                Total Path Count: 2
                Operational Path Count: 2
        /dev/rdsk/c4t600A0B800039C9B500000AB447B4595Fd0s2
                Total Path Count: 2
                Operational Path Count: 2

Output of 'mpathadm show lu /dev/rdsk/c4t600A0B800039C9B500000AB047B457ADd0s2':

Logical Unit:  /dev/rdsk/c4t600A0B800039C9B500000AB047B457ADd0s2
        mpath-support:  libmpscsi_vhci.so
        Vendor:  SUN
        Product:  LCSM100_F
        Revision:  0617
        Name Type:  unknown type
        Name:  600a0b800039c9b500000ab047b457ad
        Asymmetric:  yes
        Current Load Balance:  round-robin
        Logical Unit Group ID:  NA
        Auto Failback:  on
        Auto Probing:  NA

        Paths:
                Initiator Port Name:  10000000c967c830
                Target Port Name:  200400a0b83a8a0c
                Override Path:  NA
                Path State:  OK
                Disabled:  no

                Initiator Port Name:  10000000c967c82f
                Target Port Name:  200500a0b83a8a0c
                Override Path:  NA
                Path State:  OK
                Disabled:  no

        Target Port Groups:
                ID:  4
                Explicit Failover:  yes
                Access State:  standby
                Target Ports:
                        Name:  200400a0b83a8a0c
                        Relative ID:  0

                ID:  1
                Explicit Failover:  yes
                Access State:  active
                Target Ports:
                        Name:  200500a0b83a8a0c
                        Relative ID:  0

Performance test run using iozone:

        Iozone: Performance Test of File I/O
                Version $Revision: 3.283 $
                Compiled for 64 bit mode.
                Build: Solaris10gcc-64

        Contributors: William Norcott, Don Capps, Isom Crawford, Kirby Collins,
                      Al Slater, Scott Rhine, Mike Wisner, Ken Goss,
                      Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                      Randy Dunlap, Mark Montague, Dan Million,
                      Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
                      Erik Habbinga, Kris Strecker, Walter Wong.

        Run began: Thu Feb 14 16:35:51 2008

        Auto Mode
        Using Minimum Record Size 64 KB
        Using Maximum Record Size 512 KB
        Using minimum file size of 33554432 kilobytes.
        Using maximum file size of 67108864 kilobytes.
        Command line used: iozone -a -i 0 -i 1 -y 64 -q 512 -n 32G -g 64G
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.

                                                        random  random    bkwd  record  stride
              KB  reclen   write rewrite    read  reread    read   write    read rewrite    read  fwrite frewrite   fread freread
        33554432      64  150370  113779  454731  456158
        33554432     128  147032  181308  455496  456239
        33554432     256  148182  169944  454192  456252
        33554432     512  153843  194189  473982  516130
        67108864      64  151047  111227  463406  456302
        67108864     128  148597  159236  456959  488100
        67108864     256  148995  165041  463519  453896
        67108864     512  154556  166802  458304  456833

iozone test complete.

======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Fri, Feb 15, 2008 at 12:30 AM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:
> Under Solaris 10 on a 4 core Sun Ultra 40 with 20GB RAM, I am setting
> up a Sun StorageTek 2540 with 12 300GB 15K RPM SAS drives connected
> via load-shared 4Gbit FC links.  This week I have tried many different
> configurations, using firmware managed RAID, ZFS managed RAID, and
> with the controller cache enabled or disabled.
>
> My objective is to obtain the best single-file write performance.
> Unfortunately, I am hitting some sort of write bottleneck and I am not
> sure how to solve it.  I was hoping for a write speed of 300MB/second.
> With ZFS on top of a firmware managed RAID 0 across all 12 drives, I
> hit a peak of 200MB/second.  With each drive exported as a LUN and a
> ZFS pool of 6 mirrored pairs, I see a write rate of 154MB/second.  The
> number of drives used has not had much effect on write rate.

May not be relevant, but still worth checking - I have a 2530 (which
ought to be the same, only SAS instead of FC), and got fairly poor
performance at first.  Things improved significantly when I got the
LUNs properly balanced across the controllers.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
On Fri, 15 Feb 2008, Peter Tribble wrote:
>
> May not be relevant, but still worth checking - I have a 2530 (which
> ought to be the same, only SAS instead of FC), and got fairly poor
> performance at first.  Things improved significantly when I got the
> LUNs properly balanced across the controllers.

What do you mean by "properly balanced across the controllers"?  Are
you using the multipath support in Solaris 10 or are you relying on
ZFS to balance the I/O load?  Do some disks have more affinity for a
controller than the other?

With the 2540, there is a FC connection to each redundant controller.
The Solaris 10 multipathing presumably load-shares the I/O to each
controller.  The controllers then perform some sort of magic to get
the data to and from the SAS drives.

The controller stats are below.  I notice that it seems that
controller B has seen a bit more activity than controller A, but the
firmware does not provide a controller uptime value so it is possible
that one controller was up longer than another:

Performance Statistics - A on Storage System Array-1
  Timestamp: Fri Feb 15 14:37:39 CST 2008
  Total IOPS: 1098.83
  Average IOPS: 355.83
  Read %: 38.28
  Write %: 61.71
  Total Data Transferred: 139284.41 KBps
  Read: 53844.26 KBps
  Average Read: 17224.04 KBps
  Peak Read: 242232.70 KBps
  Written: 85440.15 KBps
  Average Written: 26966.58 KBps
  Peak Written: 139918.90 KBps
  Average Read Size: 639.96 KB
  Average Write Size: 629.94 KB
  Cache Hit %: 85.32

Performance Statistics - B on Storage System Array-1
  Timestamp: Fri Feb 15 14:37:45 CST 2008
  Total IOPS: 1526.69
  Average IOPS: 497.32
  Read %: 34.90
  Write %: 65.09
  Total Data Transferred: 193594.58 KBps
  Read: 68200.00 KBps
  Average Read: 24052.61 KBps
  Peak Read: 339693.55 KBps
  Written: 125394.58 KBps
  Average Written: 37768.40 KBps
  Peak Written: 183534.66 KBps
  Average Read Size: 895.80 KB
  Average Write Size: 883.38 KB
  Cache Hit %: 75.05

If I then go to the performance stats on an individual disk, I see

Performance Statistics - Disk-08 on Storage System Array-1
  Timestamp: Fri Feb 15 14:43:36 CST 2008
  Total IOPS: 196.33
  Average IOPS: 72.01
  Read %: 9.65
  Write %: 90.34
  Total Data Transferred: 25076.91 KBps
  Read: 2414.11 KBps
  Average Read: 3521.44 KBps
  Peak Read: 48422.00 KBps
  Written: 22662.79 KBps
  Average Written: 5423.78 KBps
  Peak Written: 28036.43 KBps
  Average Read Size: 127.29 KB
  Average Write Size: 127.77 KB
  Cache Hit %: 89.30

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Fri, Feb 15, 2008 at 8:50 PM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:
> On Fri, 15 Feb 2008, Peter Tribble wrote:
>>
>> May not be relevant, but still worth checking - I have a 2530 (which
>> ought to be the same, only SAS instead of FC), and got fairly poor
>> performance at first.  Things improved significantly when I got the
>> LUNs properly balanced across the controllers.
>
> What do you mean by "properly balanced across the controllers"?  Are
> you using the multipath support in Solaris 10 or are you relying on
> ZFS to balance the I/O load?  Do some disks have more affinity for a
> controller than the other?

Each LUN is accessed through only one of the controllers (I presume the
2540 works the same way as the 2530 and 61X0 arrays).  The paths are
active/passive (if the active fails it will relocate to the other
path).  When I set mine up the first time it allocated all the LUNs to
controller B and performance was terrible.  I then manually transferred
half the LUNs to controller A and it started to fly.

I'm using SAS multipathing for failover and just get ZFS to dynamically
stripe across the LUNs.  Your figures show asymmetry, but that may just
be a reflection of the setup where you just created a single raid-0 LUN
which would only use one path.

(I don't really understand any of this stuff.  Too much fiddling around
for my liking.)

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
On Fri, 15 Feb 2008, Peter Tribble wrote:
> Each LUN is accessed through only one of the controllers (I presume the
> 2540 works the same way as the 2530 and 61X0 arrays).  The paths are
> active/passive (if the active fails it will relocate to the other
> path).  When I set mine up the first time it allocated all the LUNs to
> controller B and performance was terrible.  I then manually transferred
> half the LUNs to controller A and it started to fly.

I assume that you either altered the "Access State" shown for the LUN
in the output of 'mpathadm show lu DEVICE' or you noticed and observed
the pattern:

        Target Port Groups:
                ID:  3
                Explicit Failover:  yes
                Access State:  active
                Target Ports:
                        Name:  200400a0b83a8a0c
                        Relative ID:  0

                ID:  2
                Explicit Failover:  yes
                Access State:  standby
                Target Ports:
                        Name:  200500a0b83a8a0c
                        Relative ID:  0

I find this all very interesting and illuminating:

  for dev in c4t600A0B80003A8A0B0000096A47B4559Ed0 \
             c4t600A0B80003A8A0B0000096E47B456DAd0 \
             c4t600A0B80003A8A0B0000096147B451BEd0 \
             c4t600A0B80003A8A0B0000096647B453CEd0 \
             c4t600A0B80003A8A0B0000097347B457D4d0 \
             c4t600A0B800039C9B500000A9C47B4522Dd0 \
             c4t600A0B800039C9B500000AA047B4529Bd0 \
             c4t600A0B800039C9B500000AA447B4544Fd0 \
             c4t600A0B800039C9B500000AA847B45605d0 \
             c4t600A0B800039C9B500000AAC47B45739d0 \
             c4t600A0B800039C9B500000AB047B457ADd0 \
             c4t600A0B800039C9B500000AB447B4595Fd0
  do
    echo "=== $dev ==="
    mpathadm show lu /dev/rdsk/$dev | grep 'Access State'
  done

  === c4t600A0B80003A8A0B0000096A47B4559Ed0 ===
          Access State:  active
          Access State:  standby
  === c4t600A0B80003A8A0B0000096E47B456DAd0 ===
          Access State:  active
          Access State:  standby
  === c4t600A0B80003A8A0B0000096147B451BEd0 ===
          Access State:  active
          Access State:  standby
  === c4t600A0B80003A8A0B0000096647B453CEd0 ===
          Access State:  active
          Access State:  standby
  === c4t600A0B80003A8A0B0000097347B457D4d0 ===
          Access State:  active
          Access State:  standby
  === c4t600A0B800039C9B500000A9C47B4522Dd0 ===
          Access State:  active
          Access State:  standby
  === c4t600A0B800039C9B500000AA047B4529Bd0 ===
          Access State:  standby
          Access State:  active
  === c4t600A0B800039C9B500000AA447B4544Fd0 ===
          Access State:  standby
          Access State:  active
  === c4t600A0B800039C9B500000AA847B45605d0 ===
          Access State:  standby
          Access State:  active
  === c4t600A0B800039C9B500000AAC47B45739d0 ===
          Access State:  standby
          Access State:  active
  === c4t600A0B800039C9B500000AB047B457ADd0 ===
          Access State:  standby
          Access State:  active
  === c4t600A0B800039C9B500000AB447B4595Fd0 ===
          Access State:  standby
          Access State:  active

Notice that the first six LUNs are active to one controller while the
second six LUNs are active to the other controller.  Based on this, I
should rebuild my pool by splitting my mirrors across this boundary.

I am really happy that ZFS makes such things easy to try out.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
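If a LUN's active path needs to be moved between controllers from the host
side, mpathadm also has a failover subcommand that can be pointed at a
logical unit whose target port groups report explicit failover support, as
they do above.  Whether the 2540 accepts this, or whether CAM is the better
place to change preferred ownership, is not confirmed in this thread, so
treat the following as an untested sketch:

  # untested sketch: request an explicit failover of one LUN's active path
  mpathadm failover logical-unit /dev/rdsk/c4t600A0B800039C9B500000AB047B457ADd0s2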
On Fri, 15 Feb 2008, Bob Friesenhahn wrote:
>
> Notice that the first six LUNs are active to one controller while the
> second six LUNs are active to the other controller.  Based on this, I
> should rebuild my pool by splitting my mirrors across this boundary.
>
> I am really happy that ZFS makes such things easy to try out.

Now that I have tried this out, I can unhappily say that it made no
measurable difference to actual performance.  However, it seems like a
better layout anyway.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Fri, Feb 15, 2008 at 09:00:05PM +0000, Peter Tribble wrote:
> On Fri, Feb 15, 2008 at 8:50 PM, Bob Friesenhahn
> <bfriesen at simple.dallas.tx.us> wrote:
>> What do you mean by "properly balanced across the controllers"?  Are
>> you using the multipath support in Solaris 10 or are you relying on
>> ZFS to balance the I/O load?  Do some disks have more affinity for a
>> controller than the other?
>
> Each LUN is accessed through only one of the controllers (I presume the
> 2540 works the same way as the 2530 and 61X0 arrays).  The paths are
> active/passive (if the active fails it will relocate to the other
> path).  When I set mine up the first time it allocated all the LUNs to
> controller B and performance was terrible.  I then manually transferred
> half the LUNs to controller A and it started to fly.

http://groups.google.com/group/comp.unix.solaris/browse_frm/thread/59b43034602a7b7f/0b500afc4d62d434?lnk=st&q=#0b500afc4d62d434

-- 
albert chin (china at thewrittenword.com)
Hi Bob, On 2/15/08 12:13 PM, "Bob Friesenhahn" <bfriesen at simple.dallas.tx.us> wrote:> I only managed to get 200 MB/s write when I did RAID 0 across all > drives using the 2540''s RAID controller and with ZFS on top.Ridiculously bad. You should max out both FC-AL links and get 800 MB/s.> While I agree that data is sent twice (actually up to 8X if striping > across four mirrors)Still only twice the data that would otherwise be sent, in other words: the mirroring causes a duplicate set of data to be written.> it seems to me that the load balancing should > still work for one application write since ZFS is what does the > multiple device I/Os.Depends on how the LUNs are used within the pool, but yes that''s what you should expect in which case you should get 400 MB/s writes on one file using RAID10.>> If you do the mirroring on the RAID hardware you?ll get double that speed on >> writing, or 400MB/s and the bottleneck is still the single FC-AL interface. > > I didn''t see that level of performance. Perhaps there is something I > should be investigating?Yes, if it weren''t for the slow FC-AL in your data path you should be able to sustain 20 x 130 MB/s = 2,600 MB/s based on the drive speeds. Given that you''re not even saturating the FC-AL links, the problem is in the hardware RAID. I suggest disabling read and write caching in the hardware RAID. - Luke
On Fri, 15 Feb 2008, Luke Lonergan wrote:
>> I only managed to get 200 MB/s write when I did RAID 0 across all
>> drives using the 2540's RAID controller and with ZFS on top.
>
> Ridiculously bad.

I agree. :-(

>> While I agree that data is sent twice (actually up to 8X if striping
>> across four mirrors)
>
> Still only twice the data that would otherwise be sent, in other words:
> the mirroring causes a duplicate set of data to be written.

Right.  But more little bits of data to be sent due to ZFS striping.

> Given that you're not even saturating the FC-AL links, the problem is
> in the hardware RAID.  I suggest disabling read and write caching in
> the hardware RAID.

Hardware RAID is not an issue in this case since each disk is exported
as a LUN.  Performance with ZFS is not much different than when
hardware RAID was used.  I previously tried disabling caching in the
hardware and it did not make a difference in the results.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Bob Friesenhahn wrote:
> On Fri, 15 Feb 2008, Luke Lonergan wrote:
>>> I only managed to get 200 MB/s write when I did RAID 0 across all
>>> drives using the 2540's RAID controller and with ZFS on top.
>>
>> Ridiculously bad.
>
> I agree. :-(
>
>> Still only twice the data that would otherwise be sent, in other words:
>> the mirroring causes a duplicate set of data to be written.
>
> Right.  But more little bits of data to be sent due to ZFS striping.

These "little bits" should be 128kBytes by default, which should be
plenty to saturate the paths.  There seems to be something else going
on here... from the iostat data:

                    extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv wsvc_t asvc_t  %w  %b device
...
    0.0  211.5    0.0 26940.5   1.1  33.9    5.0  160.5  99 100 c4t600A0B800039C9B500000A9C47B4522Dd0
    0.0  211.5    0.0 26940.6   1.1  33.9    5.0  160.4  99 100 c4t600A0B800039C9B500000AA047B4529Bd0
    0.0  154.0    0.0 19654.7   0.0  20.7    0.0  134.2   0  59 c4t600A0B80003A8A0B0000096647B453CEd0
...

shows that we have an average of 33.9 iops of 128kBytes each queued to
the storage device at a given time.  There is an iop queued to the
storage device at all times (100% busy).  The 59% busy device might not
always be 59% busy, but it is difficult to see from this output because
you used the "z" flag.

Looks to me like ZFS is keeping the queues full, and the device is slow
to service them (asvc_t).  This is surprising, to a degree, because we
would expect faster throughput to a nonvolatile write cache.

It would be interesting to see the response for a stable idle system:
start the workload, see the fast response as we hit the write cache,
followed by the slowdown as we fill the write cache.  This sort of
experiment is usually easy to create.
 -- richard
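One possible way to set up the experiment Richard describes, using only
commands already shown in this thread plus dd; the target path, output file
and sizes are placeholders:

  # from an idle system, capture per-device stats every 5 seconds
  # (no -z flag, so devices that go idle stay visible)
  iostat -xn 5 > /var/tmp/iostat.out &

  # then start a single large sequential write and watch asvc_t and %b
  # climb once the controller write cache fills
  dd if=/dev/zero of=/Sun_2540/bigfile bs=128k count=262144   # ~32 GB

  kill %1   # stop the iostat capture when done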
On Fri, 15 Feb 2008, Albert Chin wrote:
>
> http://groups.google.com/group/comp.unix.solaris/browse_frm/thread/59b43034602a7b7f/0b500afc4d62d434?lnk=st&q=#0b500afc4d62d434

This is really discouraging.  Based on these newsgroup postings I am
thinking that the Sun StorageTek 2540 was not a good investment for me,
especially given that the $23K for it came right out of my own paycheck
and it took me 6 months of frustration (first shipment was damaged) to
receive it.  Regardless, this was the best I was able to afford unless
I built the drive array myself.

The page at
http://www.sun.com/storagetek/disk_systems/workgroup/2540/benchmarks.jsp
claims "546.22 MBPS" for the large file processing benchmark.  So I go
to look at the actual SPC2 full disclosure report and see that for one
stream, the average data rate is 105MB/second (compared with
102MB/second with RAID-5), and rises to 284MB/second with 10 streams.
The product obviously performs much better for reads than it does for
writes and is better for multi-user performance than single-user.

It seems like I am getting a good bit more performance from my own
setup than what the official benchmark suggests (they used 72GB drives,
24 drives total), so it seems that everything is working fine.

This is a lesson for me, and I have certainly learned a fair amount
about drive arrays, fiber channel, and ZFS, in the process.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
The segment size is the amount of contiguous space that each drive
contributes to a single stripe.  So if you have a 5 drive RAID-5 set @
128k segment size, a single stripe = (5-1)*128k = 512k.

BTW, did you tweak the cache sync handling on the array?

-Joel
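Applying the same definition to the configurations discussed earlier in the
thread, as a quick sanity check (shell arithmetic, not array output):

  # full-stripe width at a 128k segment size
  echo $(( (5 - 1) * 128 ))k    # 5-drive RAID-5 example above -> 512k
  echo $(( 12 * 128 ))k         # 12-drive RAID-0 across the 2540 -> 1536k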
On Feb 15, 2008 10:20 PM, Luke Lonergan <llonergan at greenplum.com> wrote:
> Hi Bob,
>
> On 2/15/08 12:13 PM, "Bob Friesenhahn" <bfriesen at simple.dallas.tx.us> wrote:
>> I only managed to get 200 MB/s write when I did RAID 0 across all
>> drives using the 2540's RAID controller and with ZFS on top.
>
> Ridiculously bad.

Agreed.  My 2530 gives me about 450MB/s on writes and 800 on reads.
That's zfs striped across 4 LUNs, each of which is hardware raid-5
(24 drives in total, so each raid-5 LUN is 5 data + 1 parity).

What matters to me is that this is higher than the network bandwidth
into the server, and more bandwidth than the users can make use of at
the moment.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Hi Tim;

The 2540 controller can achieve a maximum of 250 MB/sec on writes on
the first 12 drives, so you are pretty close to maximum throughput
already.  RAID 5 can be a little bit slower.

Please try to distribute the LUNs between controllers, and try to
benchmark with cache mirroring disabled (it's different than disabling
the cache).

Best regards
Mertol

Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems, TR
Istanbul TR
Phone  +902123352200
Mobile +905339310752
Fax    +902123352222
Email  mertol.ozyoney at Sun.COM

From: zfs-discuss-bounces at opensolaris.org
[mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Tim
Sent: Friday, February 15, 2008 03:13
To: Bob Friesenhahn
Cc: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] Performance with Sun StorageTek 2540

On 2/14/08, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> My objective is to obtain the best single-file write performance.
> Unfortunately, I am hitting some sort of write bottleneck and I am not
> sure how to solve it.  I was hoping for a write speed of 300MB/second.

If you're going for best single file write performance, why are you
doing mirrors of the LUNs?  Perhaps I'm misunderstanding why you went
from one giant raid-0 to what is essentially a raid-10.

--Tim
Bob,

Here is how you can tell the array to ignore cache sync commands and
the force unit access bits... (Sorry if it wraps..)

On a Solaris CAM install, the 'service' command is in
"/opt/SUNWsefms/bin".

To read the current settings:

  service -d arrayname -c read -q nvsram region=0xf2 host=0x00

Save this output so you can reverse the changes below easily if needed...

To set new values:

  service -d arrayname -c set -q nvsram region=0xf2 offset=0x17 value=0x01 host=0x00
  service -d arrayname -c set -q nvsram region=0xf2 offset=0x18 value=0x01 host=0x00
  service -d arrayname -c set -q nvsram region=0xf2 offset=0x21 value=0x01 host=0x00

Host region 00 is Solaris (w/Traffic Manager).

You will need to reboot both controllers after making the change before
it becomes active.

-Joel
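A simple way to keep that "before" snapshot around is to redirect the read
command above into a file; the output path here is arbitrary:

  /opt/SUNWsefms/bin/service -d arrayname -c read -q nvsram region=0xf2 host=0x00 \
      > /var/tmp/nvsram-region-f2.before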
On Sat, 16 Feb 2008, Peter Tribble wrote:
> Agreed.  My 2530 gives me about 450MB/s on writes and 800 on reads.
> That's zfs striped across 4 LUNs, each of which is hardware raid-5
> (24 drives in total, so each raid-5 LUN is 5 data + 1 parity).

Is this single-file bandwidth or multiple-file/thread bandwidth?

According to Sun's own benchmark data, the 2530 was capable of
20MB/second more than the 2540 on writes for a single large file, and
the difference went away after that.  For multi-user activity the
throughput clearly improves to be similar to what you describe.  Most
people are likely interested in maximizing multi-user performance, and
particularly for reads.

Visit
http://www.storageperformance.org/results/benchmark_results_spc2/#sun_spc2
to see the various benchmark results.  According to these results, for
large-file writes the 2530/2540 compares well with other StorageTek
products, including the more expensive 6140 and 6540 arrays.  It also
compares well with similarly-sized storage products from other vendors.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Sat, 16 Feb 2008, Mertol Ozyoney wrote:
>
> Please try to distribute the LUNs between controllers, and try to
> benchmark with cache mirroring disabled (it's different than disabling
> the cache).

By the term "disabling cache mirroring" are you talking about "Write
Cache With Replication Enabled" in the Common Array Manager?  Does this
feature maintain a redundant cache (two data copies) between
controllers?

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Yes, it does replicate data between controllers.  Usually it slows
things down a lot, especially in write-heavy environments.  If you
properly tune ZFS you may not need this feature for consistency...

Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems, TR
Istanbul TR
Phone  +902123352200
Mobile +905339310752
Fax    +902123352222
Email  mertol.ozyoney at Sun.COM

-----Original Message-----
From: Bob Friesenhahn [mailto:bfriesen at simple.dallas.tx.us]
Sent: Saturday, February 16, 2008 18:43
To: Mertol Ozyoney
Cc: zfs-discuss at opensolaris.org
Subject: RE: [zfs-discuss] Performance with Sun StorageTek 2540

On Sat, 16 Feb 2008, Mertol Ozyoney wrote:
>
> Please try to distribute the LUNs between controllers, and try to
> benchmark with cache mirroring disabled (it's different than disabling
> the cache).

By the term "disabling cache mirroring" are you talking about "Write
Cache With Replication Enabled" in the Common Array Manager?  Does this
feature maintain a redundant cache (two data copies) between
controllers?

Bob
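The ZFS tuning usually meant by remarks like this is telling ZFS not to
issue cache-flush requests to an array whose write cache is battery backed;
in recent Solaris 10 updates that is the zfs_nocacheflush tunable, set in
/etc/system and taking effect after a reboot.  This is an aside on the
comment above rather than something applied in this thread, and it is only
safe when the array cache really is non-volatile:

  * /etc/system fragment -- only with a battery-backed array write cache
  set zfs:zfs_nocacheflush = 1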
On Sat, 16 Feb 2008, Joel Miller wrote:
> Here is how you can tell the array to ignore cache sync commands and
> the force unit access bits... (Sorry if it wraps..)

Thanks to the kind advice of yourself and Mertol Ozyoney, there is a
huge boost in write performance:

  Was: 154MB/second
  Now: 279MB/second

The average service time for each disk LUN has dropped considerably.
The numbers provided by 'zpool iostat' are very close to what is
measured by 'iozone'.

This is like night and day and gets me very close to my original target
write speed of 300MB/second.

Thank you very much!

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Hi Bob;

When you have some spare time, can you prepare a simple benchmark
report in PDF that I can share with my customers to demonstrate the
performance of the 2540?

Best regards
Mertol

Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems, TR
Istanbul TR
Phone  +902123352200
Mobile +905339310752
Fax    +902123352222
Email  mertol.ozyoney at Sun.COM

-----Original Message-----
From: zfs-discuss-bounces at opensolaris.org
[mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Bob Friesenhahn
Sent: Saturday, February 16, 2008 19:57
To: Joel Miller
Cc: zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] Performance with Sun StorageTek 2540

On Sat, 16 Feb 2008, Joel Miller wrote:
> Here is how you can tell the array to ignore cache sync commands and
> the force unit access bits... (Sorry if it wraps..)

Thanks to the kind advice of yourself and Mertol Ozyoney, there is a
huge boost in write performance:

  Was: 154MB/second
  Now: 279MB/second
Mertol Ozyoney wrote:
>
> The 2540 controller can achieve a maximum of 250 MB/sec on writes on
> the first 12 drives, so you are pretty close to maximum throughput
> already.  RAID 5 can be a little bit slower.

I'm a bit irritated now.  I have ZFS running for some Sybase ASE 12.5
databases using X4600 servers (8x dual core, 64 GB RAM, Solaris 10
11/06) and 4 GBit/s lowest cost Infortrend Fibre Channel JBODs, with a
total of 4x 16 FC drives imported in a single mirrored zpool.

I benchmarked them with tiobench, using a file size of 64 GB and 32
parallel threads.  With an untweaked ZFS, the average throughput I got
was: sequential & random read > 1GB/s, sequential write 296 MB/s, and
random write 353 MB/s, leading to a total of approx. 650,000 IOPS with
a maximum latency of < 350 ms after the databases went into production,
and the bottleneck is basically the FC HBAs.  These are averages; the
peaks flatline at the 4 GBit/s Fibre Channel maximum capacity pretty
soon.

I'm a bit disturbed because I am thinking about switching to 2530/2540
shelves, but a maximum of 250 MB/sec would disqualify them instantly,
even with individual RAID controllers for each shelf.

So my question is: can I do the same thing I did with the IFT shelves?
Can I buy only 2501 JBODs and attach them directly to the server, thus
*not* using the 2540 RAID controller and still having access to the
single drives?  I'm quite nervous about this, because I'm not just
talking about a single database - I'd need a total of 42 shelves and
I'm pretty sure Sun doesn't offer Try&Buy deals at such a scale.

-- 
Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
ralf.ramge at webde.de - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas
Gauger, Thomas Gottschlich, Matthias Greve, Robert Hoffmann, Markus
Huhn, Norbert Lang, Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
On Mon, 18 Feb 2008, Ralf Ramge wrote:
> I'm a bit disturbed because I am thinking about switching to 2530/2540
> shelves, but a maximum of 250 MB/sec would disqualify them instantly, even

Note that this is single-file/single-thread I/O performance. I suggest that you read the formal benchmark report for this equipment, since it covers multi-thread I/O performance as well. The multi-user performance is considerably higher. Given ZFS's smarts, the JBOD approach seems like a good one as long as the hardware provides a non-volatile cache.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
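A multi-stream run of the kind the benchmark report covers can be approximated with iozone's throughput mode; a sketch, with the thread count, per-thread file size, and file names chosen arbitrarily rather than taken from the report:

    # Four writer threads, 4 GB per thread, sequential writes; -e again
    # includes the final flush in the reported aggregate throughput.
    iozone -e -i 0 -t 4 -s 4g -r 128k \
        -F /Sun_2540/t1 /Sun_2540/t2 /Sun_2540/t3 /Sun_2540/t4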
Bob Friesenhahn writes:
 > On Fri, 15 Feb 2008, Roch Bourbonnais wrote:
 > >>> What was the interlace on the LUN ?
 > >
 > > The question was about LUN interlace, not interface.
 > > 128K to 1M works better.
 >
 > The "segment size" is set to 128K. The max the 2540 allows is 512K.
 > Unfortunately, the StorageTek 2540 and CAM documentation does not
 > really define what "segment size" means.
 >
 > > Any compression ?
 >
 > Compression is disabled.
 >
 > > Does turning off checksums help the numbers (that would point to
 > > CPU-limited throughput)?
 >
 > I have not tried that, but this system is loafing during the benchmark.
 > It has four 3GHz Opteron cores.
 >
 > Does this output from 'iostat -xnz 20' help to understand the issues?
 >
 >                      extended device statistics
 >    r/s    w/s   kr/s     kw/s  wait  actv wsvc_t asvc_t  %w  %b device
 >    3.0    0.7   26.4      3.5   0.0   0.0    0.0    4.2   0   2 c1t1d0
 >    0.0  154.2    0.0  19680.3   0.0  20.7    0.0  134.2   0  59 c4t600A0B80003A8A0B0000096147B451BEd0
 >    0.0  211.5    0.0  26940.5   1.1  33.9    5.0  160.5  99 100 c4t600A0B800039C9B500000A9C47B4522Dd0
 >    0.0  211.5    0.0  26940.6   1.1  33.9    5.0  160.4  99 100 c4t600A0B800039C9B500000AA047B4529Bd0
 >    0.0  154.0    0.0  19654.7   0.0  20.7    0.0  134.2   0  59 c4t600A0B80003A8A0B0000096647B453CEd0
 >    0.0  211.3    0.0  26915.0   1.1  33.9    5.0  160.5  99 100 c4t600A0B800039C9B500000AA447B4544Fd0
 >    0.0  152.4    0.0  19447.0   0.0  20.5    0.0  134.5   0  59 c4t600A0B80003A8A0B0000096A47B4559Ed0
 >    0.0  213.2    0.0  27183.8   0.9  34.1    4.2  159.9  90 100 c4t600A0B800039C9B500000AA847B45605d0
 >    0.0  152.5    0.0  19453.4   0.0  20.5    0.0  134.5   0  59 c4t600A0B80003A8A0B0000096E47B456DAd0
 >    0.0  213.2    0.0  27177.4   0.9  34.1    4.2  159.9  90 100 c4t600A0B800039C9B500000AAC47B45739d0
 >    0.0  213.2    0.0  27195.3   0.9  34.1    4.2  159.9  90 100 c4t600A0B800039C9B500000AB047B457ADd0
 >    0.0  154.4    0.0  19711.8   0.0  20.7    0.0  134.0   0  59 c4t600A0B80003A8A0B0000097347B457D4d0
 >    0.0  211.3    0.0  26958.6   1.1  33.9    5.0  160.6  99 100 c4t600A0B800039C9B500000AB447B4595Fd0

Interesting that a subset of 5 disks is responding faster (which also leads to smaller actv queues and so lower service times) than the other 7 .... and the slow ones are subject to more writes... haha. If the sizes of the LUNs are different (or they have different amounts of free blocks), then maybe ZFS is now trying to rebalance free space by targeting a subset of the disks with more new data. Pool throughput will be impacted by this.

-r

 > Bob
 > =====================================
 > Bob Friesenhahn
 > bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 > GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
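The imbalance Roch points out can be watched for directly by filtering the per-LUN average service time out of iostat; a rough sketch, where the 150 ms threshold and the c4t600A0B name prefix are assumptions matched to this particular pool:

    # Print the LUN name and asvc_t for any array LUN whose average service
    # time exceeds 150 ms; the second 20-second sample reflects steady state.
    iostat -xnz 20 2 | awk '$NF ~ /^c4t600A0B/ && $8 + 0 > 150 { print $NF, $8 }'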
Hello Joel,

Saturday, February 16, 2008, 4:09:11 PM, you wrote:

JM> Bob,

JM> Here is how you can tell the array to ignore cache sync commands
JM> and the force unit access bits... (Sorry if it wraps..)

JM> On a Solaris CAM install, the 'service' command is in "/opt/SUNWsefms/bin"

JM> To read the current settings:
JM> service -d arrayname -c read -q nvsram region=0xf2 host=0x00

JM> save this output so you can reverse the changes below easily if needed...

JM> To set new values:

JM> service -d arrayname -c set -q nvsram region=0xf2 offset=0x17 value=0x01 host=0x00
JM> service -d arrayname -c set -q nvsram region=0xf2 offset=0x18 value=0x01 host=0x00
JM> service -d arrayname -c set -q nvsram region=0xf2 offset=0x21 value=0x01 host=0x00

JM> Host region 00 is Solaris (w/Traffic Manager)

JM> You will need to reboot both controllers after making the change before it becomes active.

Is it also necessary and does it work on 2530?

--
Best regards,
Robert                          mailto:milek at task.gda.pl
                                http://milek.blogspot.com
It is the same for the 2530, and I am fairly certain it is also valid for the 6130, 6140, & 6540.

-Joel

On Feb 18, 2008, at 3:51 PM, Robert Milkowski <milek at task.gda.pl> wrote:

> Hello Joel,
>
> Saturday, February 16, 2008, 4:09:11 PM, you wrote:
>
> JM> Bob,
>
> JM> Here is how you can tell the array to ignore cache sync commands
> JM> and the force unit access bits... (Sorry if it wraps..)
>
> JM> On a Solaris CAM install, the 'service' command is in "/opt/SUNWsefms/bin"
>
> JM> To read the current settings:
> JM> service -d arrayname -c read -q nvsram region=0xf2 host=0x00
>
> JM> save this output so you can reverse the changes below easily if needed...
>
> JM> To set new values:
>
> JM> service -d arrayname -c set -q nvsram region=0xf2 offset=0x17 value=0x01 host=0x00
> JM> service -d arrayname -c set -q nvsram region=0xf2 offset=0x18 value=0x01 host=0x00
> JM> service -d arrayname -c set -q nvsram region=0xf2 offset=0x21 value=0x01 host=0x00
>
> JM> Host region 00 is Solaris (w/Traffic Manager)
>
> JM> You will need to reboot both controllers after making the change
> before it becomes active.
>
> Is it also necessary and does it work on 2530?
>
> --
> Best regards,
> Robert                          mailto:milek at task.gda.pl
>                                 http://milek.blogspot.com
>
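The read-back step quoted above lends itself to a simple before/after check; a sketch, assuming the array is registered in CAM under the name given to -d (here 'array2540', a placeholder) and that /opt/SUNWsefms/bin is in the PATH:

    # Save the current nvsram host region before changing anything.
    service -d array2540 -c read -q nvsram region=0xf2 host=0x00 > nvsram-f2-before.txt

    # Apply the three offset changes quoted above, reboot both controllers,
    # then re-read the region and compare it against the saved copy.
    service -d array2540 -c read -q nvsram region=0xf2 host=0x00 > nvsram-f2-after.txt
    diff nvsram-f2-before.txt nvsram-f2-after.txt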
On Sun, 17 Feb 2008, Mertol Ozyoney wrote:
> Hi Bob;
>
> When you have some spare time, can you prepare a simple benchmark report in
> PDF that I can share with my customers to demonstrate the performance of
> the 2540?

While I do not claim that it is "simple", I have created a report on my configuration and experience. It should be useful for users of the Sun StorageTek 2540, ZFS, and Solaris 10 multipathing. See

http://www.simplesystems.org/users/bfriesen/zfs-discuss/2540-zfs-performance.pdf

or http://tinyurl.com/2djewn for the URL challenged. Feel free to share this document with anyone who is interested.

Thanks

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Wed, Feb 27, 2008 at 6:17 AM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> On Sun, 17 Feb 2008, Mertol Ozyoney wrote:
>
> > Hi Bob;
> >
> > When you have some spare time, can you prepare a simple benchmark report in
> > PDF that I can share with my customers to demonstrate the performance of
> > the 2540?
>
> While I do not claim that it is "simple", I have created a report on my
> configuration and experience. It should be useful for users of the
> Sun StorageTek 2540, ZFS, and Solaris 10 multipathing.
>
> See
>
> http://www.simplesystems.org/users/bfriesen/zfs-discuss/2540-zfs-performance.pdf

Nov 26, 2008 ??? May I borrow your time machine ? ;-)

--
Regards,
        Cyril
On Wed, 27 Feb 2008, Cyril Plisko wrote:
>>
>> http://www.simplesystems.org/users/bfriesen/zfs-discuss/2540-zfs-performance.pdf
>
> Nov 26, 2008 ??? May I borrow your time machine ? ;-)

Are there any stock prices you would like to know about? Perhaps you are interested in the outcome of the elections? There was a time inversion layer in Texas. Fixed now ...

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Paul Van Der Zwan
2008-Feb-28 10:59 UTC
[zfs-discuss] Performance with Sun StorageTek 2540
> On Wed, 27 Feb 2008, Cyril Plisko wrote:
> >>
> >> http://www.simplesystems.org/users/bfriesen/zfs-discuss/2540-zfs-performance.pdf
> >
> > Nov 26, 2008 ??? May I borrow your time machine ? ;-)
>
> Are there any stock prices you would like to know about? Perhaps you
> are interested in the outcome of the elections?
>

No need for a time machine, the US presidential election outcome is already known:
http://www.theonion.com/content/video/diebold_accidentally_leaks

Paul