Jonathan Wheeler
2006-Jul-17 12:34 UTC
[zfs-discuss] ZFS benchmarks w/8 disk raid - Quirky results, any thoughts?
Hi All,

I've just built an 8 disk ZFS storage box, and I'm in the testing phase before I put it into production. I've run into some unusual results, and I was hoping the community could offer some suggestions. I've basically made the switch to Solaris on the promise of ZFS alone (yes, I'm that excited about it!), so naturally I'm looking forward to some great performance - but it appears I'm going to need some help finding all of it.

I was getting even lower numbers with filebench, so I decided to dial back to a really simple app for testing - bonnie.

The system is a nevada_41 EM64T 3GHz Xeon, 1GB RAM, with 8x Seagate SATA II 300GB disks on a Supermicro SAT2-MV8 8-port SATA controller, running on a 133MHz 64-bit PCI-X bus.
The bottleneck here, by my thinking, should be the disks themselves. It's not the disk interfaces ('300MB/s'), the disk bus (300MB/s each), or the PCI-X bus (1.1GB/s), and I'd hope a 64-bit 3GHz CPU would be sufficient.

Tests were run on a fresh, clean zpool on an idle system. Rogue results were dropped, and as you can see below, all tests were run more than once. The 8GB test size should be far more than the 1GB of RAM that the system has, eliminating caching issues.

If I've still managed to overlook something in my testing setup, please let me know - I sure did try!

Sorry about the formatting - this is bound to end up ugly.

Bonnie
            -------Sequential Output-------- ---Sequential Input-- --Random--
            -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
raid0   MB  K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
8 disk 8196 78636 93.0 261804 64.2 125585 25.6 72160 95.3 246172 19.1 286.0 2.0
8 disk 8196 79452 93.9 286292 70.2 129163 26.0 72422 95.5 243628 18.9 302.9 2.1

so ~270MB/sec writes - awesome! 240MB/sec reads though - why would this be LOWER than writes??

            -------Sequential Output-------- ---Sequential Input-- --Random--
            -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
mirror  MB  K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
8 disk 8196 33285 38.6  46033  9.9  33077  6.8 67934 90.4  93445  7.7 230.5 1.3
8 disk 8196 34821 41.4  46136  9.0  32445  6.6 67120 89.1  94403  6.9 210.4 1.8

46MB/sec writes: each disk individually can do better, but I guess keeping 8 disks in sync is hurting performance. The 94MB/sec read figure is interesting. On the one hand, that's greater than 1 disk's worth, so I'm getting striping performance out of a mirror - GO ZFS. On the other, if I can get striping performance from mirrored reads, why is it only 94MB/sec? Seemingly it's not CPU bound.
Now for the important test, raid-z

            -------Sequential Output-------- ---Sequential Input-- --Random--
            -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
raidz   MB  K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
8 disk 8196 61785 70.9 142797 29.3  89342 19.9 64197 85.7 320554 32.6 131.3 1.0
8 disk 8196 62869 72.4 131801 26.7  90692 20.7 63986 85.7 306152 33.4 127.3 1.0
8 disk 8196 63103 72.9 128164 25.9  86175 19.4 64126 85.7 320410 32.7 124.5 0.9
7 disk 8196 51103 58.8  93815 19.1  74093 16.1 64705 86.5 331865 32.8 124.9 1.0
7 disk 8196 49446 56.8  93946 18.7  73092 15.8 64708 86.7 331458 32.7 127.1 1.0
7 disk 8196 49831 57.1  81305 16.2  78101 16.9 64698 86.4 331577 32.7 132.4 1.0
6 disk 8196 62360 72.3 157280 33.4  99511 21.9 65360 87.3 288159 27.1 132.7 0.9
6 disk 8196 63291 72.8 152598 29.1  97085 21.4 65546 87.2 292923 26.7 133.4 0.8
4 disk 8196 57965 67.9 123268 27.6  78712 17.1 66635 89.3 189482 15.9 134.1 0.9

I'm getting distinctly non-linear scaling here.

Writes: 4 disks gives me 123MB/sec. Raid0 was giving me 270/8 = 33MB/sec per disk with CPU to spare (roughly half of what each individual disk should be capable of). Here I'm getting 123/4 = 30MB/sec, or should that be 123/3 = 41MB/sec?
Using 30 as a baseline, I'd be expecting to see twice that with 8 disks (240ish?). What I end up with is ~135. Clearly not good scaling at all.
The really interesting numbers happen at 7 disks - it's slower than with 4, in all tests. I ran it 3x to be sure. Note this was a native 7 disk raid-z; it wasn't 8 running in degraded mode with 7. Something is really wrong with my write performance here, across the board.

Reads: 4 disks gives me 190MB/sec. WOAH! I'm very happy with that. 8 disks should scale to 380 then; well, 320 isn't all that far off - no biggie.
Looking at the 6 disk raidz is interesting though: 290MB/sec. The disks are good for 60+MB/sec individually. 290 is 48/disk - note also that this is better than my raid0 performance?!
Adding another 2 disks to my raidz gives me a mere 30MB/sec extra performance? Something is going very wrong here too.
The 7 disk raidz read test is about what I'd expect (330/7 = 47/disk), but it shows that the 8 disk setup is actually going backwards. Hmm...

I understand that going for an 8 disk wide raidz isn't optimal in terms of redundancy and IOPS - but my workload shouldn't involve large amounts of sustained random IO, so I'm happy to take the loss in favour of absolute capacity. My issue here is the scaling on sequential block transfers, not optimal design.

All three raid levels have had unexpected results, and I'd really appreciate some suggestions on how I can troubleshoot this. I know how to run iostat while bonnie is running, but that's about it. Incidentally, iostat is telling me that the disks are at best hitting around 70% busy (%b). With the 8 disk tests, it was often below 50%...

Is my issue perhaps with the SATA card that I'm using? Maybe it's just not able to handle that much throughput, despite being advertised to do so. With Raid0 (aka dynamic stripes), I know that each disk can read at 60-70MB/sec. Why am I not getting 65*8 (500MB/sec+) performance? Maybe it's the marvell driver at fault here?

My thinking is that I need to get raid0 performing as expected before looking at raidz, but I'm afraid I really don't know where to begin.

All thoughts & suggestions welcome. I'm not using the disks yet, so I can blow the zpool away as needed.

Many thanks,
Jonathan Wheeler


This message posted from opensolaris.org
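For what it's worth, a minimal way to watch what the individual disks are doing while bonnie runs - a sketch only, with the pool mountpoint and file size as placeholders:

  # in one terminal, run the benchmark against the pool's mountpoint
  bonnie -d /pool -s 8196

  # in another, sample extended per-device statistics every 5 seconds
  iostat -xn 5

The per-device kr/s, kw/s, actv and %b columns show whether the disks are saturated or idling during each phase, which helps separate a disk bottleneck from a controller or driver one.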
Dana H. Myers
2006-Jul-17 15:47 UTC
[zfs-discuss] ZFS benchmarks w/8 disk raid - Quirky results, any thoughts?
Jonathan Wheeler wrote:

I'm not a ZFS expert - I'm just an enthusiastic user inside Sun. Here are some brief observations:

> Bonnie
>             -------Sequential Output-------- ---Sequential Input-- --Random--
>             -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> raid0   MB  K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
> 8 disk 8196 78636 93.0 261804 64.2 125585 25.6 72160 95.3 246172 19.1 286.0 2.0
> 8 disk 8196 79452 93.9 286292 70.2 129163 26.0 72422 95.5 243628 18.9 302.9 2.1
>
> so ~270MB/sec writes - awesome! 240MB/sec reads though - why would this be LOWER than writes??

I believe this can happen because ZFS is optimized for writes, though I would tend to expect a sequential write followed by a sequential read to be about the same if there's no other filesystem activity during the write.

>             -------Sequential Output-------- ---Sequential Input-- --Random--
>             -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> mirror  MB  K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
> 8 disk 8196 33285 38.6  46033  9.9  33077  6.8 67934 90.4  93445  7.7 230.5 1.3
> 8 disk 8196 34821 41.4  46136  9.0  32445  6.6 67120 89.1  94403  6.9 210.4 1.8
>
> 46MB/sec writes: each disk individually can do better, but I guess keeping 8 disks in sync is hurting performance. The 94MB/sec read figure is interesting. On the one hand, that's greater than 1 disk's worth, so I'm getting striping performance out of a mirror - GO ZFS. On the other, if I can get striping performance from mirrored reads, why is it only 94MB/sec? Seemingly it's not CPU bound.

I expect a mirror to perform about the same as a single disk for writes, and about the same as two disks for reads, which seems to be the case here. Someone from the ZFS team can correct me, but I tend to believe that reads from a mirror are scheduled in pairs; it doesn't help the read performance to have 6 more copies of the same data available.

> Now for the important test, raid-z

I'll have to let the experts dissect this data; it looks a little goofy to me, too.

Dana
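One way to check how ZFS is actually spreading the mirrored reads across devices is to sample the pool's own statistics during the read phase - a sketch, with 'tank' as a placeholder pool name:

  # per-vdev bandwidth and operations, sampled every 5 seconds
  zpool iostat -v tank 5

If only a couple of the mirror's eight sides are being read from at any moment, the per-disk read columns will show it directly.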
Roch
2006-Jul-17 16:00 UTC
[zfs-discuss] ZFS benchmarks w/8 disk raid - Quirky results, any thoughts?
Sorry to plug my own blog, but have you had a look at these?

http://blogs.sun.com/roller/page/roch?entry=when_to_and_not_to (raidz)
http://blogs.sun.com/roller/page/roch?entry=the_dynamics_of_zfs

Also, my thinking is that raid-z is probably more friendly when the config contains (power-of-2 + 1) disks (or + 2 for raid-z2).

-r
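In practice that guideline maps to raid-z groups whose data-disk count is a power of two - for example (a sketch; device names are placeholders):

  # 4+1 raid-z (4 data disks + 1 parity)
  zpool create tank raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0

  # 2+1 raid-z (2 data disks + 1 parity)
  zpool create tank raidz c0t0d0 c0t1d0 c0t2d0

The usual reasoning is that a 128K record then divides evenly across the data columns, so no disk ends up with odd-sized partial writes.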
Richard Elling
2006-Jul-17 16:43 UTC
[zfs-discuss] ZFS benchmarks w/8 disk raid - Quirky results, any thoughts?
Dana H. Myers wrote:
> Jonathan Wheeler wrote:
>>             -------Sequential Output-------- ---Sequential Input-- --Random--
>>             -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
>> mirror  MB  K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
>> 8 disk 8196 33285 38.6  46033  9.9  33077  6.8 67934 90.4  93445  7.7 230.5 1.3
>> 8 disk 8196 34821 41.4  46136  9.0  32445  6.6 67120 89.1  94403  6.9 210.4 1.8
>>
>> 46MB/sec writes: each disk individually can do better, but I guess keeping 8 disks in sync is hurting performance. The 94MB/sec read figure is interesting. On the one hand, that's greater than 1 disk's worth, so I'm getting striping performance out of a mirror - GO ZFS. On the other, if I can get striping performance from mirrored reads, why is it only 94MB/sec? Seemingly it's not CPU bound.
>
> I expect a mirror to perform about the same as a single disk for writes, and about
> the same as two disks for reads, which seems to be the case here. Someone from
> the ZFS team can correct me, but I tend to believe that reads from a mirror are
> scheduled in pairs; it doesn't help the read performance to have 6 more copies of
> the same data available.

Is this an 8-way mirror, or a 4x2 RAID-1+0? For the former, I agree with Dana.
For the latter, you should get more available space and better performance.

8-way mirror:
  zpool create blah mirror c1d0 c1d1 c1d2 c1d3 c1d4 c1d5 c1d6 c1d7

4x2-way mirror:
  zpool create blag mirror c1d0 c1d1 mirror c1d2 c1d3 mirror c1d4 c1d5 mirror c1d6 c1d7

-- richard
Al Hopper
2006-Jul-17 17:00 UTC
[zfs-discuss] ZFS benchmarks w/8 disk raid - Quirky results, any thoughts?
On Mon, 17 Jul 2006, Roch wrote:
>
> Sorry to plug my own blog, but have you had a look at these?
>
> http://blogs.sun.com/roller/page/roch?entry=when_to_and_not_to (raidz)
> http://blogs.sun.com/roller/page/roch?entry=the_dynamics_of_zfs
>
> Also, my thinking is that raid-z is probably more friendly
> when the config contains (power-of-2 + 1) disks (or + 2 for
> raid-z2).

+1  I think that 5 disks for a raidz is the sweet spot IMHO. But ... YMMV etc. etc.

FWIW: here's a datapoint from a dirty raidz system with 8GB of RAM & 5 * 300GB SATA disks:

Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfs0           16G 88937  99 195973 47 95536  29 75279  95 228022 27 433.9   1
                   ------Sequential Create------ --------Random Create--------
                   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
             files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                16 31812  99 +++++ +++ +++++ +++ 28761  99 +++++ +++ +++++ +++
zfs0,16G,88937,99,195973,47,95536,29,75279,95,228022,27,433.9,1,16,31812,99,+++++,+++,+++++,+++,28761,99,+++++,+++,+++++,+++

I'm *very* pleased with the current release of ZFS. That being said, ZFS can be frustrating at times. Occasionally it'll issue in excess of 1k I/O ops a second (IOPS) and you'll say "holy snit, look at..." - and then there are times you wonder why it won't issue more than ~250 IOPS. But, for a Rev 1 filesystem, with the technical complexity of ZFS, this level of performance is excellent IMHO, and I expect that all kinds of improvements will continue to be made to the code over time.

Jonathan - I expect the answer to your performance expectations is that ZFS is-what-it-is at the moment. A suggestion is to split your 8 drives into a 5 disk raidz pool and a 2 disk mirror, with one spare drive remaining. Of course this is from my ZFS experience and for my intended usage, and may not apply to your intended application(s).

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
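For reference, that suggested split would look roughly like this (a sketch; device names are placeholders, and the hot-spare syntax assumes a build that supports spares):

  # 5 disk raid-z pool for bulk capacity
  zpool create tank raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0

  # 2 disk mirror for workloads that need better random I/O
  zpool create fast mirror c0t5d0 c0t6d0

  # optionally attach the remaining disk as a hot spare, where supported
  zpool add tank spare c0t7d0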
James Dickens
2006-Jul-17 19:03 UTC
[zfs-discuss] ZFS benchmarks w/8 disk raid - Quirky results, any thoughts?
On 7/17/06, Jonathan Wheeler <griffous at griffous.net> wrote:
<snip>
> Reads: 4 disks gives me 190MB/sec. WOAH! I'm very happy with that. 8 disks should scale to 380 then; well, 320 isn't all that far off - no biggie.
> Looking at the 6 disk raidz is interesting though: 290MB/sec. The disks are good for 60+MB/sec individually. 290 is 48/disk - note also that this is better than my raid0 performance?!
> Adding another 2 disks to my raidz gives me a mere 30MB/sec extra performance? Something is going very wrong here too.

I'm not an expert, but it would be great if you could run at least one more test.

Can you try 2x 4 disks in a raidz pool, to see if the system does scale to 380MB/s or whether the cpu gets in the way?

If another controller card is available, it would be interesting to see what effect, if any, there is from splitting the 8 drives across 2 controllers (4 drives per controller), to see if you get any performance change.

I wonder if more cpus/cores would help this test; theoretically the single cpu is fast enough, but with checksumming, creating parity, reading and writing, and the benchmark itself, you may get some strange interaction.

You may want to try again in a few weeks - I heard that a change went into the kernel that makes SATA access more efficient.

James Dickens
uadmin.blogspot.com
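On the CPU question, a quick way to see whether the processor rather than the disks is the limit during a run - a sketch:

  # per-CPU utilization sampled every 5 seconds while bonnie runs
  mpstat 5

  # or a single system-wide summary line per interval
  vmstat 5

If idle time stays well above zero while throughput has flattened out, the bottleneck is more likely in the I/O path than in checksum or parity computation.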
Jonathan Wheeler
2006-Jul-18 08:56 UTC
[zfs-discuss] Re: ZFS benchmarks w/8 disk raid - Quirky results, any thoughts?
Richard Elling wrote:
> Dana H. Myers wrote:
>> Jonathan Wheeler wrote:
<snip>
>>> On the one hand, that's greater than 1 disk's worth, so I'm getting striping performance out of a mirror - GO ZFS. On the other, if I can get striping performance from mirrored reads, why is it only 94MB/sec? Seemingly it's not CPU bound.
>>
>> I expect a mirror to perform about the same as a single disk for writes, and about
>> the same as two disks for reads, which seems to be the case here. Someone from
>> the ZFS team can correct me, but I tend to believe that reads from a mirror are
>> scheduled in pairs; it doesn't help the read performance to have 6 more copies of
>> the same data available.

Makes sense, thanks Dana.

> Is this an 8-way mirror, or a 4x2 RAID-1+0? For the former, I agree with Dana.

Yup, a full 8 way mirror.

> For the latter, you should get more available space and better performance.
> 8-way mirror:
>   zpool create blah mirror c1d0 c1d1 c1d2 c1d3 c1d4 c1d5 c1d6 c1d7
> 4x2-way mirror:
>   zpool create blag mirror c1d0 c1d1 mirror c1d2 c1d3 mirror c1d4 c1d5 mirror c1d6 c1d7

I agree, it would be a win both ways. Though in my own defence, I never intended to run a full 8 way mirror for actual use - it was just a fun test to see what would happen, and I hoped the results might help point towards a bottleneck that wasn't so obvious with the other raid levels.

Thanks,
Jonathan Wheeler


This message posted from opensolaris.org
Jonathan Wheeler
2006-Jul-18 13:19 UTC
[zfs-discuss] ZFS benchmarks w/8 disk raid - Quirky results, any thoughts?
> On Mon, 17 Jul 2006, Roch wrote:
>>
>> Sorry to plug my own blog, but have you had a look at these?
>>
>> http://blogs.sun.com/roller/page/roch?entry=when_to_and_not_to (raidz)
>> http://blogs.sun.com/roller/page/roch?entry=the_dynamics_of_zfs
>>
>> Also, my thinking is that raid-z is probably more friendly
>> when the config contains (power-of-2 + 1) disks (or + 2 for
>> raid-z2).

Yes I did, and please, plug away!! These are awesome blog entries, and I've read both of them several times. You rule! Really. I wish I could understand a bit more of your second one; it's a bit over my head, I'm afraid.

I understand that 8 disks is not optimal for a raidz set, especially for random input, and your blog entry is the reason for my comment to that effect near the bottom of my first post. My latest raid 50 results were much more healthy, but I don't know that I'm ready to sacrifice 300GB of storage for that slight improvement - especially as zfs can't grow the individual stripes (yet...).

> I think that 5 disks for a raidz is the sweet spot IMHO. But ... YMMV etc. etc.
>
> FWIW: here's a datapoint from a dirty raidz system with 8GB of RAM & 5 * 300GB SATA disks:
>
> Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
>                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> zfs0           16G 88937  99 195973 47 95536  29 75279  95 228022 27 433.9   1
>                    ------Sequential Create------ --------Random Create--------
>                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                 16 31812  99 +++++ +++ +++++ +++ 28761  99 +++++ +++ +++++ +++
> zfs0,16G,88937,99,195973,47,95536,29,75279,95,228022,27,433.9,1,16,31812,99,+++++,+++,+++++,+++,28761,99,+++++,+++,+++++,+++

Here is my version with 5 disks in a single raidz:

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine   MB  K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
5 disks 16384 62466 72.5 133768 28.0  97698 21.7 66504 88.1 241481 20.7 118.2 1.4

Ouch, yours is much better! Can you tell me more about your setup?

> I'm *very* pleased with the current release of ZFS. That being said, ZFS can be frustrating at times. Occasionally it'll issue in excess of 1k I/O ops a second (IOPS) and you'll say "holy snit, look at..." - and then there are times you wonder why it won't issue more than ~250 IOPS. But, for a Rev 1 filesystem, with the technical complexity of ZFS, this level of performance is excellent IMHO, and I expect that all kinds of improvements will continue to be made to the code over time.

I don't really have a point of comparison to know how well my hardware should be performing in the real world, just a gut feeling that it should be doing better, and some rather odd scaling issues. Please don't take this as zfs bashing. I still can't stop telling everyone I know about how I can create a 2TB raid in 3 seconds - I think ZFS is wicked cool!

This thread is two fold:
1) I'm hoping to learn more about zfs & solaris performance tuning by digging on in and investigating.
2) I have some notion of hopefully being helpful by providing developers with some real world data that might help in improving the code. I'm more than happy to do any testing that anyone can throw at me.
I've already had one email from one person asking me to run their dtrace script while benchmarking and email back the results. This is great. I can't code, but if I can help give back in any way here - hurrah!

> Jonathan - I expect the answer to your performance expectations is that ZFS is-what-it-is at the moment.

Along those lines, I'll upgrade to the latest nevada as soon as my connection finishes downloading it. 5 CDs is very non-trivial down in this part of the world, sadly.

> Regards,
>
> Al Hopper  Logical Approach Inc, Plano, TX.

Thanks for the reply Al,

Jonathan Wheeler
Jonathan Wheeler
2006-Jul-18 13:20 UTC
[zfs-discuss] ZFS benchmarks w/8 disk raid - Quirky results, any thoughts?
> On 7/17/06, Jonathan Wheeler <griffous at griffous.net> wrote:
<SNIP>
>> Reads: 4 disks gives me 190MB/sec. WOAH! I'm very happy with that. 8 disks should scale to 380 then; well, 320 isn't all that far off - no biggie.
>> Looking at the 6 disk raidz is interesting though: 290MB/sec. The disks are good for 60+MB/sec individually. 290 is 48/disk - note also that this is better than my raid0 performance?!
>> Adding another 2 disks to my raidz gives me a mere 30MB/sec extra performance? Something is going very wrong here too.
>
> I'm not an expert, but it would be great if you could run at least one more test.
>
> Can you try 2x 4 disks in a raidz pool, to see if the system does scale to 380MB/s or whether the cpu gets in the way?

Sure thing! Like this?

# zpool create Z raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 raidz c0t4d0 c0t5d0 c0t6d0 c0t7d0
# zpool status
  pool: Z
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        Z           ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0

errors: No known data errors

Which gave me:

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine   MB  K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
raid50  8196  66551 76.0 107030 21.8 112764 25.9 67629 90.0 327566 35.4 195.6 1.4
raid50  8196  64689 74.6 173965 33.4 107816 23.8 67586 89.9 347390 34.6 204.6 1.3
raid50  8196  64235 74.3 159652 32.0 105808 23.9 67679 90.1 114283 10.0 201.0 1.8
raid50  8196  64348 74.5 171047 34.0 104658 23.3 67422 89.5 353125 34.2 204.2 1.8
raid50  8196  66034 75.6 170279 33.4 109865 24.2 67413 89.6 345114 34.8 197.7 1.3

As you can see, I ran it 5 times. ZFS seemed to be taking a bit of a nap (throttling?) on runs 1 & 3, but the trend is definitely better than a single raid-z group, across the board.

My understanding of dynamic stripe sizes still needs work, so please correct me if I have this wrong: it's interesting to me that even though there are only 6 disks to use for writes (rather than 7), and the cpu is having to do twice the parity calculations, the performance is still actually _better_ overall. The sequential reads are higher, and the random seeks are, not surprisingly, better.

James, does this rule out the cpu for you? These results are probably better interpreted by the experts, so please, speak up :). I read this as showing that my disk subsystem does have more in it than we were seeing from a single raid-z.

> If another controller card is available, it would be interesting to see what effect, if any, there is from splitting the 8 drives across 2 controllers (4 drives per controller), to see if you get any performance change.

Yes, I would love to test this. I've been suspicious that this might be a controller issue, though I was somewhat hoping that there would be some way that the OS could tell me this. I don't actually have a second card to test this properly :(
I do however have 2 motherboard sata ports. It wouldn't be a great test (6 on one, 2 on the other), but it may give some interesting results. I'll try this tomorrow.

> I wonder if more cpus/cores would help this test; theoretically the single cpu is fast enough, but with checksumming, creating parity, reading and writing, and the benchmark itself, you may get some strange interaction.

I don't have another cpu, sadly.
I could do some tests with checksumming turned off, though, if you'd like to see those results. If it turns out that this is sapping a large amount of cpu, it's a penalty I'm quite happy to pay given the benefits - I just want to know where the bottleneck lies.

> You may want to try again in a few weeks - I heard that a change went into the kernel that makes SATA access more efficient.

Well, that would be great. I heard something about NCQ not being implemented yet; perhaps this is what you are referring to?

> James Dickens
> uadmin.blogspot.com

Jonathan
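For reference, disabling checksums for such a test is a per-dataset property change, roughly along these lines (using the pool name 'Z' from the example above):

  # turn data checksumming off for the duration of the test
  zfs set checksum=off Z

  # and back on afterwards (the default)
  zfs set checksum=on Z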
Luke Lonergan
2006-Jul-18 16:45 UTC
[zfs-discuss] ZFS benchmarks w/8 disk raid - Quirky results, any thoughts?
The prefetch and I/O scheduling of nv41 were responsible for some quirky performance. First-time read performance might be good, then subsequent reads might be very poor.

With a very recent update to the zfs module that improves I/O scheduling and prefetching, I get the following bonnie++ 1.03a results with a 36 drive RAID10, Solaris 10 U2 on an X4500 with 500GB Hitachi drives (zfs checksumming is off):

Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
thumperdw-i-1   32G 120453 99 467814 98 290391 58 109371 99 993344 94  1801   4
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ 30850  99 +++++ +++ +++++ +++

Bumping the number of concurrent processes up to 2, we get about 1.5x the single-process read speed out of the RAID10 with a concurrent workload (you have to add the rates together):

Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
thumperdw-i-1   32G 111441 95 212536 54 171798 51 106184 98 719472 88  1233   2
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 26085  90 +++++ +++  5700  98 21448  97 +++++ +++  4381  97

Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
thumperdw-i-1   32G 116355 99 212509 54 171647 50 106112 98 715030 87  1274   3
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 26082  99 +++++ +++  5588  98 21399  88 +++++ +++  4272  97

So that's ~2500 seeks per second, ~1440MB/s sequential block read, and ~212MB/s per-character sequential read.

- Luke
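For anyone wanting to reproduce the two-process numbers, the concurrent runs can be launched along these lines (a sketch only; the paths and size are placeholders, and the flags should be checked against the local bonnie++ man page):

  # two bonnie++ instances against separate directories in the same pool
  bonnie++ -u root -d /tank/bench1 -s 32g &
  bonnie++ -u root -d /tank/bench2 -s 32g &
  wait

As noted above, the per-process rates then have to be added together to get the aggregate throughput.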
Eric Schrock
2006-Jul-18 17:16 UTC
[zfs-discuss] ZFS benchmarks w/8 disk raid - Quirky results, any thoughts?
Also note that the current prefetching (even in snv_41) still suffers from some major systematic performance problems. This should be fixed by snv_45/s10u3, and is covered by the following bug:

6447377 ZFS prefetch is inconsistant

I'll duplicate Mark's evaluation here, since it doesn't show up on bugs.opensolaris.org:

---------
The main problem here is that the dmu is not informing the zfetch code about all IOs. The zfetch interface is only called when the dmu needs to go to the ARC to resolve an IO request. If the dmu finds that the buffer is already cached (in the dmu) it does not bother to call zfetch. So here's what can happen:

1 - ARC cache gets loaded up with some portion of a file 'X'
2 - application initiates a sequential read on 'X'
3 - DMU reads first 10 blocks from the file via arc_read()
4 - dmu_zfetch() detects sequential read pattern and starts prefetching
5 - DMU finds blocks 11-15 already cached (does not tell zfetch)
6 - DMU issues read for block 16
7 - dmu_zfetch() sees a gap in the read pattern, and so assumes that we are
    doing a *strided read*, and changes its prefetch algorithm:
    prefetch 10, skip 5, prefetch 10, ... etc.

As the dmu finds other blocks in its cache, the zfetch algorithms can become even more confused.
---------

With some additional fixes from Jeff, sequential read performance has been vastly improved. These fixes are undergoing final testing as we speak.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Tao Chen
2006-Jul-19 23:41 UTC
[zfs-discuss] ZFS benchmarks w/8 disk raid - Quirky results, any thoughts?
On 7/17/06, Jonathan Wheeler <griffous at griffous.net> wrote:
<snip>

One major concern Jonathan has is the 7-disk raidz write performance. (I see no big surprise in the 'read' results.)

"The really interesting numbers happen at 7 disks - it's slower than with 4, in all tests."

I randomly picked 3 results from his several runs:

              -Per Char-  --Block---  -Rewrite--
          MB  K/sec %CPU  K/sec %CPU  K/sec %CPU
4-disk  8196  57965 67.9 123268 27.6  78712 17.1
7-disk  8196  49454 57.1  92149 20.1  73013 16.0
8-disk  8196  61345 70.7 139259 28.5  89545 20.8

I looked at the corresponding dtrace data for the 7-disk and 8-disk raidz cases. (I should have also asked for 4-disk raidz data. Jonathan, you can still send the 4-disk data to me offline.)

In the 7-disk raidz, each disk had writes in two sizes, 214 blocks or 85 blocks, split equally:

DEVICE   BLKs    COUNT
-------- ----  -------
sd1        85    27855
          214    27882
sd2        85    27854
          214    27868
sd3        85    27849
          214    27884
...

In the 8-disk raidz, sd1, sd3, sd5 and sd7 had either 220- or 221-block writes, split equally, while sd2, sd4, sd6 and sd8 saw 100% 146-block writes:

DEVICE   BLKs    COUNT
-------- ----  -------
sd1       220    16325
          221    16338
sd2       146    49001
sd3       220    16335
          221    16333
sd4       146    49005
sd5       220    16340
          221    16324
sd6       146    49001
sd7       220    16332
          221    16333
sd8       146    49009

In terms of average write response time, in the 7-disk raidz:

DEVICE    WRITE  AVG.ms
-------  ------  ------
sd1       63990   54.03
sd2       64000   53.65
sd3       63898   55.48
sd4       64190   54.14
sd5       64091   54.81
sd6       63967   57.83
sd7       64092   54.19

and in the 8-disk raidz:

DEVICE    WRITE  AVG.ms
-------  ------  ------
sd1       42276    6.64
sd2       58467   19.66
sd3       42287    6.24
sd4       55198   20.01
sd5       42285    6.64
sd6       58409   22.90
sd7       42235    6.88
sd8       54967   24.46

At the bdev level, the 8-disk raidz shows much better turnaround time than the 7-disk raidz, and within it disks 1, 3, 5, 7 (larger writes) do better than 2, 4, 6, 8 (smaller writes).

So the 8-disk raidz wins through larger writes and much better response time per write - but why these two differences? And why the disparity between the odd- and even-numbered disks within the 8-disk raidz?

Tao
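For anyone who wants to collect similar per-device write-size and completion-time data, the DTrace io provider can be used along these lines. This is only a rough sketch, not the exact script behind the numbers above:

#!/usr/sbin/dtrace -s

/* record the issue time of each write as it reaches a device */
io:::start
/!(args[0]->b_flags & B_READ)/
{
        start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
        /* distribution of write sizes, in 512-byte blocks, per device */
        @blks[args[1]->dev_statname] = quantize(args[0]->b_bcount / 512);
}

/* on completion, accumulate the average response time per device */
io:::done
/!(args[0]->b_flags & B_READ) && start[args[0]->b_edev, args[0]->b_blkno]/
{
        @avgms[args[1]->dev_statname] =
            avg((timestamp - start[args[0]->b_edev, args[0]->b_blkno]) / 1000000);
        start[args[0]->b_edev, args[0]->b_blkno] = 0;
}

Left running for the length of a bonnie pass, it prints a write-size distribution and an average write completion time for each sd device on exit.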