Hi all,

We are running pretty large vdevs, since initial testing showed that our
setup was not too far off the optimum. Under real-world load, however, we
see some rather odd behaviour. The system is an X4500 with 500 GB drives,
and right now it appears to be under heavy load: e.g. ls takes minutes to
return on only a few hundred entries, while top shows 10% kernel time and
the rest idle.

zpool iostat -v atlashome 60 shows (not the first output):

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
atlashome   2.11T  18.8T  2.29K     36  71.7M   138K
  raidz2     466G  6.36T    493     11  14.9M  34.1K
    c0t0d0      -      -     48      5  1.81M  3.52K
    c1t0d0      -      -     48      5  1.81M  3.46K
    c4t0d0      -      -     48      5  1.81M  3.27K
    c6t0d0      -      -     48      5  1.81M  3.40K
    c7t0d0      -      -     47      5  1.81M  3.40K
    c0t1d0      -      -     47      5  1.81M  3.20K
    c1t1d0      -      -     47      6  1.81M  3.59K
    c4t1d0      -      -     47      6  1.81M  3.53K
    c5t1d0      -      -     47      5  1.81M  3.33K
    c6t1d0      -      -     48      6  1.81M  3.67K
    c7t1d0      -      -     48      6  1.81M  3.66K
    c0t2d0      -      -     48      5  1.82M  3.42K
    c1t2d0      -      -     48      6  1.81M  3.56K
    c4t2d0      -      -     48      6  1.81M  3.54K
    c5t2d0      -      -     48      5  1.81M  3.41K
  raidz2     732G  6.10T    800     12  24.6M  52.3K
    c6t2d0      -      -    139      5  7.52M  4.54K
    c7t2d0      -      -    139      5  7.52M  4.81K
    c0t3d0      -      -    140      5  7.52M  4.98K
    c1t3d0      -      -    139      5  7.51M  4.47K
    c4t3d0      -      -    139      5  7.51M  4.82K
    c5t3d0      -      -    139      5  7.51M  4.99K
    c6t3d0      -      -    139      5  7.52M  4.44K
    c7t3d0      -      -    139      5  7.52M  4.78K
    c0t4d0      -      -    139      5  7.52M  4.97K
    c1t4d0      -      -    139      5  7.51M  4.60K
    c4t4d0      -      -    139      5  7.51M  4.86K
    c6t4d0      -      -    139      5  7.51M  4.99K
    c7t4d0      -      -    139      5  7.51M  4.52K
    c0t5d0      -      -    139      5  7.51M  4.78K
    c1t5d0      -      -    138      5  7.51M  4.94K
  raidz2     960G  6.31T  1.02K     12  32.2M  52.0K
    c4t5d0      -      -    178      5  9.29M  4.79K
    c5t5d0      -      -    178      5  9.28M  4.64K
    c6t5d0      -      -    179      5  9.29M  4.44K
    c7t5d0      -      -    178      4  9.26M  4.26K
    c0t6d0      -      -    178      5  9.28M  4.78K
    c1t6d0      -      -    178      5  9.20M  4.58K
    c4t6d0      -      -    178      5  9.26M  4.25K
    c5t6d0      -      -    177      4  9.21M  4.18K
    c6t6d0      -      -    178      5  9.29M  4.69K
    c7t6d0      -      -    177      5  9.26M  4.61K
    c0t7d0      -      -    177      5  9.29M  4.34K
    c1t7d0      -      -    177      5  9.24M  4.28K
    c4t7d0      -      -    177      5  9.29M  4.78K
    c5t7d0      -      -    177      5  9.27M  4.75K
    c6t7d0      -      -    177      5  9.29M  4.34K
    c7t7d0      -      -    177      5  9.27M  4.28K
----------  -----  -----  -----  -----  -----  -----

Questions:
(a) Why does the first vdev not get an equal share of the load?
(b) Why is a large raidz2 so bad? When I use a standard Linux box with
    hardware RAID6 over 16 disks, I usually get more bandwidth and at
    least about the same small-file performance.
(c) Would the use of several smaller vdevs help much? And which layout
    would be a good compromise for getting space as well as performance
    and reliability? 46 disks have so few prime factors.

Thanks a lot

Carsten
On Tue, 2 Dec 2008, Carsten Aulbert wrote:

> Questions:
> (a) Why does the first vdev not get an equal share of the load?

You may have one or more "slow" disk drives which drag down the whole
vdev due to long wait times. If you can identify those slow drives and
replace them, overall performance is likely to improve.

The problem is that under severe load, the vdev with the highest backlog
will be used the least, and it takes only one slow disk to slow down the
whole vdev.

> (b) Why is a large raidz2 so bad? When I use a standard Linux box with
> hardware RAID6 over 16 disks, I usually get more bandwidth and at
> least about the same small-file performance.

ZFS commits the writes to all involved disks in a raidz2 before
proceeding with the next write. With so many disks, you are asking for
quite a lot of fortuitous luck, in that everything must be working
optimally. Compounding the problem, my understanding is that when the
stripe width exceeds the number of blocks the data can be segmented into
(ZFS is only willing to dice a block down to a certain minimum size),
only a subset of the disks will be used, wasting potential I/O
bandwidth. Your stripes are too wide.

> (c) Would the use of several smaller vdevs help much? And which layout
> would be a good compromise for getting space as well as performance
> and reliability? 46 disks have so few prime factors.

Yes, more vdevs should definitely help quite a lot for dealing with
real-world multi-user loads. One raidz/raidz2 vdev provides (at most)
the IOPS of a single disk. There is a point of diminishing returns, and
your layout has gone far beyond it.
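As a back-of-the-envelope sketch of the trade-off (the 80 IOPS per disk
figure is just an assumption for a 7200 rpm SATA drive, and each raidz2
vdev is credited with one disk's worth of IOPS):

  # bash/ksh sketch: usable space vs. random IOPS for raidz2 layouts
  # over the 46 data disks of an X4500
  for width in 23 15 11 9 7 5; do
    vdevs=$((46 / width))
    data=$(( (width - 2) * vdevs ))   # data disks left after raidz2 parity
    iops=$(( vdevs * 80 ))            # assumed ~80 random IOPS per vdev
    echo "$vdevs x raidz2($width): $data data disks, ~$iops random IOPS"
  done

You give up a few disks of space for several times the IOPS.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/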
>>>>> "ca" == Carsten Aulbert <carsten.aulbert at aei.mpg.de> writes:ca> (a) Why the first vdev does not get an equal share ca> of the load I don''t know. but, if you don''t add all the vdev''s before writing anything, there''s no magic to make them balance themselves out. Stuff stays where it''s written. I''m guessing you did add them at the same time, and they still filled up unevenly? ''zpool iostat'' that you showed is the place I found to see how data is spread among vdev''s. ca> (b) Why is a large raidz2 so bad? When I use a ca> standard Linux box with hardware raid6 over 16 disks I usually ca> get more bandwidth and at least about the same small file ca> performance obviously there are all kinds of things going on but...the standard answer is, traditional RAID5/6 doesn''t have to do full stripe I/O. ZFS is more like FreeBSD''s RAID3: it gets around the NVRAMless-RAID5 write hole by always writing a full stripe, which means all spindles seek together and you get the seek performance of 1 drive (per vdev). Linux RAID5/6 just gives up and accepts a write hole, AIUI, but because the stripes are much fatter than a filesystem block, you''ll sometimes get the record you need by seeking a subset of the drives rather than all of them, which means the drives you didn''t seek have the chance to fetch another record. If you''re saying you get worse performance than a single spindle, I''m not sure why. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20081202/5ace34de/attachment.bin>
Hi Miles,

Miles Nordin wrote:
> I don't know. But if you don't add all the vdevs before writing
> anything, there's no magic to make them balance themselves out: stuff
> stays where it's written. I'm guessing you did add them at the same
> time, and they still filled up unevenly?

Yes, they were all created in one go (even on the same command line) and
only then filled - either "naturally" over time or via zfs send/receive
(all on Sol10u5). So yes, it seems they fill up unevenly.

> If you're saying you get worse performance than a single spindle, I'm
> not sure why.

No, I think a single disk would be much less performant; however, I'm a
bit disappointed by the overall performance of the boxes, and right now
we have users who experience extremely slow performance.

But thanks already for the insight.

Cheers

Carsten
Bob Friesenhahn wrote:
> You may have one or more "slow" disk drives which drag down the whole
> vdev due to long wait times. If you can identify those slow drives and
> replace them, overall performance is likely to improve.

Hmm, since I only started with Solaris this year: is there a way to
identify a "slow" disk? In principle these should all be identical
Hitachi Deathstar^WDeskstar drives and should only show the standard
deviation from production.

> ZFS commits the writes to all involved disks in a raidz2 before
> proceeding with the next write. [...] Your stripes are too wide.

Ah, OK, that's one of the first reasonable explanations (which I
understand) of why large vdevs might be bad. So far I had not been able
to track that down and had only found the standard "magic" rule not to
exceed 10 drives - and our (synthetic) tests had not shown significant
drawbacks. But I guess we might be bitten by it now.

> Yes, more vdevs should definitely help quite a lot for dealing with
> real-world multi-user loads. One raidz/raidz2 vdev provides (at most)
> the IOPS of a single disk.
>
> There is a point of diminishing returns, and your layout has gone far
> beyond it.

Thanks for the insight, I guess I need to experiment with empty boxes to
get into a better state!

Cheers

Carsten
On Tue, 2 Dec 2008, Carsten Aulbert wrote:

> No, I think a single disk would be much less performant; however, I'm
> a bit disappointed by the overall performance of the boxes, and right
> now we have users who experience extremely slow performance.

If all of the disks in the vdev need to be written to before the next
write can proceed, then the write latency will surely be higher than
that of just one disk.
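A toy calculation shows how quickly wide stripes amplify this. Assume
(a number picked purely for illustration) that each disk has a 2%
chance of responding slowly to any given I/O; then the fraction of
full-stripe writes held up by at least one slow disk grows with width:

  awk 'BEGIN {
    p = 0.02                   # assumed per-disk chance of a slow I/O
    for (n = 5; n <= 15; n += 5)
      printf "width %2d: %2.0f%% of writes wait on a slow disk\n",
             n, (1 - (1 - p)^n) * 100
  }'

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/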
On Tue, 2 Dec 2008, Carsten Aulbert wrote:

> Hmm, since I only started with Solaris this year: is there a way to
> identify a "slow" disk? In principle these should all be identical
> Hitachi Deathstar^WDeskstar drives and should only show the standard
> deviation from production.

Look at the output of 'iostat -xn 30' while the system is under load.
Possibly ignore the initial output block, since that is an aggregate
since the dawn of time. You will need to know which disks are in each
vdev. Check whether the asvc_t value for one of the disks is much
higher than for the others in the same vdev. If a disk is acting as the
bottleneck, it is likely that its asvc_t value is far greater than the
others.

To get zfs's view of the I/O, use

  zpool iostat -v poolname 30
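If you want something more mechanical, a bit of awk over the last
report can flag the outliers for you (a sketch, untested: in
'iostat -xn' output, asvc_t is the 8th field and the device name the
last one; the 2x threshold is arbitrary):

  iostat -xn 30 2 | awk '
    $NF ~ /^c[0-9]+t[0-9]+d[0-9]+$/ { svc[$NF] = $8 }  # last report wins
    END {
      for (d in svc) { sum += svc[d]; n++ }
      avg = sum / n
      for (d in svc)
        if (svc[d] > 2 * avg)
          printf "%s: asvc_t %.1f (average %.1f)\n", d, svc[d], avg
    }'

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/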
Guys, this looks to me like the second time we've had something like
this reported on the forums for an X4500, again with the first vdev
seeing much lower load than the other two despite being created at the
same time. I can't find the thread to check - can anybody else remember
it?
Aha, found it! It was this thread, also started by Carsten :)

http://www.opensolaris.org/jive/thread.jspa?threadID=78921&tstart=45
Ross wrote:
> Aha, found it! It was this thread, also started by Carsten :)
> http://www.opensolaris.org/jive/thread.jspa?threadID=78921&tstart=45

Did I? Darn, I need to get a brain upgrade. But yes, that thread was
mainly focused on zfs send/receive being slow - maybe the two are
linked, though.

What I will try today/this week: put some stress on the system with
bonnie and other tools and try to find slow disks, to see if those
could be the main problem; but also look into more vdevs, and then
possibly move to raidz to somehow compensate for the lost disk space.
Since we have 4 cold spares on the shelf, plus SMS warnings on disk
failures (that is, if fma catches them), the risk involved should be
tolerable.
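For reference, I will run something like the following (the parameters
are only a first guess - the file size should comfortably exceed RAM,
and the flags should be checked against the bonnie++ man page):

  bonnie++ -d /atlashome/test -s 32g -n 128 -u nobody

i.e. a 32 GB working set for the streaming phases plus 128*1024 small
files for the file-creation tests.

More later.

Carsten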
Carsten Aulbert <carsten.aulbert <at> aei.mpg.de> writes:

> Put some stress on the system with bonnie and other tools and try to
> find slow disks

Just run "iostat -Mnx 2" (not zpool iostat) while ls is slow to find
the slow disks. Look at the %b (busy) values.
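For example, something along these lines (a sketch: in 'iostat -Mnx'
output %b is the next-to-last field and the device name the last one;
the awk keeps only the last report):

  iostat -Mnx 2 3 | awk '
    $NF ~ /^c[0-9]+t[0-9]+d[0-9]+$/ { b[$NF] = $(NF-1) }
    END { for (d in b) print b[d], d }' | sort -rn | head

-marc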
Carsten Aulbert wrote:
> Put some stress on the system with bonnie and other tools and try to
> find slow disks, to see if those could be the main problem; but also
> look into more vdevs, and then possibly move to raidz to somehow
> compensate for the lost disk space.

First result: with bonnie during the "writing intelligently..." phase I
see this in a 2-minute average.

zpool iostat:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
atlashome   1.70T  19.2T    225  1.49K   342K   107M
  raidz2     550G  6.28T     74    409   114K  32.6M
    c0t0d0      -      -      0    314  32.3K  2.51M
    c1t0d0      -      -      0    315  31.8K  2.52M
    c4t0d0      -      -      0    313  31.3K  2.52M
    c6t0d0      -      -      0    315  32.3K  2.51M
    c7t0d0      -      -      0    326  32.8K  2.50M
    c0t1d0      -      -      0    309  33.9K  2.52M
    c1t1d0      -      -      0    313  33.4K  2.51M
    c4t1d0      -      -      0    314  33.4K  2.52M
    c5t1d0      -      -      0    308  32.8K  2.52M
    c6t1d0      -      -      0    314  31.3K  2.51M
    c7t1d0      -      -      0    311  31.8K  2.52M
    c0t2d0      -      -      0    309  31.8K  2.52M
    c1t2d0      -      -      0    313  31.8K  2.51M
    c4t2d0      -      -      0    315  31.8K  2.52M
    c5t2d0      -      -      0    307  32.8K  2.52M
  raidz2     567G  6.26T     64    529  96.5K  36.3M
    c6t2d0      -      -      1    368  74.2K  2.79M
    c7t2d0      -      -      1    366  74.2K  2.80M
    c0t3d0      -      -      1    364  75.8K  2.80M
    c1t3d0      -      -      1    365  75.2K  2.80M
    c4t3d0      -      -      1    368  76.8K  2.80M
    c5t3d0      -      -      1    362  76.3K  2.80M
    c6t3d0      -      -      1    366  77.9K  2.80M
    c7t3d0      -      -      1    365  76.8K  2.80M
    c0t4d0      -      -      1    361  76.8K  2.80M
    c1t4d0      -      -      1    363  75.8K  2.80M
    c4t4d0      -      -      1    366  76.3K  2.80M
    c6t4d0      -      -      1    364  78.4K  2.80M
    c7t4d0      -      -      1    370  78.9K  2.79M
    c0t5d0      -      -      1    365  77.3K  2.80M
    c1t5d0      -      -      1    364  74.7K  2.80M
  raidz2     620G  6.64T     86    582   131K  37.9M
    c4t5d0      -      -     18    382  1.16M  2.74M
    c5t5d0      -      -     10    380   674K  2.74M
    c6t5d0      -      -     18    378  1.15M  2.73M
    c7t5d0      -      -      9    384   628K  2.74M
    c0t6d0      -      -     18    377  1.16M  2.74M
    c1t6d0      -      -     10    383   680K  2.75M
    c4t6d0      -      -     19    379  1.21M  2.73M
    c5t6d0      -      -     10    383   691K  2.75M
    c6t6d0      -      -     19    379  1.21M  2.73M
    c7t6d0      -      -     10    383   676K  2.72M
    c0t7d0      -      -     18    374  1.19M  2.75M
    c1t7d0      -      -     10    381   676K  2.74M
    c4t7d0      -      -     19    380  1.22M  2.74M
    c5t7d0      -      -     10    382   696K  2.74M
    c6t7d0      -      -     18    381  1.17M  2.74M
    c7t7d0      -      -      9    386   631K  2.75M
----------  -----  -----  -----  -----  -----  -----

iostat -Mnx 120:

                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c2t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c3t0d0
    0.0    1.4    0.0    0.0  0.0  0.0    1.5    0.4   0   0 c5t0d0
    0.6  351.5    0.0    2.6  0.4  0.1    1.2    0.2   3   8 c7t0d0
    0.6  336.3    0.0    2.6  0.1  0.1    0.4    0.2   3   7 c0t0d0
    0.6  340.8    0.0    2.6  0.2  0.1    0.6    0.2   3   7 c1t0d0
    0.6  330.6    0.0    2.6  0.1  0.1    0.3    0.2   3   7 c5t1d0
    0.6  336.7    0.0    2.6  0.1  0.1    0.3    0.2   3   7 c4t0d0
    0.6  331.8    0.0    2.6  0.1  0.1    0.3    0.2   3   7 c0t1d0
    0.6  339.0    0.0    2.6  0.4  0.1    1.1    0.2   3   7 c7t1d0
    0.6  335.4    0.0    2.6  0.1  0.1    0.4    0.2   3   7 c1t1d0
    0.6  329.2    0.0    2.6  0.1  0.1    0.3    0.2   3   7 c5t2d0
    0.6  343.7    0.0    2.6  0.3  0.1    0.7    0.2   3   7 c4t1d0
    0.6  331.8    0.0    2.6  0.1  0.1    0.3    0.2   2   7 c0t2d0
    1.2  396.3    0.1    2.9  0.3  0.1    0.7    0.2   4   8 c7t2d0
    0.6  336.7    0.0    2.6  0.1  0.1    0.4    0.2   3   7 c1t2d0
    0.6  341.9    0.0    2.6  0.2  0.1    0.7    0.2   3   7 c4t2d0
    1.3  390.7    0.1    2.9  0.3  0.1    0.8    0.2   4   9 c5t3d0
    1.3  396.7    0.1    2.9  0.3  0.1    0.8    0.2   4   9 c7t3d0
    1.3  393.6    0.1    2.9  0.2  0.1    0.6    0.2   4   9 c0t3d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5t4d0
    1.3  396.2    0.1    2.9  0.2  0.1    0.5    0.2   4   8 c1t3d0
    1.3  399.2    0.1    2.9  0.3  0.1    0.8    0.2   4   9 c4t3d0
    1.3  401.8    0.1    2.9  0.3  0.1    0.8    0.2   4   9 c7t4d0
    1.3  388.5    0.1    2.9  0.2  0.1    0.5    0.2   4   8 c0t4d0
    1.3  391.8    0.1    2.9  0.2  0.1    0.5    0.2   4   9 c1t4d0
    1.3  395.1    0.1    2.9  0.2  0.1    0.6    0.2   4   8 c4t4d0
    9.9  409.7    0.6    2.9  0.8  0.2    1.9    0.4  10  18 c7t5d0
    1.3  395.0    0.1    2.9  0.3  0.1    0.6    0.2   4   9 c0t5d0
   10.6  405.3    0.7    2.9  0.8  0.2    2.0    0.4  11  18 c5t5d0
    1.3  392.8    0.1    2.9  0.2  0.1    0.5    0.2   4   8 c1t5d0
   10.7  407.6    0.7    2.9  0.9  0.2    2.1    0.4  11  19 c7t6d0
   18.6  407.5    1.2    2.9  1.0  0.2    2.4    0.6  15  24 c4t5d0
   10.9  407.8    0.7    2.9  0.8  0.2    2.0    0.4  11  19 c5t6d0
    0.6  337.6    0.0    2.6  0.2  0.1    0.5    0.2   3   7 c6t0d0
   10.7  408.8    0.7    2.9  0.8  0.2    1.9    0.4  11  19 c1t6d0
   10.0  411.6    0.6    2.9  0.8  0.2    1.8    0.4  11  18 c7t7d0
   19.3  403.1    1.2    2.9  1.1  0.3    2.6    0.6  16  26 c4t6d0
    0.6  336.2    0.0    2.6  0.1  0.1    0.4    0.2   3   7 c6t1d0
   11.0  407.7    0.7    2.9  0.8  0.2    1.9    0.4  11  19 c5t7d0
   10.6  406.6    0.7    2.9  0.8  0.2    2.0    0.4  11  19 c1t7d0
   18.5  401.7    1.2    2.9  1.0  0.2    2.5    0.6  15  25 c0t6d0
   19.4  404.8    1.2    2.9  1.0  0.3    2.5    0.6  15  25 c4t7d0
    1.2  397.6    0.1    2.9  0.3  0.1    0.9    0.2   4   9 c6t2d0
   19.0  398.7    1.2    2.9  1.0  0.3    2.5    0.6  15  25 c0t7d0
    1.3  396.1    0.1    2.9  0.2  0.1    0.5    0.2   4   8 c6t3d0
    1.3  392.8    0.1    2.9  0.2  0.1    0.4    0.2   4   8 c6t4d0
   18.4  403.3    1.2    2.9  1.1  0.2    2.5    0.6  15  24 c6t5d0
   19.3  402.7    1.2    2.9  1.1  0.3    2.5    0.6  15  25 c6t6d0
   18.8  406.1    1.2    2.9  1.0  0.2    2.4    0.6  15  25 c6t7d0

Any experts here who can say whether that is just because bonnie via
NFSv3 is a very special test - if it is, I can start something else,
suggestions welcome - or whether some disks really are too busy and are
slowing down the pool?

Thanks for more insight

Carsten
Carsten Aulbert writes:

> First result: with bonnie during the "writing intelligently..." phase
> I see this in a 2-minute average.
>
> [zpool iostat and iostat -Mnx output quoted in full above]
>
> Any experts here who can say whether that is just because bonnie via
> NFSv3 is a very special test - if it is, I can start something else,
> suggestions welcome - or whether some disks really are too busy and
> are slowing down the pool?

Here is my attempt:

http://blogs.sun.com/roch/entry/decoding_bonnie

-r