why is the pool total bandwidth from `zpool iostat -v 1`
less than the sum of the per-disk bandwidth while watching `du /zfs`
on opensol-20060605 bits?

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zfs         1.17T  1.16T    147      0   573K      0
  raidz1    1.17T  1.16T    147      0   573K      0
    c2d0p0      -      -     29      0  1.93M      0
    c4d0p0      -      -     30      0  2.07M      0
    c6d0p0      -      -     45      0  3.05M      0
    c8d0p0      -      -     24      0  1.50M      0
    c3d0p0      -      -     39      0  2.44M      0
    c5d0p0      -      -     35      0  2.30M      0
    c7d0p0      -      -     50      0  3.24M      0
    c9d0p0      -      -     22      0  1.36M      0
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zfs         1.17T  1.16T    129      0   565K      0
  raidz1    1.17T  1.16T    129      0   565K      0
    c2d0p0      -      -     38      0  2.42M      0
    c4d0p0      -      -     34      0  2.18M      0
    c6d0p0      -      -     35      0  2.18M      0
    c8d0p0      -      -     33      0  2.10M      0
    c3d0p0      -      -     39      0  2.50M      0
    c5d0p0      -      -     38      0  2.38M      0
    c7d0p0      -      -     36      0  2.26M      0
    c9d0p0      -      -     32      0  2.06M      0
----------  -----  -----  -----  -----  -----  -----

if `zpool upgrade` shows all pools as "ZFS version 3" and I never rewrite
any of my root dirs, does the old "raidz1" pool ever make any ditto blocks?
Hello Rob,

Friday, June 9, 2006, 7:36:58 AM, you wrote:

RL> why is the pool total bandwidth from `zpool iostat -v 1`
RL> less than the sum of the per-disk bandwidth while watching `du /zfs`
RL> on opensol-20060605 bits?

RL>                capacity     operations    bandwidth
RL> pool         used  avail   read  write   read  write
RL> ----------  -----  -----  -----  -----  -----  -----
RL> zfs         1.17T  1.16T    147      0   573K      0
RL>   raidz1    1.17T  1.16T    147      0   573K      0
RL>     c2d0p0      -      -     29      0  1.93M      0
RL>     c4d0p0      -      -     30      0  2.07M      0
RL>     c6d0p0      -      -     45      0  3.05M      0
RL>     c8d0p0      -      -     24      0  1.50M      0
RL>     c3d0p0      -      -     39      0  2.44M      0
RL>     c5d0p0      -      -     35      0  2.30M      0
RL>     c7d0p0      -      -     50      0  3.24M      0
RL>     c9d0p0      -      -     22      0  1.36M      0
RL> ----------  -----  -----  -----  -----  -----  -----

Due to the raid-z implementation. See the last discussion on raid-z
performance, etc.

--
Best regards,
 Robert                            mailto:rmilkowski at task.gda.pl
                                   http://milek.blogspot.com
> RL> why is the pool total bandwidth from `zpool iostat -v 1`
> RL> less than the sum of the per-disk bandwidth while watching `du /zfs`
> RL> on opensol-20060605 bits?
>
> Due to the raid-z implementation. See the last discussion on raid-z
> performance, etc.

It's an artifact of the way raidz and the vdev read cache interact.

Currently, when you read a block from disk, we always read at least 64k.
We keep the result in a per-disk cache -- like a software track buffer.
The idea is that if you do several small reads in a row, only the first
one goes to disk.  For some workloads, this is a huge win.  For others,
it's a net loss.  More tuning is needed, certainly.

Both the good and the bad aspects of vdev caching are amplified by RAID-Z.
When you write a 2k block to a 5-disk raidz vdev, it will be stored as a
single 512-byte sector on each disk (4 data + 1 parity).  When you read it
back, we'll issue 4 reads (to the data disks); each of those will become a
64k cache-fill read, so you're reading a total of 4*64k = 256k to fetch a
2k block.

If that block is the first in a series, you're golden: the next 127 reads
will be free (no disk I/O).  On the other hand, if it's an isolated random
read, we just did 128 times more I/O than was actually useful.  This is a
rather extreme case, but it's real.

I'm hoping that by making the higher-level prefetch logic in ZFS a little
smarter, we can eliminate the need for vdev-level caching altogether.  If
not, we'll need to make the vdev cache policy smarter.  I've filed this
bug to track the issue:

6437054 vdev_cache: wise up or die

Jeff
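One rough way to see the cache-fill behaviour Jeff describes is to watch
the physical read sizes hitting the disks with the standard DTrace io
provider while the `du` is running.  This is only a sketch (nothing here
is ZFS-specific, and the exact distribution depends on the workload), but
on a raidz pool doing lots of small metadata reads you would expect the
per-disk histogram to cluster around 64k even though the pool-level
bandwidth stays small:

  # histogram of physical I/O sizes per device while `du /zfs` runs
  dtrace -n 'io:::start { @[args[1]->dev_statname] = quantize(args[0]->b_bcount); }'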
> a total of 4*64k = 256k to fetch a 2k block.

> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6437054

perhaps a quick win would be to tell vdev_cache about the DMU_OT_* type so
it can read ahead appropriately.  it seems the largest losses are metadata.
(du, find, scrub/resilver)
What is the status of bug 6437054?  The bug tracker still shows it open.

Ron
On Mon, Apr 23, 2007 at 10:10:23AM -0700, Ron Halstead wrote:
> What is the status of bug 6437054?  The bug tracker still shows it open.
>
> Ron

Do you mean:

6437054 vdev_cache: wise up or die

This bug is still under investigation.  A bunch of investigation has been
done, but no definitive action has been taken, yet.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Robert Milkowski
2007-Apr-23 21:06 UTC
[zfs-discuss] Re: opensol-20060605 # zpool iostat -v 1
Hello Eric,

Monday, April 23, 2007, 7:13:26 PM, you wrote:

ES> On Mon, Apr 23, 2007 at 10:10:23AM -0700, Ron Halstead wrote:
>> What is the status of bug 6437054?  The bug tracker still shows it open.
>>
>> Ron

ES> Do you mean:

ES> 6437054 vdev_cache: wise up or die

ES> This bug is still under investigation.  A bunch of investigation has
ES> been done, but no definitive action has been taken, yet.

I set bshift to 2^13 (8K) on almost every server with zfs.
I wish I could put it in /etc/system on S10U3...

--
Best regards,
 Robert                            mailto:rmilkowski at task.gda.pl
                                   http://milek.blogspot.com
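If your build exposes the vdev cache kstats (not all of the releases
discussed in this thread do), you can get a rough idea of whether the
cache is earning its keep before and after a change like this.  Treat the
name as an assumption to verify on your own bits:

  # delegations/hits/misses for the vdev cache, where the kstat exists
  kstat -m zfs -n vdev_cache_stats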
Ron Halstead
2007-Apr-24 14:20 UTC
[zfs-discuss] Re: Re[2]: Re: opensol-20060605 # zpool iostat -v 1
Robert,

How do you set bshift to 2^13 (8K)?  Is there a document describing the
procedure?

Ron
Robert Milkowski
2007-Apr-24 14:42 UTC
[zfs-discuss] Re: Re[2]: Re: opensol-20060605 # zpool iostat -v 1
Hello Ron,

Tuesday, April 24, 2007, 4:20:41 PM, you wrote:

RH> Robert,

RH> How do you set bshift to 2^13 (8K)?  Is there a document describing the
RH> procedure?

In the latest Nevada builds you can set it via /etc/system like:

set zfs:zfs_vdev_cache_bshift=13

(2^13 is 8K).

In older releases or in S10U2/U3 you can set it via mdb or use Roch's
script -- http://blogs.sun.com/roch/entry/tuning_the_knobs

--
Best regards,
 Robert                            mailto:rmilkowski at task.gda.pl
                                   http://milek.blogspot.com
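For the mdb route on the older bits, something along these lines should do
it.  Treat it as a sketch rather than Roch's exact script: it changes only
the running kernel (nothing persists across a reboot), and the variable
name is the one from the /etc/system line above:

  # check the current value (the default is 16, i.e. 2^16 = 64K)
  echo 'zfs_vdev_cache_bshift/D' | mdb -k

  # set it to 13 (2^13 = 8K) on the live kernel
  echo 'zfs_vdev_cache_bshift/W 0t13' | mdb -kw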
Ron Halstead
2007-Apr-24 14:54 UTC
[zfs-discuss] Re: Re: Re[2]: Re: opensol-20060605 # zpool iostat -v 1
Thanks Robert.  This will be put to use.

Ron
Robert Milkowski
2007-Apr-26 09:56 UTC
[zfs-discuss] Re: Re: Re[2]: Re: opensol-20060605 # zpool iostat -v 1
Hello Ron,

Tuesday, April 24, 2007, 4:54:52 PM, you wrote:

RH> Thanks Robert.  This will be put to use.

Please let us know about the results.

--
Best regards,
 Robert                            mailto:rmilkowski at task.gda.pl
                                   http://milek.blogspot.com