I'd like to track a server's ZFS pool I/O throughput over time. What's a good data source to use for this? I like zpool iostat for this, but if I poll at two points in time I get a number since boot (e.g. 1.2M) and a current number (e.g. 1.3K). If I use the current number then I've lost data between polling intervals. But if I use the number since boot it's not precise enough to be useful.

Is there a kstat equivalent to the I/O since boot? Some other good data source?

And then is there a similar kstat equivalent to iostat? Would both data values then allow me to trend file I/O versus physical disk I/O?

Thanks.
--
This message posted from opensolaris.org
Brad,

> Is there a kstat equivalent to the I/O since boot? Some other good
> data source?

I would enable SAR data collection at system boot time, then perform data mining on the collected data relevant to your ZFS storage pool configuration.

http://docs.sun.com/app/docs/doc/817-0403/spconcepts-60676?a=view

I would next look into one of the various SAR data graphing tools.

http://sourceforge.net/projects/ksar/
http://freshmeat.net/projects/ksar

Jim
Are you looking for something like:

kstat -c disk sd:::

Someone can correct me if I'm wrong, but I think the documentation for the above should be at:

http://src.opensolaris.org/source/xref/zfs-crypto/gate/usr/src/uts/common/avs/ns/sdbc/cache_kstats_readme.txt

I'm not sure about the file I/O vs disk I/O, but would love to hear how to measure it.

Thomas
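Those sd kstats carry cumulative nread/nwritten byte counters since boot, which is the kind of number Brad is asking for. As a rough, untested sketch, something like this should dump them using the Sun::Solaris::Kstat module that ships with the bundled /usr/perl5 perl (stat names taken from the standard I/O kstat structure):

#!/usr/perl5/bin/perl
# Rough sketch: print the since-boot byte counters from the sd disk kstats.
# nread/nwritten are cumulative, so samples taken at different times can be
# differenced later without losing data between polls.
use strict;
use warnings;
use Sun::Solaris::Kstat;

my $k = Sun::Solaris::Kstat->new();
for my $inst (sort { $a <=> $b } keys %{ $k->{sd} || {} }) {
    for my $name (keys %{ $k->{sd}{$inst} }) {
        my $io = $k->{sd}{$inst}{$name};
        next unless defined $io->{nread};   # skip the ",err" and other non-I/O kstats
        printf "%-10s nread=%d nwritten=%d\n",
            $name, $io->{nread}, $io->{nwritten};
    }
}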
Please allow me to contribute some to this question, since this relates to the protocol question years ago that I could not give an answer to in a few seconds. [And I still cannot do that today -- no need, I insist.]

File I/O and block I/O are very different. While block I/O measurements can be used for cross-platform comparisons (with various block protocols, since they all should carry very little overhead), file I/O depends heavily on the file protocols. Some file protocols are built to provide other services as well, such as security and directory services, and carry extra overhead for those. Hence, when doing file I/O comparisons, you have to keep in mind that file I/O technologies were not invented for ultimate performance. Block I/O is.

Numbers are numbers, but how you use them makes a huge impact on your conclusions. :-)

Best,
z
Brad wrote:
> Is there a kstat equivalent to the I/O since boot? Some other good
> data source?
>
> And then is there a similar kstat equivalent to iostat? Would both data
> values then allow me to trend file I/O versus physical disk I/O?

Well, iostat gets its data from kstat, so it is really impossible to separate the two.

Most folks who want performance data collection all day long will enable accounting and use sar. sar also uses kstats. Or you can write your own scripts. Or there are a number of third party tools which will collect long-term stats and provide nice reports or capacity planning information. Actually, there is a whole book written on Solaris performance tools:
http://www.amazon.com/dp/0131568191?tag=solarisintern-20&camp=14573&creative=327641&linkCode=as1&creativeASIN=0131568191&adid=049SKSTPKNAJ9EZ23JW1&

-- richard
Richard Elling wrote:
> Most folks who want performance data collection all day long will
> enable accounting and use sar. sar also uses kstats. Or you can
> write your own scripts. Or there are a number of third party tools
> which will collect long-term stats and provide nice reports or
> capacity planning information.

Except sar sucks. It's scheduled via cron, and is too coarse grained for many purposes (10-minute samples average out almost everything interesting).

If you write your own using kstat, you can get accurate sub-second samples. Sadly you'll either have to use the amazingly crappy Sun perl or write it in C, as Sun hasn't yet managed to release source for the kstat perl module (unless it happened while I wasn't looking...).

-- Carson
On Sun, Jan 18, 2009 at 9:21 AM, Carson Gaspar <carson at taltos.org> wrote:
> If you write your own using kstat, you can get accurate sub-second
> samples. Sadly you'll either have to use the amazingly crappy Sun perl
> or write it in C, as Sun hasn't yet managed to release source for the
> kstat perl module (unless it happened while I wasn't looking...).

That's been out forever. See:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/perl/contrib/Sun/Solaris/Kstat/

Or, if you're interested in Java, I can plug the OpenSolaris project JKstat, kept up to date here:

http://www.petertribble.co.uk/Solaris/jkstat.html

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
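For what it's worth, a rough, untested sketch of the write-your-own approach with that Sun::Solaris::Kstat module, turning the cumulative sd counters into per-interval throughput, might look something like this (the 5-second interval and the output format are just placeholders):

#!/usr/perl5/bin/perl
# Rough sketch: sample the sd I/O kstats every $interval seconds and print
# per-interval read/write throughput computed from the cumulative counters.
use strict;
use warnings;
use Sun::Solaris::Kstat;

my $interval = 5;                       # seconds between samples (placeholder)
my $k = Sun::Solaris::Kstat->new();
my %last;                               # previous nread/nwritten per device

while (1) {
    $k->update();                       # refresh the kstat snapshot
    for my $inst (keys %{ $k->{sd} || {} }) {
        for my $name (keys %{ $k->{sd}{$inst} }) {
            my $io = $k->{sd}{$inst}{$name};
            next unless defined $io->{nread};   # only the I/O kstats
            if (my $prev = $last{$name}) {
                printf "%-10s %10.1f KB/s read  %10.1f KB/s write\n", $name,
                    ($io->{nread}    - $prev->[0]) / 1024 / $interval,
                    ($io->{nwritten} - $prev->[1]) / 1024 / $interval;
            }
            $last{$name} = [ $io->{nread}, $io->{nwritten} ];
        }
    }
    sleep $interval;
}

Because the counters are cumulative since boot, a missed sample only widens the interval; no data is lost between polls.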
On Sat, Jan 17, 2009 at 9:04 PM, Thomas Garner <thomas536 at gmail.com> wrote:
> I'm not sure about the file I/O vs disk I/O, but would love to hear
> how to measure it.

See fsstat, which is based upon kstats. One of the things I want to do with JKstat is correlate filesystem operations with underlying disk operations. The hard part is actually connecting a filesystem to the underlying drives. That's harder with zfs, as the disk I/O is mapped to a pool which has multiple filesystems. (The same is true with soft partitions under SVM, but with zfs sharing is the rule rather than the exception.)

I would like to see the pool statistics exposed as kstats, though, which would make it easier to analyse them with existing tools.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Well, if I do fsstat mountpoint on all the filesystems in the ZFS pool, then I guess my aggregate number for read and write bandwidth should equal the aggregate numbers for the pool? Yes?

The downside is that fsstat has the same granularity issue as zpool iostat. What I'd really like is nread and nwrite numbers instead of r/s and w/s. That way, if I miss some polls I can smooth out the results.

kstat -c disk sd::: is interesting, but seems to be only for locally-attached disks, right? I am using iSCSI, although soon I will also have pools with local disks.

For device data, I'd really like the per-pool and per-device-within-pool breakdowns provided by zpool iostat, if only they weren't summarized in a 5-character field. Perhaps I should simply be asking for sample code that accesses libzfs....

I have rolled my own cron scheduler so I can have sub-second queries.

Thanks for the info!
On Sun, Jan 18, 2009 at 5:39 PM, Brad <bstone at aspirinsoftware.com> wrote:
> The downside is that fsstat has the same granularity issue as zpool iostat.
> What I'd really like is nread and nwrite numbers instead of r/s and w/s.
> That way, if I miss some polls I can smooth out the results.

Just yank the raw kstats. This is a little harder than it seems, unless you're in the case where you only have one pool, in which case:

kstat unix:0:vopstats_zfs

will give you the aggregate of all zfs filesystems straight off. The individual filesystem numbers come from kstats named like so:

kstat unix:0:vopstats_4480002

and you have to match up the device id with the filesystem name from /etc/mnttab. In the case above, you need to match 4480002, which on my machine is the following line in /etc/mnttab:

swap    /tmp    tmpfs   xattr,dev=4480002       1232289278

so that's /tmp (not a zfs filesystem, but you should get the idea).

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
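To script the mnttab matching Peter describes, a rough, untested sketch along these lines could print the cumulative per-filesystem byte counters for every mounted zfs filesystem (the read_bytes/write_bytes stat names are assumed from the fields fsstat reports):

#!/usr/perl5/bin/perl
# Rough sketch: map each mounted zfs filesystem to its vopstats kstat via the
# dev=<hex> mount option in /etc/mnttab, then print cumulative byte counters.
use strict;
use warnings;
use Sun::Solaris::Kstat;

my $k = Sun::Solaris::Kstat->new();

open my $mnttab, '<', '/etc/mnttab' or die "cannot open /etc/mnttab: $!";
while (my $line = <$mnttab>) {
    chomp $line;
    my ($special, $mountp, $fstype, $opts) = split /\t/, $line;
    next unless $fstype eq 'zfs';
    next unless $opts =~ /dev=([0-9a-f]+)/;
    my $vs = $k->{unix}{0}{"vopstats_$1"} or next;
    # read_bytes/write_bytes stat names assumed from what fsstat displays
    printf "%-30s read_bytes=%d write_bytes=%d\n",
        $mountp, $vs->{read_bytes}, $vs->{write_bytes};
}
close $mnttab;

The same differencing trick as with the disk kstats applies, since these values are also cumulative.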
Peter Tribble wrote:
> See fsstat, which is based upon kstats. One of the things I want to do
> with JKstat is correlate filesystem operations with underlying disk
> operations. The hard part is actually connecting a filesystem to the
> underlying drives. That's harder with zfs, as the disk I/O is mapped to
> a pool which has multiple filesystems.

If we draw a stack with application on top and devices on the bottom, then fsstat shows the load into the file system from above, and iostat shows the load into the devices from above. But because file systems like to do things like caching, wrangling metadata, prefetching, coalescing, deferring writes, and prewriting data (e.g. ZIL), it is really hard to make a 1:1 correlation between an application's I/O activity and disk I/O -- except for a rather small subset of overall activity. This is one reason some databases like to deal with raw devices. It is also why performance work for databases is often done with raw devices -- fewer moving parts. The upshot is that if you are looking for a 1:1 relationship, you will be sad. Rather, it is better to look at overall efficiencies, which are fairly well presented.

> (The same is true with soft partitions under SVM, but with zfs sharing is
> the rule rather than the exception.)
>
> I would like to see the pool statistics exposed as kstats, though, which
> would make it easier to analyse them with existing tools.

I recall some discussion about kstats in ZFS for performance a few years ago, but IIRC the consensus seemed to be that ZFS was not going to get overloaded with zillions of kstats.

BTW, I really like Peter's work on JKstat -- well done!

Carson Gaspar wrote:
> Except sar sucks. It's scheduled via cron, and is too coarse grained for
> many purposes (10-minute samples average out almost everything
> interesting).

There is a world of difference between the tools needed to perform debugging and performance improvements vs long-term trending. sar is a big, warty beast, but it works reasonably well for long-term trending. The 3rd party tools like TeamQuest are more modern and do a better job -- you get what you pay for.

-- richard
On Sun, Jan 18, 2009 at 8:25 PM, Richard Elling <Richard.Elling at sun.com> wrote:
> The upshot is that if you are looking for a 1:1 relationship, you will
> be sad. Rather, it is better to look at overall efficiencies, which are
> fairly well presented.

Indeed. Perhaps I didn't put that clearly enough. What I'm interested in is comparing what goes in at the top (fsstat) with what you see at the bottom (iostat) - it's the differences that are interesting. The hard part of the mapping is working out where a given file operation ought to be associated with a given set of devices. Simple pooled storage is one thing - multiple devices can be associated with a given filesystem. Then put multiple filesystems into a pool and you can have N filesystems atop M devices, and my head explodes.

> I recall some discussion about kstats in ZFS for performance a few years
> ago, but IIRC the consensus seemed to be that ZFS was not going to get
> overloaded with zillions of kstats.

Yes, but zero is a bit, well, small. Just being able to do zpool iostat with kstats would be a big win. (And zfs does have a reasonable number of kstats, although I'm not absolutely sure what all of them mean and what their stability levels are.)

> BTW, I really like Peter's work on JKstat -- well done!

Thanks!

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Richard Elling wrote:
> There is a world of difference between the tools needed to perform
> debugging and performance improvements vs long-term trending. sar
> is a big, warty beast, but it works reasonably well for long-term
> trending. The 3rd party tools like TeamQuest are more modern and
> do a better job -- you get what you pay for.

The problem is that even for long-term trending you need better than 10-minute resolution, unless your app isn't bursty at all, or you leave a _lot_ of headroom (or you only care about throughput and not latency).

Sadly, most (but by no means all) 3rd party tools are resource hogs themselves, so they aren't very good for permanent resource utilization tracking (although they can be amazing at application performance debugging). One of the really cool things about dtrace is its extremely low performance impact.

Thus my (trimmed in the quote) recommendation to write your own using kstat, as opposed to relying on sar. Or go buy something, but in my experience sar is unlikely to make you happy.

-- Carson
Carson Gaspar wrote:
> The problem is that even for long-term trending you need better than
> 10-minute resolution, unless your app isn't bursty at all, or you leave
> a _lot_ of headroom (or you only care about throughput and not latency).

By default, the crontab for sa1 (/var/spool/cron/crontabs/sys) sets 20-minute intervals. This can easily be changed to suit your needs.

> Sadly, most (but by no means all) 3rd party tools are resource hogs
> themselves, so they aren't very good for permanent resource utilization
> tracking (although they can be amazing at application performance
> debugging). One of the really cool things about dtrace is its extremely
> low performance impact.

Some dtrace scripts have very large, negative impacts on performance. However, I think for most modern systems, sar won't have much impact. I dunno how all of the tools affect performance; I suspect it varies widely.

> Thus my (trimmed in the quote) recommendation to write your own using
> kstat, as opposed to relying on sar. Or go buy something, but in my
> experience sar is unlikely to make you happy.

sar has lots of problems, which is why there is a market for 3rd party capacity planning tools. OTOH, most of the others aren't open source.
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/sa/sar.c

I should also mention Fenxi, an open source performance analysis engine we developed for analysis of performance experiments. Again, it doesn't replace capacity planning tools, but it certainly makes experiments easier to manage.
https://fenxi.dev.java.net/

I reserve my comments on SunMC for dimly lit bars...

-- richard
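Regarding the sa1 interval: the stock sys crontab entries look roughly like the commented lines below (from memory, so they may differ slightly between releases), and sampling every five minutes around the clock is just a matter of replacing them with something like the last line:

# Default /var/spool/cron/crontabs/sys entries (approximate):
#   0 * * * 0-6 /usr/lib/sa/sa1
#   20,40 8-17 * * 1-5 /usr/lib/sa/sa1
# Finer-grained alternative: run sa1 every 5 minutes, every day:
0,5,10,15,20,25,30,35,40,45,50,55 * * * 0-6 /usr/lib/sa/sa1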