I'd like to track a server's ZFS pool I/O throughput over time. What's a good data source to use for this? I like zpool iostat for this, but if I poll at two points in time I get a number since boot (e.g. 1.2M) and a current number (e.g. 1.3K). If I use the current number then I've lost data between polling intervals. But if I use the number since boot it's not precise enough to be useful.

Is there a kstat equivalent to the I/O since boot? Some other good data source?

And then is there a similar kstat equivalent to iostat? Would both data values then allow me to trend file I/O versus physical disk I/O?

Thanks.
--
This message posted from opensolaris.org
Brad,

> Is there a kstat equivalent to the I/O since boot? Some other good
> data source?

I would enable SAR data collection at system boot time, then perform data mining on the collected data relevant to your ZFS storage pool configuration.

http://docs.sun.com/app/docs/doc/817-0403/spconcepts-60676?a=view

I would next look into one of the various SAR data graphing tools.

http://sourceforge.net/projects/ksar/
http://freshmeat.net/projects/ksar

Jim
Are you looking for something like:

kstat -c disk sd:::

Someone can correct me if I'm wrong, but I think the documentation for the above should be at:

http://src.opensolaris.org/source/xref/zfs-crypto/gate/usr/src/uts/common/avs/ns/sdbc/cache_kstats_readme.txt

I'm not sure about the file I/O vs disk I/O, but would love to hear how to measure it.

Thomas
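Those sd kstats carry cumulative nread/nwritten byte counters since boot, which is the kind of number Brad is asking for. As a rough, untested sketch, something like this should dump them using the Sun::Solaris::Kstat module that ships with the bundled /usr/perl5 perl (stat names taken from the standard I/O kstat structure):

#!/usr/perl5/bin/perl
# Rough sketch: print the since-boot byte counters from the sd disk kstats.
# nread/nwritten are cumulative, so samples taken at different times can be
# differenced later without losing data between polls.
use strict;
use warnings;
use Sun::Solaris::Kstat;

my $k = Sun::Solaris::Kstat->new();
for my $inst (sort { $a <=> $b } keys %{ $k->{sd} || {} }) {
    for my $name (keys %{ $k->{sd}{$inst} }) {
        my $io = $k->{sd}{$inst}{$name};
        next unless defined $io->{nread};   # skip the ",err" and other non-I/O kstats
        printf "%-10s nread=%d nwritten=%d\n",
            $name, $io->{nread}, $io->{nwritten};
    }
}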
Please allow me to contribute some to this question, since this relates to the protocol question years ago that I could not give an answer to in a few seconds. [And I still cannot do that today -- no need, I insist.]

File I/O and block I/O are very different. While block I/O measurements can be used for cross-platform comparisons (with various block protocols, since they all should carry very little overhead), file I/O depends heavily on the file protocols. Some file protocols are built to provide other services as well, such as security and directory services, and carry extra overhead for those. Hence, when doing file I/O comparisons, you have to keep in mind that file I/O technologies were not invented for ultimate performance. Block I/O is.

Numbers are numbers, but how you use them makes a huge impact on your conclusions. :-)

Best,
z
Brad wrote:
> Is there a kstat equivalent to the I/O since boot? Some other good
> data source?
>
> And then is there a similar kstat equivalent to iostat? Would both data
> values then allow me to trend file I/O versus physical disk I/O?

Well, iostat gets its data from kstat, so it is really impossible to separate the two.

Most folks who want performance data collection all day long will enable accounting and use sar. sar also uses kstats. Or you can write your own scripts. Or there are a number of third party tools which will collect long-term stats and provide nice reports or capacity planning information. Actually, there is a whole book written on Solaris performance tools:
http://www.amazon.com/dp/0131568191?tag=solarisintern-20&camp=14573&creative=327641&linkCode=as1&creativeASIN=0131568191&adid=049SKSTPKNAJ9EZ23JW1&

-- richard
Richard Elling wrote:
> Most folks who want performance data collection all day long will
> enable accounting and use sar. sar also uses kstats. Or you can
> write your own scripts. Or there are a number of third party tools
> which will collect long-term stats and provide nice reports or
> capacity planning information.

Except sar sucks. It's scheduled via cron, and is too coarse grained for many purposes (10-minute samples average out almost everything interesting).

If you write your own using kstat, you can get accurate sub-second samples. Sadly you'll either have to use the amazingly crappy Sun perl or write it in C, as Sun hasn't yet managed to release source for the kstat perl module (unless it happened while I wasn't looking...).

-- Carson
On Sun, Jan 18, 2009 at 9:21 AM, Carson Gaspar <carson at taltos.org> wrote:
> If you write your own using kstat, you can get accurate sub-second
> samples. Sadly you'll either have to use the amazingly crappy Sun perl
> or write it in C, as Sun hasn't yet managed to release source for the
> kstat perl module (unless it happened while I wasn't looking...).

That's been out forever. See:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/perl/contrib/Sun/Solaris/Kstat/

Or, if you're interested in Java, I can plug the OpenSolaris project JKstat, kept up to date here:

http://www.petertribble.co.uk/Solaris/jkstat.html

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
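For what it's worth, a rough, untested sketch of the write-your-own approach with that Sun::Solaris::Kstat module, turning the cumulative sd counters into per-interval throughput, might look something like this (the 5-second interval and the output format are just placeholders):

#!/usr/perl5/bin/perl
# Rough sketch: sample the sd I/O kstats every $interval seconds and print
# per-interval read/write throughput computed from the cumulative counters.
use strict;
use warnings;
use Sun::Solaris::Kstat;

my $interval = 5;                       # seconds between samples (placeholder)
my $k = Sun::Solaris::Kstat->new();
my %last;                               # previous nread/nwritten per device

while (1) {
    $k->update();                       # refresh the kstat snapshot
    for my $inst (keys %{ $k->{sd} || {} }) {
        for my $name (keys %{ $k->{sd}{$inst} }) {
            my $io = $k->{sd}{$inst}{$name};
            next unless defined $io->{nread};   # only the I/O kstats
            if (my $prev = $last{$name}) {
                printf "%-10s %10.1f KB/s read  %10.1f KB/s write\n", $name,
                    ($io->{nread}    - $prev->[0]) / 1024 / $interval,
                    ($io->{nwritten} - $prev->[1]) / 1024 / $interval;
            }
            $last{$name} = [ $io->{nread}, $io->{nwritten} ];
        }
    }
    sleep $interval;
}

Because the counters are cumulative since boot, a missed sample only widens the interval; no data is lost between polls.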
On Sat, Jan 17, 2009 at 9:04 PM, Thomas Garner <thomas536 at gmail.com> wrote:
> I'm not sure about the file I/O vs disk I/O, but would love to hear
> how to measure it.

See fsstat, which is based upon kstats. One of the things I want to do with JKstat is correlate filesystem operations with underlying disk operations. The hard part is actually connecting a filesystem to the underlying drives. That's harder with zfs, as the disk I/O is mapped to a pool which has multiple filesystems. (The same is true with soft partitions under SVM, but with zfs sharing is the rule rather than the exception.)

I would like to see the pool statistics exposed as kstats, though, which would make it easier to analyse them with existing tools.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Well, if I do fsstat mountpoint on all the filesystems in the ZFS pool, then I guess my aggregate number for read and write bandwidth should equal the aggregate numbers for the pool? Yes?

The downside is that fsstat has the same granularity issue as zpool iostat. What I'd really like is nread and nwrite numbers instead of r/s and w/s. That way, if I miss some polls I can smooth out the results.

kstat -c disk sd::: is interesting, but seems to be only for locally-attached disks, right? I am using iSCSI, although soon I will also have pools with local disks.

For device data, I'd really like the per-pool and per-device-within-pool breakdowns provided by zpool iostat, if only they weren't summarized in a 5-character field. Perhaps I should simply be asking for sample code that accesses libzfs....

I have rolled my own cron scheduler so I can have sub-second queries.

Thanks for the info!
On Sun, Jan 18, 2009 at 5:39 PM, Brad <bstone at aspirinsoftware.com> wrote:
> The downside is that fsstat has the same granularity issue as zpool iostat.
> What I'd really like is nread and nwrite numbers instead of r/s and w/s.
> That way, if I miss some polls I can smooth out the results.

Just yank the raw kstats. This is a little harder than it seems, unless you're in the case where you only have one pool, in which case:

kstat unix:0:vopstats_zfs

will give you the aggregate of all zfs filesystems straight off. The individual filesystem numbers come from kstats named like so:

kstat unix:0:vopstats_4480002

and you have to match up the device id with the filesystem name from /etc/mnttab. In the case above, you need to match 4480002, which on my machine is the following line in /etc/mnttab:

swap    /tmp    tmpfs   xattr,dev=4480002       1232289278

so that's /tmp (not a zfs filesystem, but you should get the idea).

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
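To script the mnttab matching Peter describes, a rough, untested sketch along these lines could print the cumulative per-filesystem byte counters for every mounted zfs filesystem (the read_bytes/write_bytes stat names are assumed from the fields fsstat reports):

#!/usr/perl5/bin/perl
# Rough sketch: map each mounted zfs filesystem to its vopstats kstat via the
# dev=<hex> mount option in /etc/mnttab, then print cumulative byte counters.
use strict;
use warnings;
use Sun::Solaris::Kstat;

my $k = Sun::Solaris::Kstat->new();

open my $mnttab, '<', '/etc/mnttab' or die "cannot open /etc/mnttab: $!";
while (my $line = <$mnttab>) {
    chomp $line;
    my ($special, $mountp, $fstype, $opts) = split /\t/, $line;
    next unless $fstype eq 'zfs';
    next unless $opts =~ /dev=([0-9a-f]+)/;
    my $vs = $k->{unix}{0}{"vopstats_$1"} or next;
    # read_bytes/write_bytes stat names assumed from what fsstat displays
    printf "%-30s read_bytes=%d write_bytes=%d\n",
        $mountp, $vs->{read_bytes}, $vs->{write_bytes};
}
close $mnttab;

The same differencing trick as with the disk kstats applies, since these values are also cumulative.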
Peter Tribble wrote:
> See fsstat, which is based upon kstats. One of the things I want to do
> with JKstat is correlate filesystem operations with underlying disk
> operations. The hard part is actually connecting a filesystem to the
> underlying drives. That's harder with zfs, as the disk I/O is mapped to
> a pool which has multiple filesystems.

If we draw a stack with application on top and devices on the bottom, then fsstat shows the load into the file system from above, and iostat shows the load into the devices from above. But because file systems like to do things like caching, wrangling metadata, prefetching, coalescing, deferring writes, and prewriting data (e.g. ZIL), it is really hard to make a 1:1 correlation between an application's I/O activity and disk I/O -- except for a rather small subset of overall activity. This is one reason some databases like to deal with raw devices. It is also why performance work for databases is often done with raw devices -- fewer moving parts. The upshot is that if you are looking for a 1:1 relationship, you will be sad. Rather, it is better to look at overall efficiencies, which are fairly well presented.

> (The same is true with soft partitions under SVM, but with zfs sharing is
> the rule rather than the exception.)
>
> I would like to see the pool statistics exposed as kstats, though, which
> would make it easier to analyse them with existing tools.

I recall some discussion about kstats in ZFS for performance a few years ago, but IIRC the consensus seemed to be that ZFS was not going to get overloaded with zillions of kstats.

BTW, I really like Peter's work on JKstat -- well done!

Carson Gaspar wrote:
> Except sar sucks. It's scheduled via cron, and is too coarse grained for
> many purposes (10-minute samples average out almost everything
> interesting).

There is a world of difference between the tools needed to perform debugging and performance improvements vs long-term trending. sar is a big, warty beast, but it works reasonably well for long-term trending. The 3rd party tools like TeamQuest are more modern and do a better job -- you get what you pay for.

-- richard
On Sun, Jan 18, 2009 at 8:25 PM, Richard Elling <Richard.Elling at sun.com> wrote:
> The upshot is that if you are looking for a 1:1 relationship, you will
> be sad. Rather, it is better to look at overall efficiencies, which are
> fairly well presented.

Indeed. Perhaps I didn't put that clearly enough. What I'm interested in is comparing what goes in at the top (fsstat) with what you see at the bottom (iostat) - it's the differences that are interesting. The hard part of the mapping is working out where a given file operation ought to be associated with a given set of devices. Simple pooled storage is one thing - multiple devices can be associated with a given filesystem. Then put multiple filesystems into a pool and you can have N filesystems atop M devices, and my head explodes.

> I recall some discussion about kstats in ZFS for performance a few years
> ago, but IIRC the consensus seemed to be that ZFS was not going to get
> overloaded with zillions of kstats.

Yes, but zero is a bit, well, small. Just being able to do zpool iostat with kstats would be a big win. (And zfs does have a reasonable number of kstats, although I'm not absolutely sure what all of them mean and what their stability levels are.)

> BTW, I really like Peter's work on JKstat -- well done!

Thanks!

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Richard Elling wrote:
> There is a world of difference between the tools needed to perform
> debugging and performance improvements vs long-term trending. sar
> is a big, warty beast, but it works reasonably well for long-term
> trending. The 3rd party tools like TeamQuest are more modern and
> do a better job -- you get what you pay for.

The problem is that even for long-term trending you need better than 10-minute resolution, unless your app isn't bursty at all, or you leave a _lot_ of headroom (or you only care about throughput and not latency).

Sadly, most (but by no means all) 3rd party tools are resource hogs themselves, so they aren't very good for permanent resource utilization tracking (although they can be amazing at application performance debugging). One of the really cool things about dtrace is its extremely low performance impact.

Thus my (trimmed in the quote) recommendation to write your own using kstat, as opposed to relying on sar. Or go buy something, but in my experience sar is unlikely to make you happy.

-- Carson
Carson Gaspar wrote:
> The problem is that even for long-term trending you need better than
> 10-minute resolution, unless your app isn't bursty at all, or you leave
> a _lot_ of headroom (or you only care about throughput and not latency).

By default, the crontab for sa1 (/var/spool/cron/crontabs/sys) sets 20-minute intervals. This can easily be changed to suit your needs.

> Sadly, most (but by no means all) 3rd party tools are resource hogs
> themselves, so they aren't very good for permanent resource utilization
> tracking (although they can be amazing at application performance
> debugging). One of the really cool things about dtrace is its extremely
> low performance impact.

Some dtrace scripts have very large, negative impacts on performance. However, I think for most modern systems, sar won't have much impact. I dunno how all of the tools affect performance; I suspect it varies widely.

> Thus my (trimmed in the quote) recommendation to write your own using
> kstat, as opposed to relying on sar. Or go buy something, but in my
> experience sar is unlikely to make you happy.

sar has lots of problems, which is why there is a market for 3rd party capacity planning tools. OTOH, most of the others aren't open source.
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/sa/sar.c

I should also mention Fenxi, an open source performance analysis engine we developed for analysis of performance experiments. Again, it doesn't replace capacity planning tools, but it certainly makes experiments easier to manage.
https://fenxi.dev.java.net/

I reserve my comments on SunMC for dimly lit bars...

-- richard
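Regarding the sa1 interval: the stock sys crontab entries look roughly like the commented lines below (from memory, so they may differ slightly between releases), and sampling every five minutes around the clock is just a matter of replacing them with something like the last line:

# Default /var/spool/cron/crontabs/sys entries (approximate):
#   0 * * * 0-6 /usr/lib/sa/sa1
#   20,40 8-17 * * 1-5 /usr/lib/sa/sa1
# Finer-grained alternative: run sa1 every 5 minutes, every day:
0,5,10,15,20,25,30,35,40,45,50,55 * * * 0-6 /usr/lib/sa/sa1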