Michael Barrett
2005-Nov-18 15:14 UTC
[dtrace-discuss] expected system load from DTrace scripts/probes
Does anyone out there have any thoughts on the kind of load common DTrace scripts would cause on a system if run 24x7? I know "common DTrace scripts" and their underlying probe calls is a vague statement, so for lack of a common and established set of scripts in the OS, I'll use the most popular choice for my question: the DTraceToolkit from Brendan Gregg. Which, by the way, is a work of art. Great job Mr. Gregg!

Has anyone out there noticed any consistent load numbers? Looking at the extreme, what should one expect if all the scripts in the DTraceToolkit were always running in the background?

Thanks,
Mike
James Dickens
2005-Nov-18 16:15 UTC
[dtrace-discuss] expected system load from DTrace scripts/probes
On 11/18/05, Michael Barrett <Michael.Barrett at sun.com> wrote:
> Does anyone out there have any thoughts on the kind of load common
> DTrace scripts would cause on a system if run 24x7? I know "common
> DTrace scripts" and their underlying probe calls is a vague statement,
> so for lack of a common and established set of scripts in the OS, I'll
> use the most popular choice for my question: the DTraceToolkit from
> Brendan Gregg. Which, by the way, is a work of art. Great job Mr. Gregg!
>
> Has anyone out there noticed any consistent load numbers? Looking at
> the extreme, what should one expect if all the scripts in the
> DTraceToolkit were always running in the background?
>
> Thanks,
> Mike

It depends on how many probes are enabled, and on their type. Some probes will only fire once a minute or even less often, while others fire 1000's of times per second even when the system is unloaded. You can also make probes less intrusive by making their code execute less often.

James Dickens
uadmin.blogspot.com
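[The difference in firing rates described above can be illustrated with two D enablings. This is a hypothetical sketch, not a toolkit script: the first enabling is demand rated and its cost scales with system activity, while the second fires at a fixed rate regardless of load.]

```d
#!/usr/sbin/dtrace -s

/*
 * Demand rated: fires on every system call entry, which can be
 * thousands of times per second on a busy system. Aggregating in
 * the kernel (rather than printing per event) keeps the body cheap.
 */
syscall:::entry
{
	@calls[probefunc] = count();
}

/*
 * Fixed rate: profile-997 fires 997 times per second on each CPU
 * regardless of how busy the system is, so its cost has a known
 * upper bound.
 */
profile:::profile-997
{
	@oncpu[execname] = count();
}
```

[Adding a predicate to filter events (e.g. /execname == "nfsd"/), or replacing a per-event enabling with a fixed rate profile probe, are two common ways to make a script's code execute less often.]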
Wee Yeh Tan
2005-Nov-19 03:19 UTC
[dtrace-discuss] expected system load from DTrace scripts/probes
On 11/18/05, Michael Barrett <Michael.Barrett at sun.com> wrote:
> Does anyone out there have any thoughts on the kind of load common
> DTrace scripts would cause on a system if run 24x7? I know "common
> DTrace scripts" and their underlying probe calls is a vague statement,
> so for lack of a common and established set of scripts in the OS, I'll
> use the most popular choice for my question: the DTraceToolkit from
> Brendan Gregg. Which, by the way, is a work of art. Great job Mr. Gregg!
>
> Has anyone out there noticed any consistent load numbers? Looking at
> the extreme, what should one expect if all the scripts in the
> DTraceToolkit were always running in the background?

The range will have to be anything from nearly 0 to something unimaginable. Perhaps more interesting would be for us to write down the potential impact of the various probes/actions so that users will not get caught off guard -- an expansion of the performance section of the Dynamic Tracing Guide?

You get a pretty good idea of what to avoid on a production system when you do it enough. The pid provider, for example, can be very intrusive (think about per-instruction tracing).

Maybe we can dtrace the DTraceToolkit to know the impact :P.

--
Just me,
Wire ...
Brendan Gregg
2005-Nov-19 08:35 UTC
[dtrace-discuss] expected system load from DTrace scripts/probes
G'Day Folks,

On Sat, 19 Nov 2005, Wee Yeh Tan wrote:
> On 11/18/05, Michael Barrett <Michael.Barrett at sun.com> wrote:
> > Does anyone out there have any thoughts on the kind of load common
> > DTrace scripts would cause on a system if run 24x7? I know "common
> > DTrace scripts" and their underlying probe calls is a vague statement,
> > so for lack of a common and established set of scripts in the OS, I'll
> > use the most popular choice for my question: the DTraceToolkit from
> > Brendan Gregg. Which, by the way, is a work of art. Great job Mr. Gregg!

Thanks Mike! :)

The performance impact is something I take very seriously; the last thing I want to do is create performance problems while trying to solve them!

Most of the scripts consume less than 0.1% CPU, some of them 1% to 2%. A few are open-ended, which will slog the system if you let them - but you are also getting thousands of lines of output per second, which should be a hint that you are examining too much!

The following is from the DTraceToolkit's FAQ (Docs/Faq),

----
2.2. What performance effect can the DTraceToolkit cause?

Enabling DTrace to monitor events has little effect on the system, especially when compared to the disruptive behaviour of truss (see http://www.brendangregg.com/DTrace/dtracevstruss.html for a comparison). It really boils down to how often the events you are monitoring occur. The following numbers have been provided as a yardstick:

1. Fixed rate scripts. For example, dispqlen.d samples at 1000 Hz.
   The impact will be negligible, close to 0% CPU (in testing, 0.1% CPU).

2. Demand rated scripts. For example, iosnoop probes disk I/O events.
   The impact depends on the rate of events; for many servers the disk
   events would be infrequent enough for this to be less than 0.2% CPU.
   Scripts such as execsnoop would expect even fewer events, so their
   impact would be close to 0.0% CPU.
However, scripts that monitor potentially very rapid events will have a greater impact; for example, running dapptrace on Xorg (over 6000 lines of output per second) consumed around 10% of a CPU.

3. Heavy voodoo scripts. A few scripts in the toolkit must probe either
   a ton of different events, or very rapid events, or both. They are
   going to hurt and there is no way around it. The worst would be
   cputimes and cpudists, which chew around 5% of the CPUs.

There is an emphasis in the DTraceToolkit on writing demand rated scripts that measure the fewest events, such that their impact is close to 0.0% CPU usage. Some scripts are fixed rate, which are safer as their impact has a known upper bound, and are the most suitable to run in production.
----

I need to update that - cputimes/cpudists now chew less than 1% CPU.

A note on fixed rate scripts: these are profile related, such as profile-1000hz. I plan to write a ton more of these, as they are both light on the CPU and their impact on the system is fixed. I'll draw attention to them in the toolkit somehow.

[...]
> You get a pretty good idea of what to avoid on a production system
> when you do it enough. The pid provider, for example, can be very
> intrusive (think about per-instruction tracing).
>
> Maybe we can dtrace the DTraceToolkit to know the impact :P.

I do. :) Read Docs/Notes/cputimes_notes.txt from the toolkit for one technique to estimate script load. (It's a bit hard to do; you need to be able to create a sustained workload while keeping some CPU idle - to be able to see the reduction of idle time when the script runs.)

What I like the most is to create a maximum practical workload and measure the tps (transactions per second) of the workload, then run the DTrace script and measure the reduction in tps. For most of my DTrace scripts the tps difference is so small it is hard to notice amongst the usual jitter. Which is good.
I should add that when I first began work on the toolkit I wrote scripts based on various mental assumptions about performance. Then I started benchmarking the scripts and found that some of my assumptions were wrong! Now I test the performance impact of every script. Writing the script is often the easy part - most of my time goes into testing, followed by documentation, and then script writing. If I write a script in a truly bizarre way for performance reasons, I put a comment in the code to explain this (eg, dtruss).

...

On the topic of running things 24x7: what statistics are you interested in? My performance monitoring strategy goes like this:

1. Monitoring
   kstat: sar, (+ custom tools)
   SNMP: SunMC, mrtg, ...
2. Identification
   kstat: vmstat, iostat, mpstat, sysperfstat (K9Toolkit)
   procfs: prstat, prstat -m
3. Analysis
   dtrace: DTraceToolkit, (+ custom scripts)

(I'm leaving a lot of areas out for simplification, eg libcpc.)

So DTrace is usually best at step 3. There are some things DTrace provides that could be monitored 24x7 for step 1; but there may be workarounds using Kstat, SNMP or procfs (which are already running), or they may make good Kstat RFEs. (Not that I have anything against DTrace of course, just picking the right tool for the job :-)

cheers,

Brendan
[Sydney, Australia]
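[For reference, the fixed rate style mentioned above looks like the following sketch, modeled on the toolkit's dispqlen.d. The kernel member names are as used on Solaris 10 internals; treat this as an illustration rather than the exact toolkit script.]

```d
#!/usr/sbin/dtrace -s

/*
 * Fixed rate sampling: fires 1000 times per second on each CPU, so
 * the overhead is bounded no matter how loaded the system is. Each
 * firing samples the length of the current CPU's dispatcher (run)
 * queue into a linear distribution.
 */
profile:::profile-1000hz
{
	@queue[cpu] = lquantize(curthread->t_cpu->cpu_disp->disp_nrunnable,
	    0, 64, 1);
}
```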
Wee Yeh Tan
2005-Nov-23 08:58 UTC
[dtrace-discuss] expected system load from DTrace scripts/probes
Hey Brendan,

On 11/19/05, Brendan Gregg <brendan.gregg at tpg.com.au> wrote:
> Read Docs/Notes/cputimes_notes.txt from the toolkit for one technique

I promise myself to go through the Docs at some point... This gives me a really good reason to look there instead of wow'ing myself with the d-scripts.

> to estimate script load. (It's a bit hard to do; you need to be able to
> create a sustained workload while keeping some CPU idle - to be able to
> see the reduction of idle time when the script runs.)

Yeah... but I kept asking myself how such measurements can be useful given the number of variables we need to fix -- and most of the time, the variables cannot be fixed. At best it gives a very rough "ballpark", especially for scripts that react to events. Still, I wonder if we can classify the scripts into very loose categories like "unnoticeable", "busy-system friendly", "do at your own risk", or "dun try this at home"...

--
Just me,
Wire ...
Brendan Gregg
2005-Dec-03 16:06 UTC
[dtrace-discuss] expected system load from DTrace scripts/probes
G'Day Folks,

On Wed, 23 Nov 2005, Wee Yeh Tan wrote:
[...]
> > to estimate script load. (It's a bit hard to do; you need to be able to
> > create a sustained workload while keeping some CPU idle - to be able to
> > see the reduction of idle time when the script runs.)
>
> Yeah... but I kept asking myself how such measurements can be useful
> given the number of variables we need to fix -- and most of the time,
> the variables cannot be fixed. At best it gives a very rough
> "ballpark", especially for scripts that react to events.

Do you have in mind measurements that are useful? If I eliminate all variables and measure the impact of DTrace down to the clock cycle, then saying it consumes 0.12% CPU may not be useful either - in production that 0.12% CPU may hammer the hardware caches for six, or DTrace creating its per-CPU memory buffers may tilt the memory system to saturation.

The measurements are useful for me to know when I'm improving the performance of the code. When people run the scripts on their own servers, the impact depends (usually) on the rate of their events. I'd agree that a ballpark is the best we can give to others.

> Still,
> I wonder if we can classify the scripts into very loose categories
> like "unnoticeable", "busy-system friendly", "do at your own risk", or
> "dun try this at home"...

Yes, loose categories make sense. Something like,

unnoticeable
    Especially for fixed rate scripts, such as those using profile:::.
    Around 0.1% loss.

demand-low
    Depends on the demand, or the rate of events. Here we expect the
    events to be infrequent, such that the script's impact is close to
    negligible. For example, execsnoop. 0 -> 0.5% loss.

demand-high
    Depends on the demand; however, here we expect the events to be
    frequent and the impact noticeable. (Very few scripts would be in
    this category. For example, cputimes.) 1 -> 5% loss.

violent
    Heavy voodoo scripts. For example, tracing every pid:::entry and
    pid:::return, and printing a line per event.
Could well slow the application by 50%, but the hundreds of screens of output per second should be a hint that you are doing too much. For example, dapptrace -Ua.

There are a number of documentation requests like this that I'll try to achieve. A list of those that have been suggested to me would be,

* Performance impact per script, especially as a percentage.
* A cheatsheet per script for what numbers are bad, and what are good.
* What all of the statistics and values really mean - shortcomings,
  caveats, algorithms and interpretation.
* Tutorials for solving various real world problems.
* A checklist of the top 10 problems to look for.
* Wizards that run the tools for you and present findings in English.
* Animations of the DTraceToolkit in action solving problems.

Much of this is tough, as it greatly depends on your environment and what is "bad" for your environment. I'd still like to try to write documents such as these, but I'm not sure I should encourage the expectation that I should be doing this. How many of the above documents exist for standard tools such as prstat, ps, vmstat, iostat, mpstat, sar, trapstat and lockstat?

cheers,

Brendan
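[The "violent" category above can be made concrete with a hypothetical sketch like the following, in the spirit of dapptrace -Ua (the real dapptrace does more, such as printing arguments and elapsed times).]

```d
#!/usr/sbin/dtrace -s
/* Usage: dtrace -s violent.d -p PID */

/*
 * Fires twice per userland function call in the target process and
 * performs per-event output - expect a severe slowdown (and screens
 * of output) on call-intensive applications.
 */
pid$target:::entry
{
	printf("-> %s:%s\n", probemod, probefunc);
}

pid$target:::return
{
	printf("<- %s:%s\n", probemod, probefunc);
}
```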
Wee Yeh Tan
2005-Dec-05 05:24 UTC
[dtrace-discuss] expected system load from DTrace scripts/probes
Brendan,

On 12/4/05, Brendan Gregg <brendan.gregg at tpg.com.au> wrote:
> Do you have in mind measurements that are useful?

I do not have anything specific :(.

> Yes, loose categories make sense. Something like,
> unnoticeable
> demand-low
> demand-high
> violent

On top of the loose categories, we can translate the firings into something more layman-friendly. E.g. we can say that "hotspot.d" is demand-low but depends on the # of I/Os, or that dapptrace depends on the # of userland calls in the target pid.

> There are a number of documentation requests like this that I'll try to
> achieve. A list of those that have been suggested to me would be,
>
> * Performance impact per script, especially as a percentage.
> * A cheatsheet per script for what numbers are bad, and what are good.
> * What all of the statistics and values really mean - shortcomings,
>   caveats, algorithms and interpretation.
> * Tutorials for solving various real world problems.
> * A checklist of the top 10 problems to look for.
> * Wizards that run the tools for you and present findings in English.
> * Animations of the DTraceToolkit in action solving problems.

The last one had me laughing.

--
Just me,
Wire ...