Michael Barrett
2005-Nov-18 15:14 UTC
[dtrace-discuss] expected system load from DTrace scripts/probes
Does anyone out there have any thoughts on the kind of load common DTrace scripts would cause on a system if run 24x7? I know "common DTrace scripts" and their underlying probe calls is a vague statement, so for lack of a common and established set of scripts in the OS, I'll use the most popular choice for my question: the DTraceToolkit from Brendan Gregg. Which, by the way, is a work of art. Great job Mr. Gregg!

Has anyone out there noticed any consistent load numbers? Looking at the extreme, what should one expect if all the scripts in the DTraceToolkit were always running in the background?

Thanks,
Mike
James Dickens
2005-Nov-18 16:15 UTC
[dtrace-discuss] expected system load from DTrace scripts/probes
On 11/18/05, Michael Barrett <Michael.Barrett at sun.com> wrote:
> Does anyone out there have any thoughts on the kind of load common
> DTrace scripts would cause on a system if run 24x7? I know "common
> DTrace scripts" and their underlying probe calls is a vague statement,
> so for lack of a common and established set of scripts in the OS, I'll
> use the most popular choice for my question: the DTraceToolkit from
> Brendan Gregg. Which, by the way, is a work of art. Great job Mr. Gregg!
>
> Has anyone out there noticed any consistent load numbers? Looking at
> the extreme, what should one expect if all the scripts in the
> DTraceToolkit were always running in the background?
>
> Thanks,
> Mike

It depends on how many probes are enabled, and on their type. Some probes will only fire once a minute or even less often, while others fire 1000's of times per second even when the system is unloaded. You can also make probes less intrusive by making their code execute less often.

James Dickens
uadmin.blogspot.com
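[The difference in firing rates described above can be illustrated with two D enablings. This is a hypothetical sketch, not a toolkit script: the first enabling is demand rated and its cost scales with system activity, while the second fires at a fixed rate regardless of load.]

```d
#!/usr/sbin/dtrace -s

/*
 * Demand rated: fires on every system call entry, which can be
 * thousands of times per second on a busy system. Aggregating in
 * the kernel (rather than printing per event) keeps the body cheap.
 */
syscall:::entry
{
	@calls[probefunc] = count();
}

/*
 * Fixed rate: profile-997 fires 997 times per second on each CPU
 * regardless of how busy the system is, so its cost has a known
 * upper bound.
 */
profile:::profile-997
{
	@oncpu[execname] = count();
}
```

[Adding a predicate to filter events (e.g. /execname == "nfsd"/), or replacing a per-event enabling with a fixed rate profile probe, are two common ways to make a script's code execute less often.]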
Wee Yeh Tan
2005-Nov-19 03:19 UTC
[dtrace-discuss] expected system load from DTrace scripts/probes
On 11/18/05, Michael Barrett <Michael.Barrett at sun.com> wrote:
> Does anyone out there have any thoughts on the kind of load common
> DTrace scripts would cause on a system if run 24x7? I know "common
> DTrace scripts" and their underlying probe calls is a vague statement,
> so for lack of a common and established set of scripts in the OS, I'll
> use the most popular choice for my question: the DTraceToolkit from
> Brendan Gregg. Which, by the way, is a work of art. Great job Mr. Gregg!
>
> Has anyone out there noticed any consistent load numbers? Looking at
> the extreme, what should one expect if all the scripts in the
> DTraceToolkit were always running in the background?

The range will have to be anything from nearly 0 to something unimaginable. Perhaps more interesting would be for us to write down the potential impact of the various probes/actions so that users will not get caught off guard -- an expansion of the performance section of the Dynamic Tracing Guide?

You get a pretty good idea of what to avoid on a production system when you do it enough. The pid provider, for example, can be very intrusive (think about per-instruction tracing).

Maybe we can dtrace the DTraceToolkit to know the impact :P.

--
Just me,
Wire ...
Brendan Gregg
2005-Nov-19 08:35 UTC
[dtrace-discuss] expected system load from DTrace scripts/probes
G'Day Folks,

On Sat, 19 Nov 2005, Wee Yeh Tan wrote:
> On 11/18/05, Michael Barrett <Michael.Barrett at sun.com> wrote:
> > Does anyone out there have any thoughts on the kind of load common
> > DTrace scripts would cause on a system if run 24x7? I know "common
> > DTrace scripts" and their underlying probe calls is a vague statement,
> > so for lack of a common and established set of scripts in the OS, I'll
> > use the most popular choice for my question: the DTraceToolkit from
> > Brendan Gregg. Which, by the way, is a work of art. Great job Mr. Gregg!

Thanks Mike! :)

The performance impact is something I take very seriously; the last thing I want to do is create performance problems while trying to solve them!

Most of the scripts consume less than 0.1% CPU, some of them 1% to 2%. A few are open-ended, which will slog the system if you let them - but you are also getting thousands of lines of output per second, which should be a hint that you are examining too much!

The following is from the DTraceToolkit's FAQ (Docs/Faq),

----
2.2. What performance effect can the DTraceToolkit cause?

Enabling DTrace to monitor events has little effect on the system, especially when compared to the disruptive behaviour of truss (see http://www.brendangregg.com/DTrace/dtracevstruss.html for a comparison). It really boils down to how often the events you are monitoring occur. The following numbers have been provided as a yardstick:

1. Fixed rate scripts. For example, dispqlen.d samples at 1000 Hz.
   The impact will be negligible, close to 0% CPU (in testing, 0.1% CPU).

2. Demand rated scripts. For example, iosnoop probes disk I/O events.
   The impact depends on the rate of events; for many servers the disk
   events would be infrequent enough for this to be less than 0.2% CPU.
   Scripts such as execsnoop would expect even fewer events, so their
   impact would be close to 0.0% CPU.
However, scripts that monitor potentially very rapid events will have a greater impact; for example, running dapptrace on Xorg (over 6000 lines of output per second) consumed around 10% of a CPU.

3. Heavy voodoo scripts. A few scripts in the toolkit must probe either
   a ton of different events, or very rapid events, or both. They are
   going to hurt and there is no way around it. The worst would be
   cputimes and cpudists, which chew around 5% of the CPUs.

There is an emphasis in the DTraceToolkit on writing demand rated scripts that measure the fewest events, such that their impact is close to 0.0% CPU usage. Some scripts are fixed rate, which are safer as their impact has a known upper bound, and are the most suitable to run in production.
----

I need to update that - cputimes/cpudists now chew less than 1% CPU.

A note on fixed rate scripts: these are profile related, such as profile-1000hz. I plan to write a ton more of these, as they are both light on the CPU and their impact on the system is fixed. I'll draw attention to them in the toolkit somehow.

[...]
> You get a pretty good idea of what to avoid on a production system
> when you do it enough. The pid provider, for example, can be very
> intrusive (think about per-instruction tracing).
>
> Maybe we can dtrace the DTraceToolkit to know the impact :P.

I do. :) Read Docs/Notes/cputimes_notes.txt from the toolkit for one technique to estimate script load. (It's a bit hard to do; you need to be able to create a sustained workload while keeping some CPU idle - to be able to see the reduction of idle time when the script runs.)

What I like the most is to create a maximum practical workload and measure the tps (transactions per second) of the workload, then run the DTrace script and measure the reduction in tps. For most of my DTrace scripts the tps difference is so small it is hard to notice amongst the usual jitter. Which is good.
I should add that when I first began work on the toolkit I wrote scripts based on various mental assumptions about performance. Then I started benchmarking the scripts and found that some of my assumptions were wrong! Now I test the performance impact of every script. Writing the script is often the easy part - most of my time goes into testing, followed by documentation, and then script writing. If I write a script in a truly bizarre way for performance reasons, I put a comment in the code to explain this (eg, dtruss).

...

On the topic of running things 24x7: what statistics are you interested in? My performance monitoring strategy goes like this:

1. Monitoring
   kstat: sar, (+ custom tools)
   SNMP: SunMC, mrtg, ...
2. Identification
   kstat: vmstat, iostat, mpstat, sysperfstat (K9Toolkit)
   procfs: prstat, prstat -m
3. Analysis
   dtrace: DTraceToolkit, (+ custom scripts)

(I'm leaving a lot of areas out for simplification, eg libcpc.)

So DTrace is usually best at step 3. There are some things DTrace provides that could be monitored 24x7 for step 1; but there may be workarounds using Kstat, SNMP or procfs (which are already running), or they may make good Kstat RFEs. (Not that I have anything against DTrace of course, just picking the right tool for the job :-)

cheers,

Brendan
[Sydney, Australia]
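[For reference, the fixed rate style mentioned above looks like the following sketch, modeled on the toolkit's dispqlen.d. The kernel member names are as used on Solaris 10 internals; treat this as an illustration rather than the exact toolkit script.]

```d
#!/usr/sbin/dtrace -s

/*
 * Fixed rate sampling: fires 1000 times per second on each CPU, so
 * the overhead is bounded no matter how loaded the system is. Each
 * firing samples the length of the current CPU's dispatcher (run)
 * queue into a linear distribution.
 */
profile:::profile-1000hz
{
	@queue[cpu] = lquantize(curthread->t_cpu->cpu_disp->disp_nrunnable,
	    0, 64, 1);
}
```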
Wee Yeh Tan
2005-Nov-23 08:58 UTC
[dtrace-discuss] expected system load from DTrace scripts/probes
Hey Brendan,

On 11/19/05, Brendan Gregg <brendan.gregg at tpg.com.au> wrote:
> Read Docs/Notes/cputimes_notes.txt from the toolkit for one technique

I promise myself to go through the Docs at some point... This gives me a really good reason to look there instead of wow'ing myself with the d-scripts.

> to estimate script load. (It's a bit hard to do; you need to be able to
> create a sustained workload while keeping some CPU idle - to be able to
> see the reduction of idle time when the script runs.)

Yeah... but I kept asking myself how such measurements can be useful given the number of variables we need to fix -- and most of the time, the variables cannot be fixed. At best it gives a very rough "ballpark", especially for scripts that react to events. Still, I wonder if we can classify the scripts into very loose categories like "unnoticeable", "busy-system friendly", "do at your own risk", or "dun try this at home"...

--
Just me,
Wire ...
Brendan Gregg
2005-Dec-03 16:06 UTC
[dtrace-discuss] expected system load from DTrace scripts/probes
G'Day Folks,

On Wed, 23 Nov 2005, Wee Yeh Tan wrote:
[...]
> > to estimate script load. (It's a bit hard to do; you need to be able to
> > create a sustained workload while keeping some CPU idle - to be able to
> > see the reduction of idle time when the script runs.)
>
> Yeah... but I kept asking myself how such measurements can be useful
> given the number of variables we need to fix -- and most of the time,
> the variables cannot be fixed. At best it gives a very rough
> "ballpark", especially for scripts that react to events.

Do you have in mind measurements that are useful? If I eliminate all variables and measure the impact of DTrace down to the clock cycle, then saying it consumes 0.12% CPU may not be useful either - in production that 0.12% CPU may hammer the hardware caches for six, or DTrace creating its per-CPU memory buffers may tilt the memory system to saturation.

The measurements are useful for me to know when I'm improving the performance of the code. When people run the scripts on their own servers, the impact depends (usually) on the rate of their events. I'd agree that a ballpark is the best we can give to others.

> Still,
> I wonder if we can classify the scripts into very loose categories
> like "unnoticeable", "busy-system friendly", "do at your own risk", or
> "dun try this at home"...

Yes, loose categories make sense. Something like,

unnoticeable
    Especially for fixed rate scripts, such as those using profile:::.
    Around 0.1% loss.

demand-low
    Depends on the demand, or the rate of events. Here we expect the
    events to be infrequent, such that the script's impact is close to
    negligible. For example, execsnoop. 0 -> 0.5% loss.

demand-high
    Depends on the demand; however, here we expect the events to be
    frequent and the impact noticeable. (Very few scripts would be in
    this category. For example, cputimes.) 1 -> 5% loss.

violent
    Heavy voodoo scripts. For example, tracing every pid:::entry and
    pid:::return, and printing a line per event.
Could well slow the application by 50%, but the hundreds of screens of output per second should be a hint that you are doing too much. For example, dapptrace -Ua.

There are a number of documentation requests like this that I'll try to achieve. A list of those that have been suggested to me would be,

* Performance impact per script, especially as a percentage.
* A cheatsheet per script for what numbers are bad, and what are good.
* What all of the statistics and values really mean - shortcomings,
  caveats, algorithms and interpretation.
* Tutorials for solving various real world problems.
* A checklist of the top 10 problems to look for.
* Wizards that run the tools for you and present findings in English.
* Animations of the DTraceToolkit in action solving problems.

Much of this is tough, as it greatly depends on your environment and what is "bad" for your environment. I'd still like to try to write documents such as these, but I'm not sure I should encourage the expectation that I should be doing this. How many of the above documents exist for standard tools such as prstat, ps, vmstat, iostat, mpstat, sar, trapstat and lockstat?

cheers,

Brendan
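[The "violent" category above can be made concrete with a hypothetical sketch like the following, in the spirit of dapptrace -Ua (the real dapptrace does more, such as printing arguments and elapsed times).]

```d
#!/usr/sbin/dtrace -s
/* Usage: dtrace -s violent.d -p PID */

/*
 * Fires twice per userland function call in the target process and
 * performs per-event output - expect a severe slowdown (and screens
 * of output) on call-intensive applications.
 */
pid$target:::entry
{
	printf("-> %s:%s\n", probemod, probefunc);
}

pid$target:::return
{
	printf("<- %s:%s\n", probemod, probefunc);
}
```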
Wee Yeh Tan
2005-Dec-05 05:24 UTC
[dtrace-discuss] expected system load from DTrace scripts/probes
Brendan,

On 12/4/05, Brendan Gregg <brendan.gregg at tpg.com.au> wrote:
> Do you have in mind measurements that are useful?

I do not have anything specific :(.

> Yes, loose categories make sense. Something like,
> unnoticeable
> demand-low
> demand-high
> violent

On top of the loose categories, we can translate the firings into something more layman-friendly. E.g. we can say that "hotspot.d" is demand-low but depends on the # of I/Os, or that dapptrace depends on the # of userland calls in the target pid.

> There are a number of documentation requests like this that I'll try to
> achieve. A list of those that have been suggested to me would be,
>
> * Performance impact per script, especially as a percentage.
> * A cheatsheet per script for what numbers are bad, and what are good.
> * What all of the statistics and values really mean - shortcomings,
>   caveats, algorithms and interpretation.
> * Tutorials for solving various real world problems.
> * A checklist of the top 10 problems to look for.
> * Wizards that run the tools for you and present findings in English.
> * Animations of the DTraceToolkit in action solving problems.

The last one had me laughing.

--
Just me,
Wire ...