Nik Clayton
2006-Jul-31 13:37 UTC
[dtrace-discuss] Tracking time spent in function -- does this do what I think it does?
This is pretty much my first DTrace script written 'in anger', as it
were, and I'd like a little feedback to make sure it's doing what I
think it's doing.

The code that I'm instrumenting is the code that Sendmail runs to queue
up a message. I'm throwing 10,000 messages at Sendmail, and I want to
keep track of the time this takes. The theory is that as Sendmail has
to queue more messages, the time taken to queue new messages is going
to go up.

Probably due to UFS issues with large directories, but that's a second
phase.

The script looks like this:

-------- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8<
#!/usr/sbin/dtrace -s

int divisor;

BEGIN
{
        divisor = 100;
        start = timestamp;      /* Get the start time */
}

/* Keep a count of how many times this function is called, and note
   the time when it was called */
sendmail*:::queue-createentry-start
/zonename == $1/
{
        c++;
        self->start = timestamp;
}

/* Take the function count and divide by divisor to generate a bucket
   number to index into @time. Store the average run time of the
   function in the bucket. */
sendmail*:::queue-createentry-end
/self->start && zonename == $1/
{
        @time[c / divisor] = avg((timestamp - self->start) / 1000);
        self->start = 0;
}
-------- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8<

'queue-createentry-start' and 'queue-createentry-end' are two custom
probes. And I'm running this in a zone, so the usage is (from the
global zone):

        # queue-create.d name-of-zone

That produces output like this (if I run it through sort -n):

        0        9373
        1        9857
        2       26820
        3       15429
        ...       ...

which, if I'm reading it right, tells me that the messages 0-99 took,
on average, 9373ms to queue, messages 100-199 took on average 9857ms
to queue, and so on.

Does that look right? Is there a more DTrace-ish way to do this?

In my ideal world quantize() or similar would work, but after reading
the docs I can't quite see how I'd get it to produce sensible output
from this data.

N
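For context on those two custom probes: they would be USDT (statically
defined tracing) probes compiled into Sendmail. As a hypothetical
sketch only -- the real provider in this Sendmail build may differ and
may declare arguments -- a provider declaration for probes with these
names could look like the following. Note that double underscores in
the declaration become hyphens in the published probe names:

        /* sendmail_provider.d -- hypothetical declaration */
        provider sendmail {
                probe queue__createentry__start();
                probe queue__createentry__end();
        };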
Bryan Cantrill
2006-Jul-31 16:37 UTC
[dtrace-discuss] Tracking time spent in function -- does this do what I think it does?
On Mon, Jul 31, 2006 at 02:37:26PM +0100, Nik Clayton wrote:
> This is pretty much my first DTrace script written 'in anger', as it
> were, and I'd like a little feedback to make sure it's doing what I
> think it's doing.
>
> The code that I'm instrumenting is the code that Sendmail runs to
> queue up a message. I'm throwing 10,000 messages at Sendmail, and I
> want to keep track of the time this takes. The theory is that as
> Sendmail has to queue more messages, the time taken to queue new
> messages is going to go up.
>
> Probably due to UFS issues with large directories, but that's a
> second phase.
>
> The script looks like this:
>
> -------- 8< -------- 8< -------- 8< -------- 8< -------- 8< --------
> #!/usr/sbin/dtrace -s
>
> int divisor;
>
> BEGIN
> {
>         divisor = 100;
>         start = timestamp;      /* Get the start time */
> }
>
> /* Keep a count of how many times this function is called, and note
>    the time when it was called */
> sendmail*:::queue-createentry-start
> /zonename == $1/
> {
>         c++;
>         self->start = timestamp;
> }
>
> /* Take the function count and divide by divisor to generate a bucket
>    number to index into @time. Store the average run time of the
>    function in the bucket. */
> sendmail*:::queue-createentry-end
> /self->start && zonename == $1/
> {
>         @time[c / divisor] = avg((timestamp - self->start) / 1000);
>         self->start = 0;
> }
> -------- 8< -------- 8< -------- 8< -------- 8< -------- 8< --------
>
> 'queue-createentry-start' and 'queue-createentry-end' are two custom
> probes. And I'm running this in a zone, so the usage is (from the
> global zone)

My only comment on this would be to put the increment of c in the
clause for queue-createentry-end. Using global variables is inherently
racy, and you want to minimize that race as much as possible.

> That produces output like this (if I run it through sort -n)
>
>         0        9373
>         1        9857
>         2       26820
>         3       15429
>         ...       ...
>
> which, if I'm reading it right, tells me that the messages 0-99 took,
> on average, 9373ms to queue, messages 100-199 took on average 9857ms
> to queue, and so on.

Not quite -- they took 9373us, not 9373ms: 'timestamp' is in
nanoseconds, so dividing it by a thousand yields microseconds, not
milliseconds.

> Does that look right? Is there a more DTrace-ish way to do this?
>
> In my ideal world quantize() or similar would work, but after reading
> the docs I can't quite see how I'd get it to produce sensible output
> from this data.

It depends on exactly what you're looking for, but you could just use
quantize() straight-up:

        @time[c / divisor] = quantize((timestamp - self->start) / 1000);

This will give you a histogram for each period. You then sort by key
(set the 'aggsortkey' option) to see the histograms in time order.
This is what I'd recommend, as I think it answers your question.

If, however, you're not at all interested in the distribution, you
could do this:

        @time = lquantize(c / divisor, 0, 1000, 1,
            (timestamp - self->start) / divisor);

This is a bit peculiar in that it's a little racy, that it limits
(necessarily) the number of groups of messages (to 1000 in the above),
that you have to read the average number of nanoseconds for each group
by looking at the 'count' column of the lquantize output, and that the
output on the last bucket might be a bit wrong. But it might get you
to where you want to go...

        - Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Solaris Kernel Development.       http://blogs.sun.com/bmc
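Putting both of Bryan's suggestions together, a revised version of the
script might look like the sketch below. It is untested: it assumes
the same custom probes, moves the increment of c into the -end clause
next to its use, sets the aggsortkey option so the per-group
quantize() histograms print in key order, and uses $$1 (the string
form of a DTrace macro argument) for the zone-name comparison:

-------- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8<
#!/usr/sbin/dtrace -s

#pragma D option aggsortkey

sendmail*:::queue-createentry-start
/zonename == $$1/
{
        self->start = timestamp;
}

sendmail*:::queue-createentry-end
/self->start && zonename == $$1/
{
        /* increment the (still racy) global counter alongside its use */
        c++;

        /* one latency histogram, in microseconds, per group of
           100 messages */
        @time[c / 100] = quantize((timestamp - self->start) / 1000);
        self->start = 0;
}
-------- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8<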
Nik Clayton
2006-Aug-01 09:04 UTC
[dtrace-discuss] Tracking time spent in function -- does this do what I think it does?
Bryan Cantrill wrote:
> On Mon, Jul 31, 2006 at 02:37:26PM +0100, Nik Clayton wrote:
>
> My only comment on this would be to put the increment of c in the
> clause for queue-createentry-end. Using global variables is
> inherently racy, and you want to minimize that race as much as
> possible.

Done. Useful tip, thanks.

>> That produces output like this (if I run it through sort -n)
>>
>>         0        9373
>>         1        9857
>>         2       26820
>>         3       15429
>>         ...       ...
>>
>> which, if I'm reading it right, tells me that the messages 0-99
>> took, on average, 9373ms to queue, messages 100-199 took on average
>> 9857ms to queue, and so on.
>
> Not quite -- they took 9373us, not 9373ms: 'timestamp' is in
> nanoseconds, so dividing it by a thousand yields microseconds, not
> milliseconds.

Ah yes. Slip of the tongue.

>> Does that look right? Is there a more DTrace-ish way to do this?
>>
>> In my ideal world quantize() or similar would work, but after
>> reading the docs I can't quite see how I'd get it to produce
>> sensible output from this data.
>
> It depends on exactly what you're looking for, but you could just use
> quantize() straight-up:
>
>         @time[c / divisor] = quantize((timestamp - self->start) / 1000);
>
> This will give you a histogram for each period. You then sort by key
> (set the 'aggsortkey' option) to see the histograms in time order.
> This is what I'd recommend, as I think it answers your question.

I tried that, and it gives me one histogram per bucket (which makes
sense, from reading the documentation). What I've been trying to do is
get one histogram, where the left hand side is the bucket number, and
the right hand side is the average uSec for that bucket.

Something like this:

        bucket  ------------- Average ---------------  value
             0 | @@@@@@@@@                              9373
             1 | @@@@@@@@@@                             9857
             2 | @@@@@@@@@@@@@@@@@@@@@@@@@@@           26820
             3 | @@@@@@@@@@@@@@@                       15429
           ...

But that's eluding me at the moment.

> If, however, you're not at all interested in the distribution, you
> could do this:
>
>         @time = lquantize(c / divisor, 0, 1000, 1,
>             (timestamp - self->start) / divisor);
>
> This is a bit peculiar in that it's a little racy, that it limits
> (necessarily) the number of groups of messages (to 1000 in the
> above), that you have to read the average number of nanoseconds for
> each group by looking at the 'count' column of the lquantize output,
> and that the output on the last bucket might be a bit wrong. But it
> might get you to where you want to go...

This is a bit odd. I've tried that, and it seems close to what I'm
looking for. On a test run I get:

        value  --------------- Distribution ---------------  count
          < 0 |                                              0
            0 |                                              1095282
            1 |                                              1096254
            2 |                                              1296317
            3 |                                              724605
          ...

which looks like it might be close. A couple of questions though.

First, why isn't dtrace drawing the '@' graph? I've checked the rest
of the values and there's nothing ludicrously large that might be
scaling the distribution plot so that the figures I've quoted above
are too small to plot.

Also, what's the correct interpretation of the 'count' column? Is that
the average for that bucket? The total? The highest figure?

N
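A flat bucket/average table like the one Nik sketches (minus the '@'
bars) can be produced with a printa() format string in an END clause.
A minimal sketch, assuming the avg() aggregation from the original
script is still named @time:

        END
        {
                printf("%8s %12s\n", "BUCKET", "AVG (us)");
                /* the %@ conversion formats each key's aggregated value */
                printa("%8d %@12d\n", @time);
        }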
Bryan Cantrill
2006-Aug-01 16:09 UTC
[dtrace-discuss] Tracking time spent in function -- does this do what I think it does?
> I tried that, and it gives me one histogram per bucket (which makes
> sense, from reading the documentation). What I've been trying to do
> is get one histogram, where the left hand side is the bucket number,
> and the right hand side is the average uSec for that bucket.
>
> Something like this:
>
>         bucket  ------------- Average ---------------  value
>              0 | @@@@@@@@@                              9373
>              1 | @@@@@@@@@@                             9857
>              2 | @@@@@@@@@@@@@@@@@@@@@@@@@@@           26820
>              3 | @@@@@@@@@@@@@@@                       15429
>            ...
>
> But that's eluding me at the moment.
>
> > If, however, you're not at all interested in the distribution, you
> > could do this:
> >
> >         @time = lquantize(c / divisor, 0, 1000, 1,
> >             (timestamp - self->start) / divisor);
> >
> > This is a bit peculiar in that it's a little racy, that it limits
> > (necessarily) the number of groups of messages (to 1000 in the
> > above), that you have to read the average number of nanoseconds for
> > each group by looking at the 'count' column of the lquantize
> > output, and that the output on the last bucket might be a bit
> > wrong. But it might get you to where you want to go...
>
> This is a bit odd. I've tried that, and it seems close to what I'm
> looking for. On a test run I get:
>
>         value  --------------- Distribution ---------------  count
>           < 0 |                                              0
>             0 |                                              1095282
>             1 |                                              1096254
>             2 |                                              1296317
>             3 |                                              724605
>           ...
>
> which looks like it might be close. A couple of questions though.
>
> First, why isn't dtrace drawing the '@' graph? I've checked the rest
> of the values and there's nothing ludicrously large that might be
> scaling the distribution plot so that the figures I've quoted above
> are too small to plot.

It is, it's just the algorithm for drawing it that's a bit
counterintuitive: to determine the number of '@' symbols to draw,
DTrace takes the number of '@' symbols in the width of the histogram,
and divides that up among the buckets according to their fraction of
the total. The problem comes when you have many buckets (as in this
lquantize()), and those buckets have approximately equal weight. In
this case, buckets will have a fraction of an '@' symbol -- and that
will get whacked to zero '@' symbols.

(All of this is not to defend the current scheme, but merely to
describe it; a more sophisticated scheme that looks at the standard
deviation and gives mean buckets half a bar of '@' symbols would
almost certainly be superior -- though I think one could argue that
the extant scheme should be retained for quantize(), which tends to be
less pathological in this regard.)

> Also, what's the correct interpretation of the 'count' column? Is
> that the average for that bucket? The total? The highest figure?

It's the average: each bucket contains the sum of each value divided
by the number of values in the bucket.

        - Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Solaris Kernel Development.       http://blogs.sun.com/bmc
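As an aside on reading those large counts: the normalize() action can
rescale an aggregation's values before they are printed. A small
sketch, assuming the @time aggregation from the lquantize() variant
above, which would show the per-group averages in microseconds rather
than nanoseconds:

        END
        {
                /* divide every aggregated value by 1000: ns -> us */
                normalize(@time, 1000);
                printa(@time);
        }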