Nik Clayton
2006-Jul-31 13:37 UTC
[dtrace-discuss] Tracking time spent in function -- does this do what I think it does?
This is pretty much my first DTrace script written 'in anger', as it
were, and I'd like a little feedback to make sure it's doing what I
think it's doing.

The code that I'm instrumenting is the code that Sendmail runs to queue
up a message. I'm throwing 10,000 messages at Sendmail, and I want to
keep track of the time this takes. The theory is that as Sendmail has
to queue more messages, the time taken to queue new messages is going
to go up.

Probably due to UFS issues with large directories, but that's a second
phase.

The script looks like this:

-------- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8<
#!/usr/sbin/dtrace -s

int divisor;

BEGIN
{
        divisor = 100;
        start = timestamp;      /* Get the start time */
}

/* Keep a count of how many times this function is called, and note
   the time when it was called */
sendmail*:::queue-createentry-start
/zonename == $1/
{
        c++;
        self->start = timestamp;
}

/* Take the function count and divide by divisor to generate a bucket
   number to index into @time. Store the average run time of the
   function in the bucket. */
sendmail*:::queue-createentry-end
/self->start && zonename == $1/
{
        @time[c / divisor] = avg((timestamp - self->start) / 1000);
        self->start = 0;
}
-------- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8<

'queue-createentry-start' and 'queue-createentry-end' are two custom
probes. And I'm running this in a zone, so the usage is (from the
global zone):

        # queue-create.d name-of-zone

That produces output like this (if I run it through sort -n):

        0        9373
        1        9857
        2       26820
        3       15429
        ...       ...

which, if I'm reading it right, tells me that the messages 0-99 took,
on average, 9373ms to queue, messages 100-199 took on average 9857ms
to queue, and so on.

Does that look right? Is there a more DTrace-ish way to do this?

In my ideal world quantize() or similar would work, but after reading
the docs I can't quite see how I'd get it to produce sensible output
from this data.

N
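For context on those two custom probes: they would be USDT (statically
defined tracing) probes compiled into Sendmail. As a hypothetical
sketch only -- the real provider in this Sendmail build may differ and
may declare arguments -- a provider declaration for probes with these
names could look like the following. Note that double underscores in
the declaration become hyphens in the published probe names:

        /* sendmail_provider.d -- hypothetical declaration */
        provider sendmail {
                probe queue__createentry__start();
                probe queue__createentry__end();
        };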
Bryan Cantrill
2006-Jul-31 16:37 UTC
[dtrace-discuss] Tracking time spent in function -- does this do what I think it does?
On Mon, Jul 31, 2006 at 02:37:26PM +0100, Nik Clayton wrote:
> This is pretty much my first DTrace script written 'in anger', as it
> were, and I'd like a little feedback to make sure it's doing what I
> think it's doing.
>
> The code that I'm instrumenting is the code that Sendmail runs to
> queue up a message. I'm throwing 10,000 messages at Sendmail, and I
> want to keep track of the time this takes. The theory is that as
> Sendmail has to queue more messages, the time taken to queue new
> messages is going to go up.
>
> Probably due to UFS issues with large directories, but that's a
> second phase.
>
> The script looks like this:
>
> -------- 8< -------- 8< -------- 8< -------- 8< -------- 8< --------
> #!/usr/sbin/dtrace -s
>
> int divisor;
>
> BEGIN
> {
>         divisor = 100;
>         start = timestamp;      /* Get the start time */
> }
>
> /* Keep a count of how many times this function is called, and note
>    the time when it was called */
> sendmail*:::queue-createentry-start
> /zonename == $1/
> {
>         c++;
>         self->start = timestamp;
> }
>
> /* Take the function count and divide by divisor to generate a bucket
>    number to index into @time. Store the average run time of the
>    function in the bucket. */
> sendmail*:::queue-createentry-end
> /self->start && zonename == $1/
> {
>         @time[c / divisor] = avg((timestamp - self->start) / 1000);
>         self->start = 0;
> }
> -------- 8< -------- 8< -------- 8< -------- 8< -------- 8< --------
>
> 'queue-createentry-start' and 'queue-createentry-end' are two custom
> probes. And I'm running this in a zone, so the usage is (from the
> global zone)

My only comment on this would be to put the increment of c in the
clause for queue-createentry-end. Using global variables is inherently
racy, and you want to minimize that race as much as possible.

> That produces output like this (if I run it through sort -n)
>
>         0        9373
>         1        9857
>         2       26820
>         3       15429
>         ...       ...
>
> which, if I'm reading it right, tells me that the messages 0-99 took,
> on average, 9373ms to queue, messages 100-199 took on average 9857ms
> to queue, and so on.

Not quite -- they took 9373us, not 9373ms: 'timestamp' is in
nanoseconds, so dividing it by a thousand yields microseconds, not
milliseconds.

> Does that look right? Is there a more DTrace-ish way to do this?
>
> In my ideal world quantize() or similar would work, but after reading
> the docs I can't quite see how I'd get it to produce sensible output
> from this data.

It depends on exactly what you're looking for, but you could just use
quantize() straight-up:

        @time[c / divisor] = quantize((timestamp - self->start) / 1000);

This will give you a histogram for each period. You then sort by key
(set the 'aggsortkey' option) to see the histograms in time order.
This is what I'd recommend, as I think it answers your question.

If, however, you're not at all interested in the distribution, you
could do this:

        @time = lquantize(c / divisor, 0, 1000, 1,
            (timestamp - self->start) / divisor);

This is a bit peculiar in that it's a little racy, that it limits
(necessarily) the number of groups of messages (to 1000 in the above),
that you have to read the average number of nanoseconds for each group
by looking at the 'count' column of the lquantize output, and that the
output on the last bucket might be a bit wrong. But it might get you
to where you want to go...

        - Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Solaris Kernel Development.       http://blogs.sun.com/bmc
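Putting both of Bryan's suggestions together, a revised version of the
script might look like the sketch below. It is untested: it assumes
the same custom probes, moves the increment of c into the -end clause
next to its use, sets the aggsortkey option so the per-group
quantize() histograms print in key order, and uses $$1 (the string
form of a DTrace macro argument) for the zone-name comparison:

-------- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8<
#!/usr/sbin/dtrace -s

#pragma D option aggsortkey

sendmail*:::queue-createentry-start
/zonename == $$1/
{
        self->start = timestamp;
}

sendmail*:::queue-createentry-end
/self->start && zonename == $$1/
{
        /* increment the (still racy) global counter alongside its use */
        c++;

        /* one latency histogram, in microseconds, per group of
           100 messages */
        @time[c / 100] = quantize((timestamp - self->start) / 1000);
        self->start = 0;
}
-------- 8< -------- 8< -------- 8< -------- 8< -------- 8< -------- 8<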
Nik Clayton
2006-Aug-01 09:04 UTC
[dtrace-discuss] Tracking time spent in function -- does this do what I think it does?
Bryan Cantrill wrote:
> On Mon, Jul 31, 2006 at 02:37:26PM +0100, Nik Clayton wrote:
>
> My only comment on this would be to put the increment of c in the
> clause for queue-createentry-end. Using global variables is
> inherently racy, and you want to minimize that race as much as
> possible.

Done. Useful tip, thanks.

>> That produces output like this (if I run it through sort -n)
>>
>>         0        9373
>>         1        9857
>>         2       26820
>>         3       15429
>>         ...       ...
>>
>> which, if I'm reading it right, tells me that the messages 0-99
>> took, on average, 9373ms to queue, messages 100-199 took on average
>> 9857ms to queue, and so on.
>
> Not quite -- they took 9373us, not 9373ms: 'timestamp' is in
> nanoseconds, so dividing it by a thousand yields microseconds, not
> milliseconds.

Ah yes. Slip of the tongue.

>> Does that look right? Is there a more DTrace-ish way to do this?
>>
>> In my ideal world quantize() or similar would work, but after
>> reading the docs I can't quite see how I'd get it to produce
>> sensible output from this data.
>
> It depends on exactly what you're looking for, but you could just use
> quantize() straight-up:
>
>         @time[c / divisor] = quantize((timestamp - self->start) / 1000);
>
> This will give you a histogram for each period. You then sort by key
> (set the 'aggsortkey' option) to see the histograms in time order.
> This is what I'd recommend, as I think it answers your question.

I tried that, and it gives me one histogram per bucket (which makes
sense, from reading the documentation). What I've been trying to do is
get one histogram, where the left hand side is the bucket number, and
the right hand side is the average uSec for that bucket.

Something like this:

        bucket  ------------- Average ---------------  value
             0 | @@@@@@@@@                              9373
             1 | @@@@@@@@@@                             9857
             2 | @@@@@@@@@@@@@@@@@@@@@@@@@@@           26820
             3 | @@@@@@@@@@@@@@@                       15429
           ...

But that's eluding me at the moment.

> If, however, you're not at all interested in the distribution, you
> could do this:
>
>         @time = lquantize(c / divisor, 0, 1000, 1,
>             (timestamp - self->start) / divisor);
>
> This is a bit peculiar in that it's a little racy, that it limits
> (necessarily) the number of groups of messages (to 1000 in the
> above), that you have to read the average number of nanoseconds for
> each group by looking at the 'count' column of the lquantize output,
> and that the output on the last bucket might be a bit wrong. But it
> might get you to where you want to go...

This is a bit odd. I've tried that, and it seems close to what I'm
looking for. On a test run I get:

        value  --------------- Distribution ---------------  count
          < 0 |                                              0
            0 |                                              1095282
            1 |                                              1096254
            2 |                                              1296317
            3 |                                              724605
          ...

which looks like it might be close. A couple of questions though.

First, why isn't dtrace drawing the '@' graph? I've checked the rest
of the values and there's nothing ludicrously large that might be
scaling the distribution plot so that the figures I've quoted above
are too small to plot.

Also, what's the correct interpretation of the 'count' column? Is that
the average for that bucket? The total? The highest figure?

N
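A flat bucket/average table like the one Nik sketches (minus the '@'
bars) can be produced with a printa() format string in an END clause.
A minimal sketch, assuming the avg() aggregation from the original
script is still named @time:

        END
        {
                printf("%8s %12s\n", "BUCKET", "AVG (us)");
                /* the %@ conversion formats each key's aggregated value */
                printa("%8d %@12d\n", @time);
        }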
Bryan Cantrill
2006-Aug-01 16:09 UTC
[dtrace-discuss] Tracking time spent in function -- does this do what I think it does?
> I tried that, and it gives me one histogram per bucket (which makes
> sense, from reading the documentation). What I've been trying to do
> is get one histogram, where the left hand side is the bucket number,
> and the right hand side is the average uSec for that bucket.
>
> Something like this:
>
>         bucket  ------------- Average ---------------  value
>              0 | @@@@@@@@@                              9373
>              1 | @@@@@@@@@@                             9857
>              2 | @@@@@@@@@@@@@@@@@@@@@@@@@@@           26820
>              3 | @@@@@@@@@@@@@@@                       15429
>            ...
>
> But that's eluding me at the moment.
>
> > If, however, you're not at all interested in the distribution, you
> > could do this:
> >
> >         @time = lquantize(c / divisor, 0, 1000, 1,
> >             (timestamp - self->start) / divisor);
> >
> > This is a bit peculiar in that it's a little racy, that it limits
> > (necessarily) the number of groups of messages (to 1000 in the
> > above), that you have to read the average number of nanoseconds for
> > each group by looking at the 'count' column of the lquantize
> > output, and that the output on the last bucket might be a bit
> > wrong. But it might get you to where you want to go...
>
> This is a bit odd. I've tried that, and it seems close to what I'm
> looking for. On a test run I get:
>
>         value  --------------- Distribution ---------------  count
>           < 0 |                                              0
>             0 |                                              1095282
>             1 |                                              1096254
>             2 |                                              1296317
>             3 |                                              724605
>           ...
>
> which looks like it might be close. A couple of questions though.
>
> First, why isn't dtrace drawing the '@' graph? I've checked the rest
> of the values and there's nothing ludicrously large that might be
> scaling the distribution plot so that the figures I've quoted above
> are too small to plot.

It is, it's just the algorithm for drawing it that's a bit
counterintuitive: to determine the number of '@' symbols to draw,
DTrace takes the number of '@' symbols in the width of the histogram,
and divides that up among the buckets according to their fraction of
the total. The problem comes when you have many buckets (as in this
lquantize()), and those buckets have approximately equal weight. In
this case, buckets will have a fraction of an '@' symbol -- and that
will get whacked to zero '@' symbols.

(All of this is not to defend the current scheme, but merely to
describe it; a more sophisticated scheme that looks at the standard
deviation and gives mean buckets half a bar of '@' symbols would
almost certainly be superior -- though I think one could argue that
the extant scheme should be retained for quantize(), which tends to be
less pathological in this regard.)

> Also, what's the correct interpretation of the 'count' column? Is
> that the average for that bucket? The total? The highest figure?

It's the average: each bucket contains the sum of each value divided
by the number of values in the bucket.

        - Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Solaris Kernel Development.       http://blogs.sun.com/bmc
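As an aside on reading those large counts: the normalize() action can
rescale an aggregation's values before they are printed. A small
sketch, assuming the @time aggregation from the lquantize() variant
above, which would show the per-group averages in microseconds rather
than nanoseconds:

        END
        {
                /* divide every aggregated value by 1000: ns -> us */
                normalize(@time, 1000);
                printa(@time);
        }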