Paul van den Bogaard
2007-Feb-15 12:49 UTC
[dtrace-discuss] profile provider: is it me doing stupid things?
Just showing someone how great DTrace is and then we spot something I do not understand. Of course it can be a major misunderstanding of myself. Would appreciate another (expert) look upon this. The goal we try to achieve is trying to get insight if there is a bursty nature in the time slot when system calls are done. Following DTrace snippet is tried: dtrace -q -p 3173 -n ''syscall::read:entry { @[probefunc] = count(); } profile:::tick-250ms { printa("%6s %@6d\n", @); printf("\n"); clear(@); }'' This to chop up a one second interval into four different equal sized 250 millsec slots. Counting the number of read and write syscalls that fall in each slot. This one is run against a process that reads a single character and writes that character (stdin, stdout) continuously. Wrote it myself so I dare say it is functionally correct :-) The output I get is: read 15078 read 0 read 0 read 0 And this pattern continues till hell freezes over: first slot has all the values, others are always zero. Tried this with 200msec and 100msec intervals too. No difference. Mmmm I expect more of a uniform distribution. Lets try to sample in DTrace a little different. In the end it might be an unforeseen side effect of the application :) The following DTrace script was used: BEGIN { baseline = timestamp; } syscall::read:entry /pid == $target/ { this->tmp = (timestamp - baseline) / (1000 * 1000 * 250); @[this->tmp] = count(); } profile:::tick-1000ms { baseline = timestamp; printa("%3d %@u\n", @); clear(@); } END { } And this one does show a uniform distribution of the system calls as I expected: 3 10601 2 10610 1 10940 0 11534 Question: why is the profile:::tick-xxxmsec behaving this way? What am I missing? Thanks Paul This message posted from opensolaris.org
Chip Bennett
2007-Feb-15 16:04 UTC
[dtrace-discuss] profile provider: is it me doing stupid things?
Paul, I haven''t checked this by experiment, but you''re clearing the aggregation values at four times the default aggrate (1 second), which is how often the dtrace command consumes the aggregation data. You might try changing the aggrate to be the same as the tick. Chip Paul van den Bogaard wrote:> Just showing someone how great DTrace is and then we spot something I do not understand. > Of course it can be a major misunderstanding of myself. Would appreciate another (expert) look upon this. > > The goal we try to achieve is trying to get insight if there is a bursty nature in the time slot when system calls are done. Following DTrace snippet is tried: > > dtrace -q -p 3173 -n ''syscall::read:entry { @[probefunc] = count(); } profile:::tick-250ms { printa("%6s %@6d\n", @); printf("\n"); clear(@); }'' > > This to chop up a one second interval into four different equal sized 250 millsec slots. Counting the number of read and write syscalls that fall in each slot. > This one is run against a process that reads a single character and writes that character (stdin, stdout) continuously. Wrote it myself so I dare say it is functionally correct :-) > > The output I get is: > > read 15078 > read 0 > read 0 > read 0 > > And this pattern continues till hell freezes over: first slot has all the values, others are always zero. Tried this with 200msec and 100msec intervals too. No difference. > > Mmmm I expect more of a uniform distribution. Lets try to sample in DTrace a little different. In the end it might be an unforeseen side effect of the application :) > > The following DTrace script was used: > > BEGIN > { > baseline = timestamp; > } > > syscall::read:entry > /pid == $target/ > { > this->tmp = (timestamp - baseline) / (1000 * 1000 * 250); > @[this->tmp] = count(); > } > > profile:::tick-1000ms > { > baseline = timestamp; > printa("%3d %@u\n", @); > clear(@); > } > > END > { > } > > And this one does show a uniform distribution of the system calls as I expected: > 3 10601 > 2 10610 > 1 10940 > 0 11534 > > Question: why is the profile:::tick-xxxmsec behaving this way? What am I missing? > > Thanks > Paul > > > This message posted from opensolaris.org > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org >
Bryan Cantrill
2007-Feb-15 17:22 UTC
[dtrace-discuss] profile provider: is it me doing stupid things?
On Thu, Feb 15, 2007 at 04:49:45AM -0800, Paul van den Bogaard wrote:> Just showing someone how great DTrace is and then we spot something I do not understand. > Of course it can be a major misunderstanding of myself. Would appreciate another (expert) look upon this. > > The goal we try to achieve is trying to get insight if there is a bursty nature in the time slot when system calls are done. Following DTrace snippet is tried: > > dtrace -q -p 3173 -n ''syscall::read:entry { @[probefunc] = count(); } profile:::tick-250ms { printa("%6s %@6d\n", @); printf("\n"); clear(@); }'' > > This to chop up a one second interval into four different equal sized 250 millsec slots. Counting the number of read and write syscalls that fall in each slot. > This one is run against a process that reads a single character and writes that character (stdin, stdout) continuously. Wrote it myself so I dare say it is functionally correct :-) > > The output I get is: > > read 15078 > read 0 > read 0 > read 0 > > And this pattern continues till hell freezes over: first slot has all the values, others are always zero. Tried this with 200msec and 100msec intervals too. No difference. > > Mmmm I expect more of a uniform distribution. Lets try to sample in DTrace a little different. In the end it might be an unforeseen side effect of the application :)It''s actually a side-effect of both the switchrate and the aggrate. The switchrate dictates how frequently libdtrace(3LIB) reads the principal buffer from dtrace(7D); the aggrate dictates how frequently the aggregation buffers are read and processed. Because your printa() both (1) emits data to the principal buffer (to indicate that the printa() action should be taken at user-level and (2) implicitly expects the aggregation to have been updated at at least the same frequency, your script will only work as you expect as long as your profile probe is of a lower frequency than _both_ the switchrate _and_ the aggrate. And, as you might have guessed by now, both the switchrate and the aggrate default to 1hz. So: to get your script to behave as you (rightly) expected, add this to the command line: -x switchrate=250ms -x aggrate=250ms While both the aggrate and the switchrate _are_ documented, their interaction in a problem like this is quite subtle, and not explicitly called out anywhere in the documentation; apologies for that... - Bryan -------------------------------------------------------------------------- Bryan Cantrill, Solaris Kernel Development. http://blogs.sun.com/bmc