Hello,
I'm using the D script below to capture the flow and times of kernel functions from an ioctl() call. After an entry into restorectx(), the traced delta (vtimestamp - self->start) makes a dramatic jump downwards. Here's the snippet showing the jump...

  6   -> di_checkmem          8  220176600
  6   <- di_checkmem          7  220176700          100
  6   -> di_mem_addr          8  220176900
  6   <- di_mem_addr          7  220177100          200
  6   -> getmajor             8  220177400
  6   -> getmajor             8  220177400
  4   -> restorectx           9      94600
  4   -> schedctl_restore    10      94900
  4   <- schedctl_restore     9      95300          400
  4   <- restorectx           8      95400          800
  4   <- resume               7      95600         6500
  4   <- cv_wait              6      95900        19600
  4   <- ndi_devi_enter       5      97200  81726329100

I'm assuming that since it is now running on a new processor, it's related to a context switch, and perhaps to the per-CPU DTrace buffers. Is there any way around this?

Jim

-------------------------------
#!/usr/sbin/dtrace -s
#pragma D option flowindent
#pragma D option bufsize=4m

syscall::ioctl:entry
/arg1 == 0xDF07/
{
        self->level = 0;
        self->start = vtimestamp;
}

:::entry
/self->start/
{
        self->level++;
        ts[probefunc] = vtimestamp;
        trace(self->level);
        trace(vtimestamp - self->start);
}

:::return
/self->start/
{
        self->level--;
        trace(self->level);
        trace(vtimestamp - self->start);
        trace(vtimestamp - ts[probefunc]);
}

syscall::ioctl:return
/self->start/
{
        self->start = 0;
        exit(0);
}
G'Day Jim,

On Wed, 1 Mar 2006, Jim Nissen wrote:

> Hello,
> I'm using the D script below to capture the flow and times of kernel
> functions from an ioctl() call. After an entry into restorectx(), the
> traced delta (vtimestamp - self->start) makes a dramatic jump downwards.
> Here's the snippet showing the jump...
>
>   6   -> di_checkmem          8  220176600
>   6   <- di_checkmem          7  220176700          100
>   6   -> di_mem_addr          8  220176900
>   6   <- di_mem_addr          7  220177100          200
>   6   -> getmajor             8  220177400
>   6   -> getmajor             8  220177400
>   4   -> restorectx           9      94600
      ^^^

Oops, we skipped CPU right there - so this output is shuffled.

The DTrace kernel framework maintains a buffer per CPU, which is fetched
by dtrace(1M) once per second by default. dtrace(1M) steps through each
CPU, dumping the buffers one by one, but doesn't merge or post-sort the
data.

You could print a timestamp as a column and post-sort. The flow indent
may get wrecked a little.

[...]

> -------------------------------
> #!/usr/sbin/dtrace -s
> #pragma D option flowindent
> #pragma D option bufsize=4m
>
> syscall::ioctl:entry
> /arg1 == 0xDF07/
> {
>         self->level = 0;
>         self->start = vtimestamp;
> }
>
> :::entry
> /self->start/
> {
>         self->level++;
>         ts[probefunc] = vtimestamp;
>         trace(self->level);
>         trace(vtimestamp - self->start);
> }
>
> :::return
> /self->start/
> {
>         self->level--;
          ^^^^^^^^^^^^^^

This is too soon; it should be done at the end of this block.

This depth technique also assumes every entry has a return - that they
are symmetric. This is mostly the case, but not always (see previous
posts about tail-call optimization).

>         trace(self->level);
>         trace(vtimestamp - self->start);
>         trace(vtimestamp - ts[probefunc]);
          ^^^^^^^^^^^^^

This assumes no recursive calls; again, that may not be the case.

I came up with a solution to both problems for the dapptrace/dappprof
tools, however doing so involved extra DTrace overhead that skews the
timestamps somewhat (which I notice is what the very next post to
dtrace-discuss is about!). What you are doing is fine so long as you
bear those assumptions in mind.

no worries,

Brendan

[Sydney, Australia]
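(A minimal sketch of the suggestions above, assuming the same 0xDF07 ioctl code - this is not Brendan's actual dapptrace solution. It prints the wall-clock timestamp as the first column so the per-CPU output can be merged and post-sorted, and decrements the depth only after it has been printed. Narrowing to the fbt provider and the printf column layout are illustrative choices.)

#!/usr/sbin/dtrace -s
/*
 * Sketch only: timestamp printed as the first column so the per-CPU
 * output can be merged and post-sorted; flowindent is dropped.
 */
#pragma D option quiet
#pragma D option bufsize=4m

syscall::ioctl:entry
/arg1 == 0xDF07/
{
        self->level = 0;
        self->start = vtimestamp;
}

fbt:::entry
/self->start/
{
        self->level++;
        ts[probefunc] = vtimestamp;
        printf("%d -> %s %d %d\n", timestamp, probefunc,
            self->level, vtimestamp - self->start);
}

fbt:::return
/self->start/
{
        printf("%d <- %s %d %d %d\n", timestamp, probefunc,
            self->level, vtimestamp - self->start,
            vtimestamp - ts[probefunc]);
        self->level--;          /* decrement after printing, per the note above */
}

syscall::ioctl:return
/self->start/
{
        self->start = 0;
        exit(0);
}

The merged output could then be sorted numerically on the first column (e.g. with sort -n) before computing deltas.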
Hey Brendan,

> Oops, we skipped CPU right there - so this output is shuffled.
>
> The DTrace kernel framework maintains a buffer per CPU, which is fetched
> by dtrace(1M) once per second by default. dtrace(1M) steps through each
> CPU, dumping the buffers one by one, but doesn't merge or post-sort the
> data.
>
> You could print a timestamp as a column and post-sort. The flow indent
> may get wrecked a little.
>
> [...]

Yes, since my plan was to post-process anyway (a Perl script actually
found the delta), I'll try again with just printing timestamps. My other
thought is to try running the command in a processor set, or in the RT
class, to ensure it gets more time on one CPU.

> This is too soon; it should be done at the end of this block.
>
> This depth technique also assumes every entry has a return - that they
> are symmetric. This is mostly the case, but not always (see previous
> posts about tail-call optimization).
>
> This assumes no recursive calls; again, that may not be the case.
>
> I came up with a solution to both problems for the dapptrace/dappprof
> tools, however doing so involved extra DTrace overhead that skews the
> timestamps somewhat (which I notice is what the very next post to
> dtrace-discuss is about!). What you are doing is fine so long as you
> bear those assumptions in mind.

Yes, I discovered the same thing from the Perl script's output. There are
recursive calls, so the time delta is skewed. The other odd thing I'm
hitting is what appear to be tail calls, but are not. In other words,
some functions actually return, sometimes; other times I do not see
returns. Again, it appears that the preemption of the thread off of one
CPU and onto another is affecting this.

In the end, I'm trying to figure out the function flow, how many times
each function is called, and how long (each time, and in total) they are
running. We have an N+1 cluster node (the +1 node) which is attached to
all the storage devices, and we are hitting a problem where vmstat is
taking 30 seconds to return from each <10 second interval. We've found,
from truss, that libdevinfo and all the storage devices are the root
cause, but we want to dig a bit deeper.

Thanks for your help!

Jim
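(A minimal sketch of the call-count / total-time idea Jim describes, assuming the same 0xDF07 ioctl code. It uses DTrace aggregations, and keys the saved entry timestamp on thread ID and depth so recursive calls don't clobber each other; it is still subject to the tail-call asymmetry caveat above. The variable names and output columns are illustrative.)

#!/usr/sbin/dtrace -s
/*
 * Sketch: per-function call counts and total elapsed time for one
 * 0xDF07 ioctl, using aggregations.  Entry timestamps are keyed on
 * (tid, depth) so recursive calls don't overwrite each other.
 */
#pragma D option quiet

syscall::ioctl:entry
/arg1 == 0xDF07/
{
        self->depth = 0;
        self->in = 1;
}

fbt:::entry
/self->in/
{
        self->depth++;
        ets[tid, self->depth] = vtimestamp;
        @calls[probefunc] = count();
}

fbt:::return
/self->in && ets[tid, self->depth]/
{
        @total[probefunc] = sum(vtimestamp - ets[tid, self->depth]);
        ets[tid, self->depth] = 0;
        self->depth--;
}

syscall::ioctl:return
/self->in/
{
        self->in = 0;
        exit(0);
}

END
{
        printf("%-32s %12s\n", "FUNCTION", "CALLS");
        printa("%-32s %@12d\n", @calls);
        printf("\n%-32s %12s\n", "FUNCTION", "TOTAL ns");
        printa("%-32s %@12d\n", @total);
}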
On Thu, Mar 02, 2006 at 11:10:42AM +1100, Brendan Gregg wrote:

> This is too soon; it should be done at the end of this block.
>
> This depth technique also assumes every entry has a return - that they
> are symmetric. This is mostly the case, but not always (see previous
> posts about tail-call optimization).

Well, every entry should have a return; if not, that's a bug in the
instrumentation code. The real issue is that the "return" from a function
may actually be a tail call to another function, so the "time spent" is
inaccurate.

Cheers,
- jonathan

--
Jonathan Adams, Solaris Kernel Development