Hello All,

I am trying to write a D script using the pid provider that dumps out:

functionname %usr %sys %idle

(In my script I track pid:::entry,return as well as syscall:::entry,return.)

I am expecting output like the following, emitted every 10 seconds or so:

Process: 192

functionname   %time  %usr  %sys  %idle
-----
memcpy             5    80     0     20
read               5     0    80     20
write             10     0    60     40
dumbfunction      40     0     0    100
hogfunction       40   100     0      0
------
Total            100    44    10     46

Using timestamp (or vtimestamp) I can find out the time (and normalize it to %time), which I think is the total time, %usr + %idle (similarly I can track syscall times too, which is %sys + %idle).

However, the thing that's bugging me is how to separate %usr and %idle out of %time.

If I can do that, I can track functions which are causing the CPU to be idle when it was expected to be doing useful work. Also, if the total time is high and its idle component is low, that indicates there is potential for rewriting the function so it does not become a hog.

Can this be achieved? I think it would be a great script for rooting out both idle-causing functions and functions that are CPU hogs.

Thanks.

Regards,
Jignesh
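For reference, a minimal sketch of the entry/return timing approach described above (the script name, the -p <PID> invocation, and the libc probe pattern are placeholders; enabling every function in a library has a real probe effect). It separates elapsed time (timestamp) from on-CPU time (vtimestamp) per function, but does not by itself split %usr from %sys:

---8<--- functime.d ---8<---
#!/usr/sbin/dtrace -s
/* Run as: ./functime.d -p <PID>
 * Note: this simple sketch does not handle nested or recursive calls.
 */

pid$target:libc::entry
{
        self->ts  = timestamp;          /* wall-clock ns */
        self->vts = vtimestamp;         /* on-CPU ns for this thread */
}

pid$target:libc::return
/self->ts/
{
        @elapsed[probefunc] = sum(timestamp - self->ts);
        @oncpu[probefunc]   = sum(vtimestamp - self->vts);
        self->ts  = 0;
        self->vts = 0;
}

tick-10sec
{
        printf("\n--- per-function totals (ns) ---\n");
        printa("%-20s elapsed %@d\n", @elapsed);
        printa("%-20s on-cpu  %@d\n", @oncpu);
        trunc(@elapsed);
        trunc(@oncpu);
}
---8<--- functime.d ---8<---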
Your goal is accomplished much more easily with prstat(1).

Using prstat with -Lm (per-LWP stats, report microstates) will tell you how much time each thread in a process is burning CPU, versus sleeping or waiting for a CPU (there are other microstates as well). Use -p <PID> to target a specific process.

Once you understand, at a high level, the behaviour of the processes and threads of interest, use DTrace (or other tools, depending on what the question is) to answer the remaining questions.

For example, if prstat indicates a particular process (or thread) is burning a lot of CPU in %usr, things like system call statistics are not particularly interesting. Rather, use the DTrace profile provider to begin profiling the activity of the process and understand where it's spending its time:

# dtrace -n 'profile-997 /arg1 != 0 && execname == "target_process"/ { @s[ustack(1)] = count(); }'

The above is a very simple clock-based profile that grabs the top of the user stack when the probe fires. Drill down further with the pid provider.

If you're burning time in the kernel, then:

# dtrace -n 'profile-997 /arg0 != 0/ { @s[stack(1)] = count(); }'

gives a quick kernel profile; or use lockstat(1), which is a DTrace consumer:

# lockstat -i 997 -Ik sleep <DURATION>

Your goal is a little ambitious (if I'm understanding your query correctly). That is, you'd like to accomplish in a single DTrace script what we typically accomplish using a couple of iterative steps. The early steps using prstat(1) are so fast, and so easy, that it's not worth looking for a more complex way to do it.

HTH,
/jim
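As a sketch of the "drill down with the pid provider" step (memcpy and <PID> are placeholders here), one possible follow-up is to count calls and sum on-CPU time for a suspect function identified by the profile:

# dtrace -n '
  pid$target::memcpy:entry { self->vts = vtimestamp; }
  pid$target::memcpy:return /self->vts/
  {
          @calls["calls"]     = count();
          @oncpu["on-CPU ns"] = sum(vtimestamp - self->vts);
          self->vts = 0;
  }' -p <PID>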
Hi Jim,

Yes, I like prstat -mLp, but it's still at the thread level. I am thinking more along the lines of function-level, attributed timings. Even if you get the snapshot from prstat -mLp, by the time you run the dtrace script it might be too late to match it up against the "snapshot" from prstat -mL. (Of course, I also lack the "Jim Mauro" brain to map the two result sets to each other correctly :-) )

Maybe we need an extended option in prstat itself to get by-function-name information out there (which prstat could do via a quick dtrace call). In fact, to be overly ambitious: figure out the I/O distribution by function name along with the CPU timings, to cover that aspect too (and memory??). This would add real value to "OpenSolaris", as no other OS can give such information out of the box without learning D :-)

Regards,
Jignesh
An alternative for user-process profiling is the collector/analyzer from Studio. See:

http://dpweb.sfbay/products/analyzer
http://dpweb.sfbay/products/analyzer/venus.html

To do the high-resolution profiling, specify "collect -p hi ...". Time in all microstates is collected, and you can easily drill down to annotated source and disassembly, march up and down the callstack, see a graphical display of profile events in a timeline with their callstacks, and more.

We also have a GUI on top of DTrace for kernel profiling, invoked as "er_kernel -p hi ...". See:

http://dpweb.sfbay/products/analyzer/kernel.html

Marty
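A minimal usage sketch of the collector workflow mentioned above (the target binary and the experiment name test.1.er are hypothetical; default experiment naming may differ by Studio release):

collect -p hi ./target_process      # record a high-resolution profiling experiment
analyzer test.1.er                  # browse functions, callstacks, source, timeline
er_print -functions test.1.er       # or print a text summary of time per function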
On Fri, Aug 26, 2005 at 02:05:55PM -0400, Jignesh Shah wrote:
> Maybe we need an extended option with prstat itself to get by-function-name
> information out there (which prstat could do via a quick dtrace call).

For whatever it's worth, this kind of analysis will become much easier once I have integrated the fix for 6311947:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6311947

The wad containing this fix is in the final stages of testing; it will integrate over the weekend or (more likely) very early next week. Look for it in an upcoming SX/OpenSolaris release, at any rate...

(Unfortunately, the presentation of the bug database doesn't differentiate comments; the suggestion of f_contains() and m_contains() in the referenced bug report is not mine but Roch Bourbonnais' -- and while (still) a good RFE, it's ultimately unrelated to 6311947.)

- Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Solaris Kernel Development.       http://blogs.sun.com/bmc
Hi Jignesh,

I'm not sure if this is precisely the question you're asking, but you can find out when a thread comes off the CPU (or goes to sleep, or is woken up) using the sched provider. If you're trying to understand the duration for which a function causes a thread to sleep, the sched provider will probably be helpful.

Adam

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
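As an illustration of the sched provider suggestion above, a minimal off-CPU sketch (the script name is hypothetical; run it against your process with -p <PID>) that sums the time each user stack spends off the CPU, i.e. sleeping or waiting for a CPU:

---8<--- offcpu.d ---8<---
#!/usr/sbin/dtrace -s
/* Run as: ./offcpu.d -p <PID> */

sched:::off-cpu
/pid == $target/
{
        self->off = timestamp;          /* thread is being descheduled */
}

sched:::on-cpu
/self->off/
{
        /* the user stack does not change while the thread is off CPU */
        @offcpu[ustack()] = sum(timestamp - self->off);
        self->off = 0;
}

END
{
        trunc(@offcpu, 10);             /* keep the ten largest contributors */
        printa(@offcpu);
}
---8<--- offcpu.d ---8<---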
I have been thinking about the sched provider, and have also spent some time playing with a script that uses it.

Here are some fundamental queries that I am not sure about (based on my limited knowledge of the internals):

* When a function boundary is crossed, can it safely be assumed that the thread is ON CPU? (Meaning all the ticks from the function call can be counted as CPU time, until the next sched off-cpu event?)

* When a function is about to return, can it be OFF CPU, or does it have to be ON CPU for the return probe to fire?

* Also a fundamentally stupid question: when a userland function calls a system call, is the self->ts variable in both probes the same variable? (That is, how does thread-local storage relate between userland calls and system calls?)

This will help me get started. Thanks.

Regards,
Jignesh
On Mon, Aug 29, 2005 at 11:34:16AM -0400, Jignesh Shah wrote:
> * When function boundaries are crossed.. can it be safely assumed that
>   it is ON cpu? (meaning all the ticks starting from the function call
>   can be assumed as CPU time.. till the next sched off-cpu event?)

No; there are a couple of types of kernel functions which are called between the off-cpu probe and the on-cpu probe:

 1. softints (see http://cvs.opensolaris.org/source/search?refs=add_softintr)
 2. the "save" and "restore" context callbacks (see http://cvs.opensolaris.org/source/search?refs=installctx)

These generally don't take much time, and the timing difference can be ignored. No userland functions will be called while the thread is off-cpu, of course.

> * When a function is about to be returned.. Again can it be OFF CPU
>   (meaning it has to be ON CPU to fire the return probe?)

Any of the above functions, if called, will have to return before the thread can proceed to the on-cpu probe, so yes.

> * Also a fundamentally stupid question: When a userland function calls
>   a system call, is the self->ts variable in both probes the same
>   variable? (Relation of thread-local storage between userland calls
>   and system calls)

Yes; thread-local variables are associated with the kernel thread, which is the same whether or not you are executing in userland.

Hope that helps,
- jonathan

--
Jonathan Adams, Solaris Kernel Development
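A small sketch illustrating the last point (the fread/read pairing and -p <PID> are placeholder choices): a thread-local set in a userland pid probe is visible in the syscall probe fired by the same thread.

---8<--- tlscross.d ---8<---
#!/usr/sbin/dtrace -s
/* Run as: ./tlscross.d -p <PID> */

pid$target::fread:entry
{
        self->ts = vtimestamp;          /* set in a userland probe */
}

syscall::read:entry
/self->ts/
{
        /* same kernel thread, so the same self->ts is visible here */
        @d["on-CPU ns, fread:entry -> read:entry"] =
            quantize(vtimestamp - self->ts);
}

pid$target::fread:return
/self->ts/
{
        self->ts = 0;
}
---8<--- tlscross.d ---8<---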
On Mon, Aug 29, 2005 at 11:34:16AM -0400, Jignesh Shah wrote:
> * When function boundaries are crossed.. can it be safely assumed
>   that it is ON cpu? (meaning all the ticks starting from the function
>   call can be assumed as CPU time.. till the next sched off-cpu event?)
>
> * When a function is about to be returned.. Again can it be OFF CPU
>   (meaning it has to be ON CPU to fire the return probe?)

Any time a probe fires, the executing thread is -- by definition -- on CPU. There are a couple of kernel functions whose probes may fire on the way to a thread coming on and off the CPU.

> * Also a fundamentally stupid question: When a userland function
>   calls a system call, is the self->ts variable in both probes the
>   same variable? (Relation of thread-local storage between userland
>   calls and system calls)

Thread-local variables persist between userland and the kernel. The only caveat is that profile provider probes should be thought of as firing in their own thread -- thread-local variables set in the thread that's interrupted by the profile probe aren't visible from the context of the profile probe. For example, this script will generate no output:

---8<--- tls.d ---8<---
#!/usr/sbin/dtrace -s

sched:::on-cpu
{
        self->yes = 1;
}

profile:::profile-4999
/self->yes/
{
}
---8<--- tls.d ---8<---

Hope that helps.

Adam

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl