Hi everyone. We're trying out dapptrace and dappprof in an effort to isolate
the cause of some slow-downs in an application. I'm having a hard time
interpreting the elapsed and CPU times reported by dapptrace. Are these times
in any way related to the actual wall clock elapsed time? For example, running
"timex dapptrace -eoF banner hello": timex reports a "real" (wall clock
elapsed) time of 2.74 seconds, while dapptrace reports the total elapsed time
as 29715 us, or just over 29 ms.

Where does the difference between the actual wall clock elapsed time and the
elapsed time reported by dapptrace come from? Any help you could provide would
be appreciated!

Thanks,
-Josh-

More information on what we're trying to accomplish: We have a program that
runs on both Solaris 8 and Solaris 10. A particular area of the program takes
2-3 times longer to run on Solaris 10 than on 8. truss seems to indicate that
a particular write is causing the delay (~2 sec), but using iosnoop I'm able
to determine that the actual write occurs in under 2 ms. Something in between
system calls is causing this delay. I'm trying to use dapptrace to isolate it.

root@ryesnap6 # timex dapptrace -eoF banner hello
ELAPSD  CPU   CALL(args) = return

   [banner(1) ASCII-art rendering of "hello" omitted]

     .     .  -> _start(0x0, 0x0, 0x0)
    87     9  <- _start = 176
     .     .  -> main(0x2, 0xFFBFFC7C, 0xFFBFFC88)
     .     .  -> banner(0xFFBFFD47, 0x2251C, 0x5CA8C)
     .     .  -> banset(0x20, 0x2251C, 0x22000)
    68    17  <- banset = 68
     .     .  -> convert(0x68, 0x2276F, 0x2276E)
    62     8  <- convert = 440
     .     .  -> banfil(0x2245C, 0x2251C, 0xFFFFFFFFFFFFFF76)
    60     9  <- banfil = 100
     .     .  -> convert(0x65, 0x2276F, 0x22722)
    59     6  <- convert = 440
     .     .  -> banfil(0x22447, 0x22524, 0xFFFFFFFFFFFFFF73)
    56     6  <- banfil = 100
     .     .  -> convert(0x6C, 0x22777, 0x2272A)
    59     6  <- convert = 440
     .     .  -> banfil(0x22478, 0x2252C, 0xFFFFFFFFFFFFFF7A)
    56     6  <- banfil = 100
     .     .  -> convert(0x6C, 0x2277F, 0x22732)
    59     6  <- convert = 440
     .     .  -> banfil(0x22478, 0x22534, 0xFFFFFFFFFFFFFF7A)
    56     6  <- banfil = 100
     .     .  -> convert(0x6F, 0x22787, 0x2273A)
    59     6  <- convert = 440
     .     .  -> banfil(0x2248D, 0x2253C, 0xFFFFFFFFFFFFFF7D)
    56     6  <- banfil = 100
  1472   139  <- banner = 140
     .     .  -> banprt(0x2251C, 0x378, 0xFFBFFD47)
   484   427  <- banprt = 128
 29715  1914  <- main = 132

real        2.74
user        1.50
sys         1.21
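As a minimal sketch of what timing the suspect write itself could look like
(assuming the application's PID is known; 12345 below is a placeholder), a
DTrace one-liner can measure latency inside write(2), separating in-syscall
time from time spent between system calls:

# dtrace -n 'syscall::write:entry /pid == 12345/ { self->ts = timestamp; }
    syscall::write:return /self->ts/
    { @["write(2) latency (ns)"] = quantize(timestamp - self->ts); self->ts = 0; }'

If the quantize distribution stays in the microsecond range while the overall
step still takes seconds, the delay is indeed outside the system call.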
G'Day Joshua,

On Wed, 1 Mar 2006, Joshua Clark wrote:

> Hi everyone. We're trying out dapptrace and dappprof in an effort to
> isolate the cause of some slow-downs in an application. I'm having a
> hard time interpreting the elapsed and CPU times reported by dapptrace.
> Are these times in any way related to the actual wall clock elapsed
> time? For example, running "timex dapptrace -eoF banner hello": timex
> reports a "real" (wall clock elapsed) time of 2.74 seconds, while
> dapptrace reports the total elapsed time as 29715 us, or just over 29 ms.
>
> Where does the difference between the actual wall clock elapsed time and
> the elapsed time reported by dapptrace come from?

Both times are elapsed times, taking measurements at the start and finish
and printing the delta.

The "real" (elapsed) value from timex of 2.74 seconds is accurate. The
elapsed time from dapptrace of 29 ms is also accurate, bearing in mind that
a heavyweight script like dapptrace will change observed times.

The issue is that you are running timex on the dtrace script itself, and so
measuring the overhead of executing /usr/sbin/dtrace as well - hence the
2.74 seconds elapsed time. timex isn't measuring banner directly.

There are some points to bear in mind when reading dapptrace/dtruss times,
which I've been documenting in Docs/Notes/dtruss_notes.txt in the toolkit.
These scripts are some of the most heavyweight, so they will change
(increase) times somewhat due to overheads. However, this change should be
proportional, so they should still be useful to identify a hot spot.

The following one-liner may be less invasive, and may also identify the hot
spot (its stack back trace),

# dtrace -n 'profile-1000 /pid == 17873/ { @[ustack()] = count(); }'
                                  ^^^^^ change for your PID
...

If you ever want an excellent understanding of what the toolkit scripts do -
read them! :) I try to write in a neat commented style (and often
deliberately avoid using shortcuts) so that they can be read by others.

cheers,

Brendan
[Sydney, Australia]
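To see the overhead point concretely, one rough check (a sketch; the numbers
will of course vary from run to run) is to time the target on its own and
then under dapptrace:

# ptime banner hello                   # the target alone
# ptime dapptrace -eoF banner hello    # the target plus dtrace/dapptrace overhead

The difference between the two "real" values is mostly the cost of the
tracing tools, not of banner itself.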
Joshua Clark
2006-Mar-02 23:17 UTC
[dtrace-discuss] Re: dapptrace times vs. wall clock time
Brendan,

Thanks. I appreciate you taking the time to respond. I'm just beginning to
test Solaris 10, and in the process I'm trying to learn DTrace.

Thanks again,
-Josh-
Daniel J Pennell
2006-Apr-03 19:07 UTC
[dtrace-discuss] Re: dapptrace times vs. wall clock time
Brendan,

I am having the same sort of problem with my application on Solaris 10 using
Studio 11. If I run my application using an old executable that was compiled
on Solaris 8 using Forte6, it is one third faster on Solaris 10 than if I use
a Studio 11 Solaris 10 build executable running on Solaris 10: 2 minutes 2
seconds versus 3 minutes and 15 seconds.

I ran Analyzer on both executables and the Solaris 10 build ran 0.01 faster
in CPU time. So what was it doing for the other minute? Same code, same
compile options, and same machine.

Daniel
Brendan Gregg
2006-Apr-05 15:57 UTC
[dtrace-discuss] Re: dapptrace times vs. wall clock time
G'Day Daniel,

On Mon, 3 Apr 2006, Daniel J Pennell wrote:

> Brendan,
> I am having the same sort of problem with my application on Solaris 10
> using Studio 11. If I run my application using an old executable that
> was compiled on Solaris 8 using Forte6, it is one third faster on
> Solaris 10 than if I use a Studio 11 Solaris 10 build executable running
> on Solaris 10: 2 minutes 2 seconds versus 3 minutes and 15 seconds.

Ok, so at this point there is no dtrace whatsoever - these times are from a
measurement such as ptime running the binary with an identical workload?

> I ran Analyzer on both executables and the Solaris 10 build ran 0.01
> faster in CPU time. So what was it doing for the other minute? Same
> code, same compile options, and same machine.
> Daniel

You'll need to carefully study the system response to each test. What
system resource activity have you already checked?

A simple way to begin studying the user-land activity from dtrace would be,

# dtrace -n 'profile-1000 /pid == 12345/ { @[ustack()] = count(); }'

Also, different builds may be linked to different libraries. Is this a
multi-threaded application? It may be useful to check what they are linked
to.

Brendan
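One way to check the library linkage difference (a sketch; app.forte6 and
app.studio11 are placeholder names for the two builds) is to capture and
compare the ldd output of each executable:

# ldd app.forte6   > /tmp/libs.forte6
# ldd app.studio11 > /tmp/libs.studio11
# diff /tmp/libs.forte6 /tmp/libs.studio11

Any library that appears in only one list, or resolves to a different path,
is a candidate for the extra minute.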
Daniel J Pennell
2006-Apr-06 20:04 UTC
[dtrace-discuss] Re: Re: dapptrace times vs. wall clock time
Brendan,

To get the clock time for my application I run it from a script, and in the
script I print 'date' to get the time of start and finish.

I ran the dtrace command that you suggested and I did not get anything that
I could understand. It looked like this:

              0x9aae0
              0x9a608
                1

... more of the same for 67000 lines. Also I had the following error:

dtrace: error on enabled probe ID 1 (ID 49447: profile:::profile-1000):
invalid address (0xffbfe000) in action #2

Does that mean anything?

My application is not multi-threaded. How would I check the low level
system libraries that are linked in? I use the same makefile for both
versions, Forte6 and Studio 11.

Daniel
Brendan Gregg
2006-Apr-07 09:23 UTC
[dtrace-discuss] Re: Re: dapptrace times vs. wall clock time
G'Day Daniel,

On Thu, 6 Apr 2006, Daniel J Pennell wrote:

> Brendan,
> To get the clock time for my application I run it from a script, and in
> the script I print 'date' to get the time of start and finish.

That doesn't sound very precise; maybe you should be using ptime. I imagine
you are taking steps to ensure the workload and system are in identical
states? (remounting file systems, for example).

> I ran the dtrace command that you suggested and I did not get anything
> that I could understand. It looked like this:
>
>               0x9aae0
>               0x9a608
>                 1

Try hitting Ctrl-C earlier - it may give dtrace a better chance of
evaluating those stack traces.

> ... more of the same for 67000 lines.

Hrm, one thing that can cause this is many short-lived processes. Try
execsnoop from the DTraceToolkit, or,

dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }'

Most applications will produce fewer than a dozen lines of output.

> Also I had the following error:
>
> dtrace: error on enabled probe ID 1 (ID 49447: profile:::profile-1000):
> invalid address (0xffbfe000) in action #2
>
> Does that mean anything?

Nothing to worry about - sampling jitter.

> My application is not multi-threaded. How would I check the low level
> system libraries that are linked in? I use the same makefile for both
> versions, Forte6 and Studio 11.

# ldd executable-name

no worries,

Brendan
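If the stack output is still far too large to read, one option (a sketch;
12345 is again a placeholder PID) is to truncate the aggregation to the most
frequent stacks before it is printed:

# dtrace -n 'profile-1000 /pid == 12345/ { @[ustack()] = count(); }
    END { trunc(@, 10); }' -o /tmp/hot_stacks.txt

trunc(@, 10) keeps only the ten hottest stacks, and -o writes them to a file
for later study.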
Daniel J Pennell
2006-Apr-24 16:43 UTC
[dtrace-discuss] Re: Re: Re: dapptrace times vs. wall clock time
Brendan,

I loaded a patch cluster disk (Feb 24 2006) and recompiled everything. My run
times were better. I went ahead and loaded Solaris 10 on the rest of my
machines. I do not know what the problem was, but I am satisfied for now.

I would still like to use DTrace to see how I could speed up my program. I do
not yet understand the output of the commands that you suggested. I plan to
take a DTrace class; maybe that will help.

Thanks for the help!

Daniel