Hi everyone. We're trying out dapptrace and dappprof in an effort to isolate
the cause of some slow-downs in an application. I'm having a hard time
interpreting the elapsed and CPU times reported by dapptrace. Are these times
in any way related to the actual wall clock elapsed time? For example, running
"timex dapptrace -eoF banner hello": timex reports a "real" (wall clock
elapsed) time of 2.74 seconds, while dapptrace reports the total elapsed time
as 29715 us, or just over 29 ms.

Where does the difference between the actual wall clock elapsed time and the
elapsed time reported by dapptrace come from? Any help you could provide would
be appreciated!

Thanks,
-Josh-

More information on what we're trying to accomplish: We have a program that
runs on both Solaris 8 and Solaris 10. A particular area of the program takes
2-3 times longer to run on Solaris 10 than on 8. truss seems to indicate that
a particular write is causing the delay (~2 sec), but using iosnoop I'm able
to determine that the actual write occurs in under 2 ms. Something in between
system calls is causing this delay. I'm trying to use dapptrace to isolate it.

root@ryesnap6 # timex dapptrace -eoF banner hello
ELAPSD  CPU   CALL(args) = return

   [banner(1) ASCII-art rendering of "hello" omitted]

     .     .  -> _start(0x0, 0x0, 0x0)
    87     9  <- _start = 176
     .     .  -> main(0x2, 0xFFBFFC7C, 0xFFBFFC88)
     .     .  -> banner(0xFFBFFD47, 0x2251C, 0x5CA8C)
     .     .  -> banset(0x20, 0x2251C, 0x22000)
    68    17  <- banset = 68
     .     .  -> convert(0x68, 0x2276F, 0x2276E)
    62     8  <- convert = 440
     .     .  -> banfil(0x2245C, 0x2251C, 0xFFFFFFFFFFFFFF76)
    60     9  <- banfil = 100
     .     .  -> convert(0x65, 0x2276F, 0x22722)
    59     6  <- convert = 440
     .     .  -> banfil(0x22447, 0x22524, 0xFFFFFFFFFFFFFF73)
    56     6  <- banfil = 100
     .     .  -> convert(0x6C, 0x22777, 0x2272A)
    59     6  <- convert = 440
     .     .  -> banfil(0x22478, 0x2252C, 0xFFFFFFFFFFFFFF7A)
    56     6  <- banfil = 100
     .     .  -> convert(0x6C, 0x2277F, 0x22732)
    59     6  <- convert = 440
     .     .  -> banfil(0x22478, 0x22534, 0xFFFFFFFFFFFFFF7A)
    56     6  <- banfil = 100
     .     .  -> convert(0x6F, 0x22787, 0x2273A)
    59     6  <- convert = 440
     .     .  -> banfil(0x2248D, 0x2253C, 0xFFFFFFFFFFFFFF7D)
    56     6  <- banfil = 100
  1472   139  <- banner = 140
     .     .  -> banprt(0x2251C, 0x378, 0xFFBFFD47)
   484   427  <- banprt = 128
 29715  1914  <- main = 132

real        2.74
user        1.50
sys         1.21
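As a minimal sketch of what timing the suspect write itself could look like
(assuming the application's PID is known; 12345 below is a placeholder), a
DTrace one-liner can measure latency inside write(2), separating in-syscall
time from time spent between system calls:

# dtrace -n 'syscall::write:entry /pid == 12345/ { self->ts = timestamp; }
    syscall::write:return /self->ts/
    { @["write(2) latency (ns)"] = quantize(timestamp - self->ts); self->ts = 0; }'

If the quantize distribution stays in the microsecond range while the overall
step still takes seconds, the delay is indeed outside the system call.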
G'Day Joshua,

On Wed, 1 Mar 2006, Joshua Clark wrote:

> Hi everyone. We're trying out dapptrace and dappprof in an effort to
> isolate the cause of some slow-downs in an application. I'm having a
> hard time interpreting the elapsed and CPU times reported by dapptrace.
> Are these times in any way related to the actual wall clock elapsed
> time? For example, running "timex dapptrace -eoF banner hello": timex
> reports a "real" (wall clock elapsed) time of 2.74 seconds, while
> dapptrace reports the total elapsed time as 29715 us, or just over 29 ms.
>
> Where does the difference between the actual wall clock elapsed time and
> the elapsed time reported by dapptrace come from?

Both times are elapsed times, taking measurements at the start and finish
and printing the delta.

The "real" (elapsed) value from timex of 2.74 seconds is accurate. The
elapsed time from dapptrace of 29 ms is also accurate, bearing in mind that
a heavyweight script like dapptrace will change observed times.

The issue is that you are running timex on the dtrace script itself, and so
measuring the overhead of executing /usr/sbin/dtrace as well - hence the
2.74 seconds elapsed time. timex isn't measuring banner directly.

There are some points to bear in mind when reading dapptrace/dtruss times,
which I've been documenting in Docs/Notes/dtruss_notes.txt in the toolkit.
These scripts are some of the most heavyweight, so they will change
(increase) times somewhat due to overheads. However, this change should be
proportional, so they should still be useful to identify a hot spot.

The following one-liner may be less invasive, and may also identify the hot
spot (its stack back trace),

# dtrace -n 'profile-1000 /pid == 17873/ { @[ustack()] = count(); }'
                                  ^^^^^ change for your PID
...

If you ever want an excellent understanding of what the toolkit scripts do -
read them! :) I try to write in a neat commented style (and often
deliberately avoid using shortcuts) so that they can be read by others.

cheers,

Brendan
[Sydney, Australia]
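To see the overhead point concretely, one rough check (a sketch; the numbers
will of course vary from run to run) is to time the target on its own and
then under dapptrace:

# ptime banner hello                   # the target alone
# ptime dapptrace -eoF banner hello    # the target plus dtrace/dapptrace overhead

The difference between the two "real" values is mostly the cost of the
tracing tools, not of banner itself.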
Joshua Clark
2006-Mar-02 23:17 UTC
[dtrace-discuss] Re: dapptrace times vs. wall clock time
Brendan,

Thanks. I appreciate you taking the time to respond. I'm just beginning to
test Solaris 10, and in the process I'm trying to learn DTrace.

Thanks again,
-Josh-
Daniel J Pennell
2006-Apr-03 19:07 UTC
[dtrace-discuss] Re: dapptrace times vs. wall clock time
Brendan,

I am having the same sort of problem with my application on Solaris 10 using
Studio 11. If I run my application using an old executable that was compiled
on Solaris 8 using Forte6, it is one third faster on Solaris 10 than if I use
a Studio 11 Solaris 10 build executable running on Solaris 10: 2 minutes 2
seconds versus 3 minutes and 15 seconds.

I ran Analyzer on both executables and the Solaris 10 build ran 0.01 faster
in CPU time. So what was it doing for the other minute? Same code, same
compile options, and same machine.

Daniel
Brendan Gregg
2006-Apr-05 15:57 UTC
[dtrace-discuss] Re: dapptrace times vs. wall clock time
G'Day Daniel,

On Mon, 3 Apr 2006, Daniel J Pennell wrote:

> Brendan,
> I am having the same sort of problem with my application on Solaris 10
> using Studio 11. If I run my application using an old executable that
> was compiled on Solaris 8 using Forte6, it is one third faster on
> Solaris 10 than if I use a Studio 11 Solaris 10 build executable running
> on Solaris 10: 2 minutes 2 seconds versus 3 minutes and 15 seconds.

Ok, so at this point there is no dtrace whatsoever - these times are from a
measurement such as ptime running the binary with an identical workload?

> I ran Analyzer on both executables and the Solaris 10 build ran 0.01
> faster in CPU time. So what was it doing for the other minute? Same
> code, same compile options, and same machine.
> Daniel

You'll need to carefully study the system response to each test. What
system resource activity have you already checked?

A simple way to begin studying the user-land activity from dtrace would be,

# dtrace -n 'profile-1000 /pid == 12345/ { @[ustack()] = count(); }'

Also, different builds may be linked to different libraries. Is this a
multi-threaded application? It may be useful to check what they are linked
to.

Brendan
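One way to check the library linkage difference (a sketch; app.forte6 and
app.studio11 are placeholder names for the two builds) is to capture and
compare the ldd output of each executable:

# ldd app.forte6   > /tmp/libs.forte6
# ldd app.studio11 > /tmp/libs.studio11
# diff /tmp/libs.forte6 /tmp/libs.studio11

Any library that appears in only one list, or resolves to a different path,
is a candidate for the extra minute.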
Daniel J Pennell
2006-Apr-06 20:04 UTC
[dtrace-discuss] Re: Re: dapptrace times vs. wall clock time
Brendan,

To get the clock time for my application I run it from a script, and in the
script I print 'date' to get the time of start and finish.

I ran the dtrace command that you suggested and I did not get anything that
I could understand. It looked like this:

              0x9aae0
              0x9a608
                1

... more of the same for 67000 lines. Also I had the following error:

dtrace: error on enabled probe ID 1 (ID 49447: profile:::profile-1000):
invalid address (0xffbfe000) in action #2

Does that mean anything?

My application is not multi-threaded. How would I check the low level
system libraries that are linked in? I use the same makefile for both
versions, Forte6 and Studio 11.

Daniel
Brendan Gregg
2006-Apr-07 09:23 UTC
[dtrace-discuss] Re: Re: dapptrace times vs. wall clock time
G'Day Daniel,

On Thu, 6 Apr 2006, Daniel J Pennell wrote:

> Brendan,
> To get the clock time for my application I run it from a script, and in
> the script I print 'date' to get the time of start and finish.

That doesn't sound very precise; maybe you should be using ptime. I imagine
you are taking steps to ensure the workload and system are in identical
states? (remounting file systems, for example).

> I ran the dtrace command that you suggested and I did not get anything
> that I could understand. It looked like this:
>
>               0x9aae0
>               0x9a608
>                 1

Try hitting Ctrl-C earlier - it may give dtrace a better chance of
evaluating those stack traces.

> ... more of the same for 67000 lines.

Hrm, one thing that can cause this is many short-lived processes. Try
execsnoop from the DTraceToolkit, or,

dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }'

Most applications will produce fewer than a dozen lines of output.

> Also I had the following error:
>
> dtrace: error on enabled probe ID 1 (ID 49447: profile:::profile-1000):
> invalid address (0xffbfe000) in action #2
>
> Does that mean anything?

Nothing to worry about - sampling jitter.

> My application is not multi-threaded. How would I check the low level
> system libraries that are linked in? I use the same makefile for both
> versions, Forte6 and Studio 11.

# ldd executable-name

no worries,

Brendan
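If the stack output is still far too large to read, one option (a sketch;
12345 is again a placeholder PID) is to truncate the aggregation to the most
frequent stacks before it is printed:

# dtrace -n 'profile-1000 /pid == 12345/ { @[ustack()] = count(); }
    END { trunc(@, 10); }' -o /tmp/hot_stacks.txt

trunc(@, 10) keeps only the ten hottest stacks, and -o writes them to a file
for later study.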
Daniel J Pennell
2006-Apr-24 16:43 UTC
[dtrace-discuss] Re: Re: Re: dapptrace times vs. wall clock time
Brendan,

I loaded a patch cluster disk (Feb 24 2006) and recompiled everything. My run
times were better. I went ahead and loaded Solaris 10 on the rest of my
machines. I do not know what the problem was, but I am satisfied for now.

I would still like to use DTrace to see how I could speed up my program. I do
not yet understand the output of the commands that you suggested. I plan to
take a DTrace class; maybe that will help.

Thanks for the help!

Daniel