I am using the timestamp variable in DTrace for timing some events, in particular the duration of a lock (i.e. the time elapsed between the acquisition and release of a mutex lock). I am getting pretty high numbers. But when I time the lock explicitly, using timing functions such as gethrtime(), I get pretty low numbers. Does this mean timestamp doesn't give us accurate measurements? What are the alternatives?

Thanks,
Neelam
On Tue, 2007-11-06 at 16:37 -0800, Neelam wrote:
> I am using the timestamp variable in DTrace for timing some events, in
> particular the duration of a lock (i.e. the time elapsed between the
> acquisition and release of a mutex lock). I am getting pretty high
> numbers. But when I time the lock explicitly, using timing functions
> such as gethrtime(), I get pretty low numbers. Does this mean timestamp
> doesn't give us accurate measurements? What are the alternatives?

We had similar issues with gethrtime() on 64-bit architectures, where the result was truncated to 32 bits. Of course, it is hard to tell whether it is the same problem or not without seeing your D script.

Thanks,
Roman.
gethrtime() is implemented as a "fast trap" system call, but timestamp in DTrace needs to go deeper into the kernel...

I only have the 1st edition of the "Solaris Internals" book, but here is its comparison (on page 47) of a fast trap system call vs. getpid() on a 300 MHz Ultra2:

                   latency (ns)
    getpid()           2138
    gethrtime()         320
    lib C call           93

Since gethrtime() is relatively cheap compared to a normal system call, the time difference may come from the difference in overhead between the two...

Rayson

On Nov 6, 2007 7:37 PM, Neelam <neelam.goyal at gmail.com> wrote:
> I am using the timestamp variable in DTrace for timing some events, in
> particular the duration of a lock (i.e. the time elapsed between the
> acquisition and release of a mutex lock). I am getting pretty high
> numbers. But when I time the lock explicitly, using timing functions
> such as gethrtime(), I get pretty low numbers. Does this mean timestamp
> doesn't give us accurate measurements? What are the alternatives?
Rayson Ho wrote:
> gethrtime() is implemented as a "fast trap" system call, but timestamp
> in DTrace needs to go deeper into the kernel...
>
> I only have the 1st edition of the "Solaris Internals" book, but here
> is its comparison (on page 47) of a fast trap system call vs. getpid()
> on a 300 MHz Ultra2:
>
>                    latency (ns)
>     getpid()           2138
>     gethrtime()         320
>     lib C call           93
>
> Since gethrtime() is relatively cheap compared to a normal system call,
> the time difference may come from the difference in overhead between
> the two...

I'm fairly sure that is not the case -- all the measurement DTrace does actually happens inside the kernel, so the above will not apply. I also believe there's been some confusion here, as there's a kernel *and* a userland version of gethrtime(), and from context, I believe Rayson and Neelam are talking about different ones :-)

I'm inclined to believe that, compared to gethrtime(9F), DTrace instrumentation can be quite "heavy" (again, depending on your specific probe), and may in fact influence the results. Please show us the exact script, so we can see whether this assumption holds.

Michael

> Rayson
>
> On Nov 6, 2007 7:37 PM, Neelam <neelam.goyal at gmail.com> wrote:
>> I am using the timestamp variable in DTrace for timing some events, in
>> particular the duration of a lock (i.e. the time elapsed between the
>> acquisition and release of a mutex lock). I am getting pretty high
>> numbers. But when I time the lock explicitly, using timing functions
>> such as gethrtime(), I get pretty low numbers. Does this mean timestamp
>> doesn't give us accurate measurements? What are the alternatives?

--
Michael Schuster              Sun Microsystems, Inc.
recursion, n: see 'recursion'
Well, I am trying to record the duration of user-level mutex locks. Below is the simple script:

-----------------------------------------------------------------
#!/usr/sbin/dtrace -s

dtrace:::BEGIN
{
        self->begin = timestamp;
        self->mutex_acq = 0;
}

plockstat$1:::mutex-acquire
{
        self->ts[arg0] = timestamp;
        self->mutex_acq++;
        @mutex_acquired["No of mutexes acquired"] = count();
}

plockstat$1:::mutex-release
/self->mutex_acq == 1/
{
        self->tstop[arg0] = timestamp;
        @mutex_released["No of mutexes released"] = count();
        printf("%d outer mutex duration: %d\n", arg0,
            self->tstop[arg0] - self->ts[arg0]);
        @["all_mutex_duration"] = avg(self->tstop[arg0] - self->ts[arg0]);
        self->mutex_acq = 0;
        self->ts[arg0] = 0;
}

plockstat$1:::mutex-release
/self->mutex_acq > 1/
{
        self->mutex_acq--;
        self->tstp[arg0] = timestamp;
        @mutex_released["No of mutexes released"] = count();
        printf("%d mutex duration: %d\n", arg0,
            self->tstp[arg0] - self->ts[arg0]);
}
-----------------------------------------------------------------

You can see two release probes here, because I was trying to find out the duration of the outermost lock when there is nesting.

And for the timing measurement in the C code, I think it must be the user-level gethrtime(). I include <sys/time.h> to use it.

Thanks,
Neelam
On Nov 7, 2007 3:35 PM, Neelam <neelam.goyal at gmail.com> wrote:
> You can see two release probes here, because I was trying to find out
> the duration of the outermost lock when there is nesting.
>
> And for the timing measurement in the C code, I think it must be the
> user-level gethrtime(). I include <sys/time.h> to use it.

Right, so what I said yesterday was correct.

gethrtime() is a fast trap system call, so calling it has very little overhead. But using DTrace to instrument your code is expensive. In fact, going all the way from userland down to the DTrace logic is more expensive than a getpid(2), which is itself already slower than a fast trap system call.

Rayson
So is there any alternative for getting more accurate numbers using DTrace? I would like to use DTrace because my actual workload is much more complex and long-running, so manually instrumenting all possible locks is not feasible. I would still like to use DTrace to trace all the locks!!

Thanks,
Neelam
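One alternative worth sketching here: DTrace's built-in vtimestamp variable counts the current thread's on-CPU nanoseconds minus the time spent in DTrace predicates and actions, so it factors DTrace's own processing out of the measured interval. The trade-off is that it also excludes any time the lock holder spends off-CPU, so it yields on-CPU hold time rather than wall-clock hold time. A minimal sketch against the same plockstat probes ($1 is the target pid, as in Neelam's script):

    #!/usr/sbin/dtrace -s

    plockstat$1:::mutex-acquire
    {
            /* vtimestamp excludes time spent in DTrace itself */
            self->vts[arg0] = vtimestamp;
    }

    plockstat$1:::mutex-release
    /self->vts[arg0]/
    {
            @hold = avg(vtimestamp - self->vts[arg0]);
            self->vts[arg0] = 0;
    }

    END
    {
            printa("avg on-CPU hold time (ns): %@d\n", @hold);
    }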
On Wed, 2007-11-07 at 15:48 -0500, Rayson Ho wrote:
> Right, so what I said yesterday was correct.
>
> gethrtime() is a fast trap system call, so calling it has very little
> overhead. But using DTrace to instrument your code is expensive. In
> fact, going all the way from userland down to the DTrace logic is more
> expensive than a getpid(2), which is itself already slower than a fast
> trap system call.

I have trouble correlating the two statements you've made. Is it that you're suggesting NOT to use DTrace because it is slower than manually inserted timing calls around each pthread_mutex_[un]lock?

Thanks,
Roman.
On Nov 7, 2007 4:17 PM, Roman Shaposhnik <rvs at sun.com> wrote:
> I have trouble correlating the two statements you've made. Is it that
> you're suggesting NOT to use DTrace because it is slower than manually
> inserted timing calls around each pthread_mutex_[un]lock?

Well... where did I say that?? I am helping Neelam understand the problem. You sure don't want to use a stopwatch to measure the latency of a cache miss, do you?? By the same token, if the overhead of getting to DTrace from userland is much greater than that of getting a lock, then picking DTrace is surely wrong...

BTW, this is not the first time I am seeing responses like this. Every time I suggest that DTrace is not the right tool for a certain problem, someone from Sun is sure to come to its defense. *Please* understand that no tool can handle all performance issues!!

Rayson
On Wed, 2007-11-07 at 16:24 -0500, Rayson Ho wrote:
> Well... where did I say that??

I didn't imply you did. I felt confused by your two statements and offered an interpretation of them that seemed to fit, hoping it would be a useful first step toward understanding what you actually had in mind.

> I am helping Neelam understand the problem. You sure don't want to
> use a stopwatch to measure the latency of a cache miss, do you??

No, but I can use DTrace (well, almost -- the CPC provider is still not part of the default setup) ;-)

> By the same token, if the overhead of getting to DTrace from
> userland

Do you have proof that the overhead is as great as you make it look? pthread_* operations are not exactly cheap, even in userspace. Besides, the lock duration is usually a relative value anyway. I ran an experiment once using the rdtsc assembly instruction to manually time some mid-frequency user-space operations with and without DTrace enabled. The difference was there (and I shouldn't say it was negligible), but it wasn't all that impressive. I'd be curious to hear your side of the story.

Thanks,
Roman.

P.S. As for the tools applicable to the problem at hand, one can always use Sun Studio Performance Analyzer if small dilation is of top concern and all that's needed is user-space application analysis.
On Nov 7, 2007 6:22 PM, Roman Shaposhnik <rvs at sun.com> wrote:
> I didn't imply you did. I felt confused by your two statements and
> offered an interpretation of them that seemed to fit, hoping it would
> be a useful first step toward understanding what you actually had in
> mind.

OK :)

> Do you have proof that the overhead is as great as you make it look?
> pthread_* operations are not exactly cheap, even in userspace. Besides,
> the lock duration is usually a relative value anyway.

Agreed. We don't have Neelam's code, so we don't know exactly what it does. There's too much guesswork.

> I ran an experiment once using the rdtsc assembly instruction to
> manually time some mid-frequency user-space operations with and without
> DTrace enabled. The difference was there (and I shouldn't say it was
> negligible), but it wasn't all that impressive. I'd be curious to hear
> your side of the story.

I believe the overhead of firing a probe from userspace into the DTrace framework is on the order of thousands of cycles, but I don't have access to a Solaris system right now, so I can't check. Whether DTrace is good enough or not all depends on how long the lock is held. I am interested in your experiment as well...

Rayson
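One way to put a rough number on that probe cost is to bracket a call that is known to be nearly free with a pid-provider entry/return pair: since gethrtime() itself costs only a few hundred nanoseconds (see the Solaris Internals numbers above), most of the measured interval is then DTrace's own probe processing. A sketch, assuming a test program that calls gethrtime() in a tight loop; the script name probe-cost.d and the binary name ./loop are placeholders:

    #!/usr/sbin/dtrace -s
    /* run as: dtrace -s probe-cost.d -c ./loop */

    pid$target:libc.so.1:gethrtime:entry
    {
            self->t = timestamp;
    }

    pid$target:libc.so.1:gethrtime:return
    /self->t/
    {
            /* the function body is cheap, so the bulk of this
               interval is probe-firing overhead */
            @cost = quantize(timestamp - self->t);
            self->t = 0;
    }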
On Wed, 2007-11-07 at 12:35 -0800, Neelam wrote:
> You can see two release probes here, because I was trying to find out
> the duration of the outermost lock when there is nesting.
>
> And for the timing measurement in the C code, I think it must be the
> user-level gethrtime(). I include <sys/time.h> to use it.

First of all -- your script looks reasonable, so the only possible explanation I could come up with would be: there's a hard-to-reproduce bug in DTrace that makes timestamp disagree with gethrtime() on some 64-bit architectures (we observed it on Solaris Nevada running on a 64-bit Acer laptop). The easiest way to verify whether you're affected by it or not is to run the following set of commands:

    $ cat q.c
    #include <sys/time.h>
    #include <stdio.h>
    #include <inttypes.h>
    #include <unistd.h>

    int
    main()
    {
            while (1) {
                    printf("uspace -> %llx\n", gethrtime());
                    sleep(1);
            }
    }

    $ cc [-m64 depending on whether your actual app is 64bit] q.c
    $ dtrace -qn 'pid$target:libc.so:gethrtime:entry { \
        printf("dtrace -> %llx\n", (uint64_t)timestamp); }' -c ./a.out

Let it run for at least a minute and observe the numbers. If the numbers with the dtrace -> and uspace -> prefixes seem to disagree, you've got that problem.

Thanks,
Roman.
On Wed, 2007-11-07 at 18:44 -0500, Rayson Ho wrote:
> I believe the overhead of firing a probe from userspace into the DTrace
> framework is on the order of thousands of cycles, but I don't have
> access to a Solaris system right now, so I can't check. Whether DTrace
> is good enough or not all depends on how long the lock is held. I am
> interested in your experiment as well...

Thousands of cycles sounds like a reasonable assessment, but in my book that doesn't make DTrace much slower for lock monitoring. pthread_mutex_lock would take about 10-30K cycles (depending on whether it is contended or not), so we're talking about ~5% overhead. Again, I don't consider that negligible, but at the same time it is far from catastrophic and in no way rules out using DTrace. IMHO.

Thanks,
Roman.
On Wed, 2007-11-07 at 12:35 -0800, Neelam wrote:
> Well, I am trying to record the duration of user-level mutex locks.
> Below is the simple script:
>
> -----------------------------------------------------------------
> #!/usr/sbin/dtrace -s
>
> dtrace:::BEGIN
> {
>         self->begin = timestamp;
>         self->mutex_acq = 0;
> }
>
> plockstat$1:::mutex-acquire
> {
>         self->ts[arg0] = timestamp;
>         self->mutex_acq++;
>         @mutex_acquired["No of mutexes acquired"] = count();

One extra bit of advice -- get rid of aggregations. They might add to dilation quite significantly.

Thanks,
Roman.
On Wed, Nov 07, 2007 at 06:25:38PM -0800, Roman Shaposhnik wrote:
> On Wed, 2007-11-07 at 12:35 -0800, Neelam wrote:
> > Well, I am trying to record the duration of user-level mutex locks.
> > Below is the simple script:
> >
> > -----------------------------------------------------------------
> > #!/usr/sbin/dtrace -s
> >
> > dtrace:::BEGIN
> > {
> >         self->begin = timestamp;
> >         self->mutex_acq = 0;
> > }
> >
> > plockstat$1:::mutex-acquire
> > {
> >         self->ts[arg0] = timestamp;
> >         self->mutex_acq++;
> >         @mutex_acquired["No of mutexes acquired"] = count();
>
> One extra bit of advice -- get rid of aggregations. They might
> add to dilation quite significantly.

Certainly aggregating by a string is going to hurt; if you just change that to "@mutex_acquired = count()", you should be happier.

And given that you're trying to tune your enabling, you might want to consult both the chapter on the subject in the DTrace documentation, and the slides on the topic in the Advanced DTrace presentation (specifically, look for "Trick: Measuring DTrace").

        - Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Sun Microsystems FishWorks.       http://blogs.sun.com/bmc
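For concreteness, the change Bryan suggests looks like this in the acquire probe of Neelam's script (a minimal sketch):

    /* before: DTrace hashes the string key on every probe firing */
    @mutex_acquired["No of mutexes acquired"] = count();

    /* after: unkeyed aggregation, no per-firing string hashing */
    @mutex_acquired = count();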
Roman,

Since you are on the Sun Studio team, and since this thread is related to threads/locks, maybe this is a good place to ask...

Intel has Thread Checker, which is similar to Sun's Thread Analyzer. However, is there a tool on Solaris that is similar to Intel's Thread Profiler??

Rayson

On Nov 7, 2007 9:21 PM, Roman Shaposhnik <rvs at sun.com> wrote:
> Thousands of cycles sounds like a reasonable assessment, but in my book
> that doesn't make DTrace much slower for lock monitoring.
> pthread_mutex_lock would take about 10-30K cycles (depending on whether
> it is contended or not), so we're talking about ~5% overhead. Again,
> I don't consider that negligible, but at the same time it is far from
> catastrophic and in no way rules out using DTrace. IMHO.
Actually, currently I am running DTrace on a microbenchmark, just to validate the numbers before moving to more complex workloads. In the microbenchmark I have a single mutex lock. When I introduced some delay to make the critical section bigger, the difference between the gethrtime() measurement and DTrace was significantly reduced.

And can someone explain more about aggregations? I have a couple of them in my bigger scripts, and I use various strings to differentiate the different numbers.

Thanks for the help,
Neelam
On Wed, 2007-11-07 at 21:52 -0500, Rayson Ho wrote:
> Roman,
>
> Since you are on the Sun Studio team, and since this thread is related
> to threads/locks, maybe this is a good place to ask...
>
> Intel has Thread Checker, which is similar to Sun's Thread Analyzer.
> However, is there a tool on Solaris that is similar to Intel's Thread
> Profiler??

I'm not all that familiar with Intel's Thread Profiler, but from the exposure I got at various conferences it seems to be a tool that lets you detect and fix things like lock contention, resource starvation, false cache sharing and such. If that's what you're interested in, I have two answers. First of all, Sun Studio Performance Analyzer can do some of it for you (you just have to know how to use it ;-)). Plus, we're currently working on Project D-Light (based on DTrace technology), where some of the model problems have to do with lock contention and resource starvation. We're just approaching them from a different (more generalized) direction.

And for the record: I actually disagree with Intel's approach of creating a dedicated performance-analysis tool for each domain. These domains have enough similarities to be exploited in a single framework (hint: in Intel's world, would I need an MPI profiler in addition to Thread Profiler? I guess I would, but what if I'm dealing with a mixed-mode application that is MPI and pthreaded at the same time?).

Thanks,
Roman.
On Nov 7, 2007 11:24 PM, Neelam <neelam.goyal at gmail.com> wrote:
> Actually, currently I am running DTrace on a microbenchmark, just to
> validate the numbers before moving to more complex workloads. In the
> microbenchmark I have a single mutex lock. When I introduced some delay
> to make the critical section bigger, the difference between the
> gethrtime() measurement and DTrace was significantly reduced.
>
> And can someone explain more about aggregations? I have a couple of
> them in my bigger scripts, and I use various strings to differentiate
> the different numbers.

When you do something like:

    @foo["this case here"] = aggfunc1(args1);
    @foo["case two here"] = aggfunc2(args2);

DTrace has to do a substantial amount of work *each time these lines are run* to hash the string into the correct aggregation bucket. A better approach is:

    @case1 = aggfunc1(args1);
    @case2 = aggfunc2(args2);
    ...

    END
    {
            printa("Case one: %@d\n", @case1);
            printa("Case two: %@d\n", @case2);
    }

which removes all of the string processing.

Cheers,
- jonathan