thr3ads.net - dtrace discuss - [dtrace-discuss] plockstat/prstat -m inconsistency? [Feb 2006]

If this information is useful, please help other people find it:
Share via:

Chad Mynhier

2006-Feb-10 17:30 UTC

[dtrace-discuss] plockstat/prstat -m inconsistency?

I''m trying to debug a performance problem, and I''m seeing what
appears
to be a weird inconsistency between prstat -m output (or, more
specifically, prstat -Lmp) and plockstat.

(Note:  I''m the sysadmin trying to meaure the performance of a T2000. 
I''m not one of the Java developers who wrote the application
I''m
running on the server.)

When I run prstat -Lmp on the java process with no traffic being sent
to the server, I see a number of processes sitting at 100% in LCK. 
This isn''t necessarily unexpected, as prstat -Lm occasionally shows
threads from other processes sitting at 100% in LCK (nscd, inetd,
svc.configd, svc.startd.)

However, when I run a dtrace script to measure the total amout of time
that process is spending waiting on user locks (derived from plockstat
-V -A output), that total is around 0.1% of the wallclock time for the
sample.  For example:

Time spent blocked:        27361964
Total time:                 49317563716

I''ve attached the script below.

Does anyone have a good explanation for why I''m seeing this
discrepancy?  The most obvious explanation is that the dtrace script
isn''t matching the full set of events being accounted for by the
system and reported as LCK by prstat -m.

Thanks,
Chad Mynhier



#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
    starttime=timestamp;
    blocktime = 0;
}

plockstat$1:::rw-acquire
{
        self->rwhold[arg0] = timestamp;
}
plockstat$1:::mutex-acquire
/arg1 == 0/
{
        mtxhold[arg0] = timestamp;
}
plockstat$1:::rw-release
/self->rwhold[arg0] && arg1 == 1/
{
        @rw_w_hold[arg0, ustack(5)] = sum(timestamp - self->rwhold[arg0]);
        self->rwhold[arg0] = 0;
}
plockstat$1:::rw-release
/self->rwhold[arg0]/
{
        @rw_r_hold[arg0, ustack(5)] = sum(timestamp - self->rwhold[arg0]);
        self->rwhold[arg0] = 0;
}
plockstat$1:::mutex-release
/mtxhold[arg0] && arg1 == 0/
{
        @mtx_hold[arg0, ustack(5)] = sum(timestamp - mtxhold[arg0]);
        mtxhold[arg0] = 0;
}
plockstat$1:::rw-block
{
        self->rwblock[arg0] = timestamp;
}
plockstat$1:::mutex-block
{
        self->mtxblock[arg0] = timestamp;
}
plockstat$1:::mutex-spin
{
        self->mtxspin[arg0] = timestamp;
}
plockstat$1:::rw-acquire
/self->rwblock[arg0] && arg1 == 1/
{
        foo = timestamp - self->rwblock[arg0];
        @rw_w_block[arg0, ustack(5)] = sum(foo);
        self->rwblock[arg0] = 0;
        blocktime += foo;
}
plockstat$1:::rw-acquire
/self->rwblock[arg0]/
{
        foo = timestamp - self->rwblock[arg0];
        @rw_r_block[arg0, ustack(5)] = sum(foo);
        self->rwblock[arg0] = 0;
        blocktime += foo;
}
plockstat$1:::mutex-block
/self->mtxspin[arg0]/
{
        foo = timestamp - self->mtxspin[arg0];
        @mtx_vain_spin[arg0, ustack(5)] = sum(foo);
        self->mtxspin[arg0] = 0;
        blocktime += foo;
}
plockstat$1:::mutex-acquire
/self->mtxspin[arg0]/
{
        foo = timestamp - self->mtxspin[arg0];
        @mtx_spin[arg0, ustack(5)] = sum(foo);
        self->mtxspin[arg0] = 0;
        blocktime += foo;
}
plockstat$1:::mutex-acquire
/self->mtxblock[arg0]/
{
        foo = timestamp - self->mtxblock[arg0];
        @mtx_block[arg0, ustack(5)] = sum(foo);
        self->mtxblock[arg0] = 0;
        blocktime += foo;
}
plockstat$1:::mutex-error
/self->mtxblock[arg0]/
{
        self->mtxblock[arg0] = 0;
}
plockstat$1:::rw-error
/self->rwblock[arg0]/
{
        self->rwblock[arg0] = 0;
}
plockstat$1:::mutex-error
/self->mtxspin[arg0]/
{
        self->mtxspin[arg0] = 0;
}

END
{
    totaltime = timestamp - starttime;

    printf("Time spent blocked on mutexes: %d\n",blocktime);
    printf("Total time: %d\n",totaltime);
}

Jarod Jenson

2006-Feb-10 17:48 UTC

head link

[dtrace-discuss] plockstat/prstat -m inconsistency?

Chad Mynhier''s email at 2/10/2006 11:30 AM,
said:> I''m trying to debug a performance problem, and I''m seeing
what appears
> to be a weird inconsistency between prstat -m output (or, more
> specifically, prstat -Lmp) and plockstat.
> 
> (Note:  I''m the sysadmin trying to meaure the performance of a
T2000.
> I''m not one of the Java developers who wrote the application
I''m
> running on the server.)
> 
> When I run prstat -Lmp on the java process with no traffic being sent
> to the server, I see a number of processes sitting at 100% in LCK. 
> This isn''t necessarily unexpected, as prstat -Lm occasionally
shows
> threads from other processes sitting at 100% in LCK (nscd, inetd,
> svc.configd, svc.startd.)
> 
> However, when I run a dtrace script to measure the total amout of time
> that process is spending waiting on user locks (derived from plockstat
> -V -A output), that total is around 0.1% of the wallclock time for the
> sample.  For example:
> 
> Time spent blocked:        27361964
> Total time:                 49317563716
> 
> I''ve attached the script below.
> 
> Does anyone have a good explanation for why I''m seeing this
> discrepancy?  The most obvious explanation is that the dtrace script
> isn''t matching the full set of events being accounted for by the
> system and reported as LCK by prstat -m.
> 
>
Most likely they are sitting in [pthread_]cond_wait(3C). I personally 
think this should be reflected as SLP time (in fact, I believe I filed a 
bug a couple of years back), but apparently, the code disagrees:

http://cvs.opensolaris.org/source/xref/on/usr/src/uts/common/syscall/lwp_sobj.c#1552

You could use pstack(1) to verify this is the case. Using your nscd 
example, I show thread 24 in 100% LCK:

# pstack 549/24
549:    /usr/sbin/nscd
-----------------  lwp# 24 / thread# 24  --------------------
  fedb7fab lwp_park (0, fd65df68, 0)
  fedb22d9 cond_wait_queue (fee7f618, fee7f638, fd65df68, 0) + 3e
  fedb2680 cond_wait_common (fee7f618, fee7f638, fd65df68) + 1e9
  fedb290b _cond_timedwait (fee7f618, fee7f638, fd65dfa8) + a6
  fee56710 umem_update_thread (0) + 171
  fedb7d20 _thr_setup (feab5400) + 51
  fedb7f70 _lwp_start (feab5400, 0, 0, 0, 0, 0)

Thanks,

Jarod

Chad Mynhier

2006-Feb-10 18:10 UTC

head link

[dtrace-discuss] plockstat/prstat -m inconsistency?

On 2/10/06, Jarod Jenson <jarod at aeysis.com>
wrote:>
>
> Chad Mynhier''s email at 2/10/2006 11:30 AM, said:
> > I''m trying to debug a performance problem, and I''m
seeing what appears
> > to be a weird inconsistency between prstat -m output (or, more
> > specifically, prstat -Lmp) and plockstat.
> >
>
> Most likely they are sitting in [pthread_]cond_wait(3C).
Yes, this appears to be the case here, too.

Thanks for the reply (and my apologies for not also replying to the
list the first time),
Chad Mynhier

Andrei Dorofeev

2006-Feb-10 18:34 UTC

head link

[dtrace-discuss] plockstat/prstat -m inconsistency?

> Most likely they are sitting in [pthread_]cond_wait(3C). I personally
> think this should be reflected as SLP time (in fact, I believe I filed a
> bug a couple of years back).
I also think that the current choice of a microstate to report time spent
sleeping on userland condition variables leads to confusion.  A quick
bugtraq search found at least three bugs filed against this:

 6308383 Thread state when sleeping on COND_VAR
 6312989 condvar time is reported as waiting on user lock instead of sleeping
 6183800 sleeping on CVs is "LCK" time, should be "SLP"

Looks like Jonathan Adams was looking at this problem recently
(when 6183800 was filed), so maybe he can share some results
of his investigation with us.

- Andrei

Jonathan Adams

2006-Feb-10 19:08 UTC

head link

[dtrace-discuss] plockstat/prstat -m inconsistency?

On Fri, Feb 10, 2006 at 10:34:56AM -0800, Andrei Dorofeev
wrote:> > Most likely they are sitting in [pthread_]cond_wait(3C). I personally
> > think this should be reflected as SLP time (in fact, I believe I filed
a
> > bug a couple of years back).
> 
> I also think that the current choice of a microstate to report time spent
> sleeping on userland condition variables leads to confusion.  A quick
> bugtraq search found at least three bugs filed against this:
> 
>  6308383 Thread state when sleeping on COND_VAR
>  6312989 condvar time is reported as waiting on user lock instead of
sleeping
>  6183800 sleeping on CVs is "LCK" time, should be "SLP"
> 
> Looks like Jonathan Adams was looking at this problem recently
> (when 6183800 was filed), so maybe he can share some results
> of his investigation with us.
The big problem is that we do "wait morphing";  that is:

	T1				| T2
					| 
	pthread_cond_wait(&cv, &mx)     | 
1)	    puts itself on a userland   | 
	    queue, then "parks" in      | 
	    the kernel                  | 
                                        | 
                                        | 
					| pthread_mutex_lock(&mx)
                                        | 
					| ... does work ...
                                        | 
2)					| pthread_cond_signal(&cv);
                                        |    moves T1 from the CV queue to
                                        |    the MX queue
                                        | 
                                        | 
                                        | ... does more work ...
                                        | 
3)                                      | pthread_mutex_unlock(&mx);
                                        |    wakes up T1
                                        | 
                                        | 

Right now, all of the time from (1) to (3) is recorded as
''LCK'' time.   The
simplest change is to record all of the time from (1) to (3) as
''SLP'' time.
The problem with that is that you would expect the time from (2) to (3) to
count as ''LCK'' time, since it is time spent waiting for a
mutex.
So if the mutex is held for a long time by T2, or the mutex is heavily
contended, you get bogus answers.

But getting the time from (2) to (3) recorded differently than the time
from (1) to (2) is not trivial;  T1 gets *no* notification of the change
made at (2).

That''s as far as I made it in my investigation.

I''ve closed 6308383 and 6312989 as dups of 6183800, and put the above
information into 6183800.

Cheers,
- jonathan

--
Jonathan Adams, Solaris Kernel Development

dtrace discuss - Feb 2006 - plockstat/prstat -m inconsistency?

[dtrace-discuss] plockstat/prstat -m inconsistency?

[dtrace-discuss] plockstat/prstat -m inconsistency?

[dtrace-discuss] plockstat/prstat -m inconsistency?

[dtrace-discuss] plockstat/prstat -m inconsistency?

[dtrace-discuss] plockstat/prstat -m inconsistency?