thr3ads.net - dtrace discuss - [dtrace-discuss] how to debug context switching and mutex contentions? [Nov 2006]

If this information is useful, please help other people find it:
Share via:

Russ Petruzzelli

2006-Nov-29 02:23 UTC

[dtrace-discuss] how to debug context switching and mutex contentions?

I''m looking for a suggestion on a good way to hunt down the source of
high context switching and mutex contentions...

Is dtrace the way to go now, or should I stick with something like lockstat?

Russ


This is a 5 second interval for mpstat:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
 16    0   0 1115  1241  206 9095  912 2420 7393    0 12105   68  25   0   7
 17    1   0 1054  1049   86 9290  921 2420 7355    0 12123   67  25   0   8
 18    0   0 1004   984    6 9115  902 2551 7336    0 11843   66  25   0   9
 19    0   0  926  1033   26 9405  927 2480 7413    0 12216   68  25   0   8


This is complicated by the fact that there are a number of PIDs of
interest running on the server....

Extra credit question... Why would a particular process when run alone
(like mshttpd) show heavy usage with mpstat, but under prstat, no cpu
usage appear?


 17990 ?           0:00 webservd
 17991 ?           0:48 webservd
 17992 ?          13:17 webservd
 18004 ?           0:00 webservd
 18005 ?           0:49 webservd
 18006 ?         609:23 webservd
 18024 ?           0:00 watcher
 18025 ?           0:13 enpd
 18026 ?           1:56 csstored
 18029 ?           0:28 csadmind
 18030 ?           0:09 cshttpd
 18031 ?           0:02 cshttpd
 18049 ?           0:00 watcher
 18050 ?           0:13 enpd
 18051 ?           0:30 stored
 18052 ?           1:19 imapd
 18054 ?           2:41 mshttpd
 18055 ?           0:01 imsched
 18062 ?           0:01 tcp_smtp
 18063 ?           0:01 tcp_smtp

Surya Prakki

2006-Nov-29 04:48 UTC

head link

[dtrace-discuss] how to debug context switching and mutex contentions?

From solaris 10 onwards, lockstat is a DTrace a provider.
You may want to use lockstat, to figure where the lock conention is coming
from and that may even explain the high ''csw''s (If you see a
high
''adaptive mutex
block'' number, all of them cause csw).

-surya

Russ Petruzzelli wrote:
>I''m looking for a suggestion on a good way to hunt down the source
of
>high context switching and mutex contentions...
>
>Is dtrace the way to go now, or should I stick with something like lockstat?
>
>Russ
>
>
>This is a 5 second interval for mpstat:
>
>CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
> 16    0   0 1115  1241  206 9095  912 2420 7393    0 12105   68  25   0   7
> 17    1   0 1054  1049   86 9290  921 2420 7355    0 12123   67  25   0   8
> 18    0   0 1004   984    6 9115  902 2551 7336    0 11843   66  25   0   9
> 19    0   0  926  1033   26 9405  927 2480 7413    0 12216   68  25   0   8
>
>
>This is complicated by the fact that there are a number of PIDs of
>interest running on the server....
>
>Extra credit question... Why would a particular process when run alone
>(like mshttpd) show heavy usage with mpstat, but under prstat, no cpu
>usage appear?
>
>
> 17990 ?           0:00 webservd
> 17991 ?           0:48 webservd
> 17992 ?          13:17 webservd
> 18004 ?           0:00 webservd
> 18005 ?           0:49 webservd
> 18006 ?         609:23 webservd
> 18024 ?           0:00 watcher
> 18025 ?           0:13 enpd
> 18026 ?           1:56 csstored
> 18029 ?           0:28 csadmind
> 18030 ?           0:09 cshttpd
> 18031 ?           0:02 cshttpd
> 18049 ?           0:00 watcher
> 18050 ?           0:13 enpd
> 18051 ?           0:30 stored
> 18052 ?           1:19 imapd
> 18054 ?           2:41 mshttpd
> 18055 ?           0:01 imsched
> 18062 ?           0:01 tcp_smtp
> 18063 ?           0:01 tcp_smtp
>
>
>_______________________________________________
>dtrace-discuss mailing list
>dtrace-discuss at opensolaris.org
>  
>

Roch - PAE

2006-Nov-29 09:16 UTC

head link

[dtrace-discuss] how to debug context switching and mutex contentions?

Is the application being adversely affected here ?
What business problem would we be chasing here ?

-r

Russ Petruzzelli writes:
 > I''m looking for a suggestion on a good way to hunt down the
source of
 > high context switching and mutex contentions...
 > 
 > Is dtrace the way to go now, or should I stick with something like
lockstat?
 > 
 > Russ
 > 
 > 
 > This is a 5 second interval for mpstat:
 > 
 > CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt
idl
 >  16    0   0 1115  1241  206 9095  912 2420 7393    0 12105   68  25   0  
7
 >  17    1   0 1054  1049   86 9290  921 2420 7355    0 12123   67  25   0  
8
 >  18    0   0 1004   984    6 9115  902 2551 7336    0 11843   66  25   0  
9
 >  19    0   0  926  1033   26 9405  927 2480 7413    0 12216   68  25   0  
8
 > 
 > 
 > This is complicated by the fact that there are a number of PIDs of
 > interest running on the server....
 > 
 > Extra credit question... Why would a particular process when run alone
 > (like mshttpd) show heavy usage with mpstat, but under prstat, no cpu
 > usage appear?
 > 
 > 
 >  17990 ?           0:00 webservd
 >  17991 ?           0:48 webservd
 >  17992 ?          13:17 webservd
 >  18004 ?           0:00 webservd
 >  18005 ?           0:49 webservd
 >  18006 ?         609:23 webservd
 >  18024 ?           0:00 watcher
 >  18025 ?           0:13 enpd
 >  18026 ?           1:56 csstored
 >  18029 ?           0:28 csadmind
 >  18030 ?           0:09 cshttpd
 >  18031 ?           0:02 cshttpd
 >  18049 ?           0:00 watcher
 >  18050 ?           0:13 enpd
 >  18051 ?           0:30 stored
 >  18052 ?           1:19 imapd
 >  18054 ?           2:41 mshttpd
 >  18055 ?           0:01 imsched
 >  18062 ?           0:01 tcp_smtp
 >  18063 ?           0:01 tcp_smtp
 > 
 > 
 > _______________________________________________
 > dtrace-discuss mailing list
 > dtrace-discuss at opensolaris.org

Jim Mauro

2006-Nov-29 15:48 UTC

head link

[dtrace-discuss] how to debug context switching and mutex contentions?

"high context switching" and "mutex contentions" are
relative.
Depending on the hardware platform and processors, the smtx
and csw rates can appear high relative to what we saw in recent
years on slower machines.

Are you having a performance problem? If so, define it in performance
terms.

If you''d simply like to understand the behavior of the system better,
than lockstat and dtrace can help (lockstat is a dtrace consumer in
S10).

context switches happen mostly because threads issue blocking system
calls. You can use "dtrace -n ''syscall:::entry {
@[probefunc]=count() }'' "
to see what the are, and add "execname" as an aggregation key to see
which
processes are executing which system calls.

You may also have preemptions happening, due to either time quantum
expiration or higher priority threads becoming runnable. There''s a
dtrace script in /usr/demo/dtrace, whopreempt.d, that is a good starting
point for understanding if there''s a lot of preemption activity (which
will
also cause context switches).

Use lockstat to determine which kernel mutex locks are active (note I
did NOT say busy, or hot).

If this is an academic exercise, start there. If it''s a real
performance
project,
don''t waste your time until you''ve defined a performance
metric, measured
performance, and set a reasonable goal.

HTH,
/jim

Russ Petruzzelli wrote:> I''m looking for a suggestion on a good way to hunt down the source
of
> high context switching and mutex contentions...
>
> Is dtrace the way to go now, or should I stick with something like
lockstat?
>
> Russ
>
>
> This is a 5 second interval for mpstat:
>
> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt
idl
>  16    0   0 1115  1241  206 9095  912 2420 7393    0 12105   68  25   0  
7
>  17    1   0 1054  1049   86 9290  921 2420 7355    0 12123   67  25   0  
8
>  18    0   0 1004   984    6 9115  902 2551 7336    0 11843   66  25   0  
9
>  19    0   0  926  1033   26 9405  927 2480 7413    0 12216   68  25   0  
8
>
>
> This is complicated by the fact that there are a number of PIDs of
> interest running on the server....
>
> Extra credit question... Why would a particular process when run alone
> (like mshttpd) show heavy usage with mpstat, but under prstat, no cpu
> usage appear?
>
>
>  17990 ?           0:00 webservd
>  17991 ?           0:48 webservd
>  17992 ?          13:17 webservd
>  18004 ?           0:00 webservd
>  18005 ?           0:49 webservd
>  18006 ?         609:23 webservd
>  18024 ?           0:00 watcher
>  18025 ?           0:13 enpd
>  18026 ?           1:56 csstored
>  18029 ?           0:28 csadmind
>  18030 ?           0:09 cshttpd
>  18031 ?           0:02 cshttpd
>  18049 ?           0:00 watcher
>  18050 ?           0:13 enpd
>  18051 ?           0:30 stored
>  18052 ?           1:19 imapd
>  18054 ?           2:41 mshttpd
>  18055 ?           0:01 imsched
>  18062 ?           0:01 tcp_smtp
>  18063 ?           0:01 tcp_smtp
>
>
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org
>

Wee Yeh Tan

2006-Nov-29 16:05 UTC

head link

[dtrace-discuss] how to debug context switching and mutex contentions?

On 11/29/06, Russ Petruzzelli <Russ.Petruzzelli at sun.com> wrote:
...> Extra credit question... Why would a particular process when run alone
> (like mshttpd) show heavy usage with mpstat, but under prstat, no cpu
> usage appear?
My hunch is that mshttpd is multi-threaded but you are watching
processes with prstat.

-- 
Just me,
Wire ...

Chip Bennett

2006-Nov-29 17:50 UTC

head link

[dtrace-discuss] how to debug context switching and mutex contentions?

Russ Petruzzelli wrote:> Extra credit question... Why would a particular process when run alone
> (like mshttpd) show heavy usage with mpstat, but under prstat, no cpu
> usage appear?
>   I noticed you''re using a multi-processor system.  mpstat shows
processor
data while prstat shows process data.  So while mpstat might show that a 
process is fully utilizing one processor (100%), prstat would report 
that the process is only using (1/n)% of the whole system, where "n"
is
the number of processors, cores, or strands, depending on your 
architecture.  Could this be the source of your confusion?

What do we get if we get the correct answer on the extra credit?  ;-)

Chip

Jarod Jenson

2006-Nov-30 14:44 UTC

head link

[dtrace-discuss] how to debug context switching and mutex contentions?

Russ Petruzzelli''s email at 11/28/2006 8:23 PM,
said:> I''m looking for a suggestion on a good way to hunt down the source
of
> high context switching and mutex contentions...
> 
> Is dtrace the way to go now, or should I stick with something like
lockstat?
Looking at the mpstat(1M) below, I would definitely start with lockstat. 
I would guess that this is a few highly contested locks. With that in 
mind, I would try something like:

# lockstat -D 30 -w sleep 10

during load.
> 
> Russ
> 
> 
> This is a 5 second interval for mpstat:
> 
> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt
idl
>  16    0   0 1115  1241  206 9095  912 2420 7393    0 12105   68  25   0  
7
>  17    1   0 1054  1049   86 9290  921 2420 7355    0 12123   67  25   0  
8
>  18    0   0 1004   984    6 9115  902 2551 7336    0 11843   66  25   0  
9
>  19    0   0  926  1033   26 9405  927 2480 7413    0 12216   68  25   0  
8
As others have said, whether your smtx and csw''s are "high"
or not; it
is relative. However, given the mpstat(1M) as a whole and the workload 
you have specified below; I am at least a bit concerned.

You could try something simple like:

# dtrace -n off-cpu''{@[ustack(5)] = count()}END{trunc(@, 20)}''

just to get a basic idea of why you are voluntarily coming off cpu. I 
would back burner the icsw for now.

Also, knowing those syscalls would be nice.
> 
> 
> This is complicated by the fact that there are a number of PIDs of
> interest running on the server....
Just add "pid" as an aggregation key where necessary. This is why
DTrace
is a wonderful systemic observation tool.
> 
> Extra credit question... Why would a particular process when run alone
> (like mshttpd) show heavy usage with mpstat, but under prstat, no cpu
> usage appear?
> 
> 
>  17990 ?           0:00 webservd
>  17991 ?           0:48 webservd
>  17992 ?          13:17 webservd
>  18004 ?           0:00 webservd
>  18005 ?           0:49 webservd
>  18006 ?         609:23 webservd
>  18024 ?           0:00 watcher
>  18025 ?           0:13 enpd
>  18026 ?           1:56 csstored
>  18029 ?           0:28 csadmind
>  18030 ?           0:09 cshttpd
>  18031 ?           0:02 cshttpd
>  18049 ?           0:00 watcher
>  18050 ?           0:13 enpd
>  18051 ?           0:30 stored
>  18052 ?           1:19 imapd
>  18054 ?           2:41 mshttpd
>  18055 ?           0:01 imsched
>  18062 ?           0:01 tcp_smtp
>  18063 ?           0:01 tcp_smtp
1) It isn''t mshttpd that is consuming cycles when you take your 
mpstat(1M) or prstat(1M) snapshots.
2) You are not running ''prstat -Lmp `pgrep mshttpd`'' when you
use
prstat. When diagnosing performance problems, per thread and microstate 
accounting are the only way to run prstat (in my opinion).

Thanks,

Jarod

Brendan Gregg - Sun Microsystems

2006-Nov-30 18:17 UTC

head link

[dtrace-discuss] how to debug context switching and mutex contentions?

G''Day Folks,

On Wed, Nov 29, 2006 at 11:50:04AM -0600, Chip Bennett
wrote:> Russ Petruzzelli wrote:
> >Extra credit question... Why would a particular process when run alone
> >(like mshttpd) show heavy usage with mpstat, but under prstat, no cpu
> >usage appear?
> >  
> I noticed you''re using a multi-processor system.  mpstat shows
processor
> data while prstat shows process data.  So while mpstat might show that a 
> process is fully utilizing one processor (100%), prstat would report 
> that the process is only using (1/n)% of the whole system, where
"n" is
> the number of processors, cores, or strands, depending on your 
> architecture.  Could this be the source of your confusion?
What is additionally confusing is the time ranges that these utilisation
values are calculated for; mpstat will use the interval specified, whereas
prstat''s %CPU is reading pr_pctcpu - a decaying average of
"recent"
activity. prstat -m (as Jarod said) may be better, as it will provide
standard interval/averages of CPU usage.

It may also be short lived processes that prstat doesn''t snapshot (use
execsnoop from the DTraceToolkit or one of the one liners to check).
Either mshttpd terminating and restarting, or children of mshttpd.
Does mshttpd call CGI scripts? I''ve seen servers with hundreds of short
lived Perl scripts per second, where the developers hadn''t used
modperl.

cheers,

Brendan

-- 
Brendan
[CA, USA]

Possibly Parallel Threads

Search for more reasonably related threads

dtrace discuss - Nov 2006 - how to debug context switching and mutex contentions?

[dtrace-discuss] how to debug context switching and mutex contentions?

[dtrace-discuss] how to debug context switching and mutex contentions?

[dtrace-discuss] how to debug context switching and mutex contentions?

[dtrace-discuss] how to debug context switching and mutex contentions?

[dtrace-discuss] how to debug context switching and mutex contentions?

[dtrace-discuss] how to debug context switching and mutex contentions?

[dtrace-discuss] how to debug context switching and mutex contentions?

[dtrace-discuss] how to debug context switching and mutex contentions?

Possibly Parallel Threads