Russ Petruzzelli
2006-Nov-29 02:23 UTC
[dtrace-discuss] how to debug context switching and mutex contentions?
I''m looking for a suggestion on a good way to hunt down the source of high context switching and mutex contentions... Is dtrace the way to go now, or should I stick with something like lockstat? Russ This is a 5 second interval for mpstat: CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 16 0 0 1115 1241 206 9095 912 2420 7393 0 12105 68 25 0 7 17 1 0 1054 1049 86 9290 921 2420 7355 0 12123 67 25 0 8 18 0 0 1004 984 6 9115 902 2551 7336 0 11843 66 25 0 9 19 0 0 926 1033 26 9405 927 2480 7413 0 12216 68 25 0 8 This is complicated by the fact that there are a number of PIDs of interest running on the server.... Extra credit question... Why would a particular process when run alone (like mshttpd) show heavy usage with mpstat, but under prstat, no cpu usage appear? 17990 ? 0:00 webservd 17991 ? 0:48 webservd 17992 ? 13:17 webservd 18004 ? 0:00 webservd 18005 ? 0:49 webservd 18006 ? 609:23 webservd 18024 ? 0:00 watcher 18025 ? 0:13 enpd 18026 ? 1:56 csstored 18029 ? 0:28 csadmind 18030 ? 0:09 cshttpd 18031 ? 0:02 cshttpd 18049 ? 0:00 watcher 18050 ? 0:13 enpd 18051 ? 0:30 stored 18052 ? 1:19 imapd 18054 ? 2:41 mshttpd 18055 ? 0:01 imsched 18062 ? 0:01 tcp_smtp 18063 ? 0:01 tcp_smtp
Surya Prakki
2006-Nov-29 04:48 UTC
[dtrace-discuss] how to debug context switching and mutex contentions?
From solaris 10 onwards, lockstat is a DTrace a provider. You may want to use lockstat, to figure where the lock conention is coming from and that may even explain the high ''csw''s (If you see a high ''adaptive mutex block'' number, all of them cause csw). -surya Russ Petruzzelli wrote:>I''m looking for a suggestion on a good way to hunt down the source of >high context switching and mutex contentions... > >Is dtrace the way to go now, or should I stick with something like lockstat? > >Russ > > >This is a 5 second interval for mpstat: > >CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl > 16 0 0 1115 1241 206 9095 912 2420 7393 0 12105 68 25 0 7 > 17 1 0 1054 1049 86 9290 921 2420 7355 0 12123 67 25 0 8 > 18 0 0 1004 984 6 9115 902 2551 7336 0 11843 66 25 0 9 > 19 0 0 926 1033 26 9405 927 2480 7413 0 12216 68 25 0 8 > > >This is complicated by the fact that there are a number of PIDs of >interest running on the server.... > >Extra credit question... Why would a particular process when run alone >(like mshttpd) show heavy usage with mpstat, but under prstat, no cpu >usage appear? > > > 17990 ? 0:00 webservd > 17991 ? 0:48 webservd > 17992 ? 13:17 webservd > 18004 ? 0:00 webservd > 18005 ? 0:49 webservd > 18006 ? 609:23 webservd > 18024 ? 0:00 watcher > 18025 ? 0:13 enpd > 18026 ? 1:56 csstored > 18029 ? 0:28 csadmind > 18030 ? 0:09 cshttpd > 18031 ? 0:02 cshttpd > 18049 ? 0:00 watcher > 18050 ? 0:13 enpd > 18051 ? 0:30 stored > 18052 ? 1:19 imapd > 18054 ? 2:41 mshttpd > 18055 ? 0:01 imsched > 18062 ? 0:01 tcp_smtp > 18063 ? 0:01 tcp_smtp > > >_______________________________________________ >dtrace-discuss mailing list >dtrace-discuss at opensolaris.org > >
Roch - PAE
2006-Nov-29 09:16 UTC
[dtrace-discuss] how to debug context switching and mutex contentions?
Is the application being adversely affected here ? What business problem would we be chasing here ? -r Russ Petruzzelli writes: > I''m looking for a suggestion on a good way to hunt down the source of > high context switching and mutex contentions... > > Is dtrace the way to go now, or should I stick with something like lockstat? > > Russ > > > This is a 5 second interval for mpstat: > > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl > 16 0 0 1115 1241 206 9095 912 2420 7393 0 12105 68 25 0 7 > 17 1 0 1054 1049 86 9290 921 2420 7355 0 12123 67 25 0 8 > 18 0 0 1004 984 6 9115 902 2551 7336 0 11843 66 25 0 9 > 19 0 0 926 1033 26 9405 927 2480 7413 0 12216 68 25 0 8 > > > This is complicated by the fact that there are a number of PIDs of > interest running on the server.... > > Extra credit question... Why would a particular process when run alone > (like mshttpd) show heavy usage with mpstat, but under prstat, no cpu > usage appear? > > > 17990 ? 0:00 webservd > 17991 ? 0:48 webservd > 17992 ? 13:17 webservd > 18004 ? 0:00 webservd > 18005 ? 0:49 webservd > 18006 ? 609:23 webservd > 18024 ? 0:00 watcher > 18025 ? 0:13 enpd > 18026 ? 1:56 csstored > 18029 ? 0:28 csadmind > 18030 ? 0:09 cshttpd > 18031 ? 0:02 cshttpd > 18049 ? 0:00 watcher > 18050 ? 0:13 enpd > 18051 ? 0:30 stored > 18052 ? 1:19 imapd > 18054 ? 2:41 mshttpd > 18055 ? 0:01 imsched > 18062 ? 0:01 tcp_smtp > 18063 ? 0:01 tcp_smtp > > > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org
Jim Mauro
2006-Nov-29 15:48 UTC
[dtrace-discuss] how to debug context switching and mutex contentions?
"high context switching" and "mutex contentions" are relative. Depending on the hardware platform and processors, the smtx and csw rates can appear high relative to what we saw in recent years on slower machines. Are you having a performance problem? If so, define it in performance terms. If you''d simply like to understand the behavior of the system better, than lockstat and dtrace can help (lockstat is a dtrace consumer in S10). context switches happen mostly because threads issue blocking system calls. You can use "dtrace -n ''syscall:::entry { @[probefunc]=count() }'' " to see what the are, and add "execname" as an aggregation key to see which processes are executing which system calls. You may also have preemptions happening, due to either time quantum expiration or higher priority threads becoming runnable. There''s a dtrace script in /usr/demo/dtrace, whopreempt.d, that is a good starting point for understanding if there''s a lot of preemption activity (which will also cause context switches). Use lockstat to determine which kernel mutex locks are active (note I did NOT say busy, or hot). If this is an academic exercise, start there. If it''s a real performance project, don''t waste your time until you''ve defined a performance metric, measured performance, and set a reasonable goal. HTH, /jim Russ Petruzzelli wrote:> I''m looking for a suggestion on a good way to hunt down the source of > high context switching and mutex contentions... > > Is dtrace the way to go now, or should I stick with something like lockstat? > > Russ > > > This is a 5 second interval for mpstat: > > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl > 16 0 0 1115 1241 206 9095 912 2420 7393 0 12105 68 25 0 7 > 17 1 0 1054 1049 86 9290 921 2420 7355 0 12123 67 25 0 8 > 18 0 0 1004 984 6 9115 902 2551 7336 0 11843 66 25 0 9 > 19 0 0 926 1033 26 9405 927 2480 7413 0 12216 68 25 0 8 > > > This is complicated by the fact that there are a number of PIDs of > interest running on the server.... > > Extra credit question... Why would a particular process when run alone > (like mshttpd) show heavy usage with mpstat, but under prstat, no cpu > usage appear? > > > 17990 ? 0:00 webservd > 17991 ? 0:48 webservd > 17992 ? 13:17 webservd > 18004 ? 0:00 webservd > 18005 ? 0:49 webservd > 18006 ? 609:23 webservd > 18024 ? 0:00 watcher > 18025 ? 0:13 enpd > 18026 ? 1:56 csstored > 18029 ? 0:28 csadmind > 18030 ? 0:09 cshttpd > 18031 ? 0:02 cshttpd > 18049 ? 0:00 watcher > 18050 ? 0:13 enpd > 18051 ? 0:30 stored > 18052 ? 1:19 imapd > 18054 ? 2:41 mshttpd > 18055 ? 0:01 imsched > 18062 ? 0:01 tcp_smtp > 18063 ? 0:01 tcp_smtp > > > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org >
Wee Yeh Tan
2006-Nov-29 16:05 UTC
[dtrace-discuss] how to debug context switching and mutex contentions?
On 11/29/06, Russ Petruzzelli <Russ.Petruzzelli at sun.com> wrote: ...> Extra credit question... Why would a particular process when run alone > (like mshttpd) show heavy usage with mpstat, but under prstat, no cpu > usage appear?My hunch is that mshttpd is multi-threaded but you are watching processes with prstat. -- Just me, Wire ...
Chip Bennett
2006-Nov-29 17:50 UTC
[dtrace-discuss] how to debug context switching and mutex contentions?
Russ Petruzzelli wrote:> Extra credit question... Why would a particular process when run alone > (like mshttpd) show heavy usage with mpstat, but under prstat, no cpu > usage appear? >I noticed you''re using a multi-processor system. mpstat shows processor data while prstat shows process data. So while mpstat might show that a process is fully utilizing one processor (100%), prstat would report that the process is only using (1/n)% of the whole system, where "n" is the number of processors, cores, or strands, depending on your architecture. Could this be the source of your confusion? What do we get if we get the correct answer on the extra credit? ;-) Chip
Jarod Jenson
2006-Nov-30 14:44 UTC
[dtrace-discuss] how to debug context switching and mutex contentions?
Russ Petruzzelli''s email at 11/28/2006 8:23 PM, said:> I''m looking for a suggestion on a good way to hunt down the source of > high context switching and mutex contentions... > > Is dtrace the way to go now, or should I stick with something like lockstat?Looking at the mpstat(1M) below, I would definitely start with lockstat. I would guess that this is a few highly contested locks. With that in mind, I would try something like: # lockstat -D 30 -w sleep 10 during load.> > Russ > > > This is a 5 second interval for mpstat: > > CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl > 16 0 0 1115 1241 206 9095 912 2420 7393 0 12105 68 25 0 7 > 17 1 0 1054 1049 86 9290 921 2420 7355 0 12123 67 25 0 8 > 18 0 0 1004 984 6 9115 902 2551 7336 0 11843 66 25 0 9 > 19 0 0 926 1033 26 9405 927 2480 7413 0 12216 68 25 0 8As others have said, whether your smtx and csw''s are "high" or not; it is relative. However, given the mpstat(1M) as a whole and the workload you have specified below; I am at least a bit concerned. You could try something simple like: # dtrace -n off-cpu''{@[ustack(5)] = count()}END{trunc(@, 20)}'' just to get a basic idea of why you are voluntarily coming off cpu. I would back burner the icsw for now. Also, knowing those syscalls would be nice.> > > This is complicated by the fact that there are a number of PIDs of > interest running on the server....Just add "pid" as an aggregation key where necessary. This is why DTrace is a wonderful systemic observation tool.> > Extra credit question... Why would a particular process when run alone > (like mshttpd) show heavy usage with mpstat, but under prstat, no cpu > usage appear? > > > 17990 ? 0:00 webservd > 17991 ? 0:48 webservd > 17992 ? 13:17 webservd > 18004 ? 0:00 webservd > 18005 ? 0:49 webservd > 18006 ? 609:23 webservd > 18024 ? 0:00 watcher > 18025 ? 0:13 enpd > 18026 ? 1:56 csstored > 18029 ? 0:28 csadmind > 18030 ? 0:09 cshttpd > 18031 ? 0:02 cshttpd > 18049 ? 0:00 watcher > 18050 ? 0:13 enpd > 18051 ? 0:30 stored > 18052 ? 1:19 imapd > 18054 ? 2:41 mshttpd > 18055 ? 0:01 imsched > 18062 ? 0:01 tcp_smtp > 18063 ? 0:01 tcp_smtp1) It isn''t mshttpd that is consuming cycles when you take your mpstat(1M) or prstat(1M) snapshots. 2) You are not running ''prstat -Lmp `pgrep mshttpd`'' when you use prstat. When diagnosing performance problems, per thread and microstate accounting are the only way to run prstat (in my opinion). Thanks, Jarod
Brendan Gregg - Sun Microsystems
2006-Nov-30 18:17 UTC
[dtrace-discuss] how to debug context switching and mutex contentions?
G''Day Folks, On Wed, Nov 29, 2006 at 11:50:04AM -0600, Chip Bennett wrote:> Russ Petruzzelli wrote: > >Extra credit question... Why would a particular process when run alone > >(like mshttpd) show heavy usage with mpstat, but under prstat, no cpu > >usage appear? > > > I noticed you''re using a multi-processor system. mpstat shows processor > data while prstat shows process data. So while mpstat might show that a > process is fully utilizing one processor (100%), prstat would report > that the process is only using (1/n)% of the whole system, where "n" is > the number of processors, cores, or strands, depending on your > architecture. Could this be the source of your confusion?What is additionally confusing is the time ranges that these utilisation values are calculated for; mpstat will use the interval specified, whereas prstat''s %CPU is reading pr_pctcpu - a decaying average of "recent" activity. prstat -m (as Jarod said) may be better, as it will provide standard interval/averages of CPU usage. It may also be short lived processes that prstat doesn''t snapshot (use execsnoop from the DTraceToolkit or one of the one liners to check). Either mshttpd terminating and restarting, or children of mshttpd. Does mshttpd call CGI scripts? I''ve seen servers with hundreds of short lived Perl scripts per second, where the developers hadn''t used modperl. cheers, Brendan -- Brendan [CA, USA]